Closed RickMoynihan closed 9 months ago
ONS lint their JSON with csvlint which also validates CSVW and JSON, so they're unlikely to have invalid JSON metadata, so changing the default here shouldn't pose too many problems.
There is always the argument about being tolerant of 3rd party broken JSON though, however I'd rather require people opt in to any such behaviour behind a command line flag.
Choices seem to be:
-tolerate-broken-json
for enabling a tolerant parse.3 is arguably the most desireable from a user perspective, however ideally not at the expense of having to use two different json parsers.
If we can find a suitable parser that supports both a strict and a lax mode we could switch to that to support 3, otherwise I think it's arguably better to pick 2 and switch to a strict parse. How much broken JSON do we expect to have to handle in the wild? I'm guessing not that much, particularly if we're advocating a tool set that emphasises correctness.
The version of clojure.data.json that we use is now a stricter parser, so it looks like we'll need to go with option 2.
(json/read-str "{\"foo\":\"bar\", \"baz\":\"blah\"}") => {"foo" "bar", "baz" "blah"} (json/read-str "{\"foo\":\"bar\" \"baz\":\"blah\"}") Execution error at clojure.data.json/read-object (json.clj:312). JSON error (missing entry in object)
Closed by this commit: https://github.com/Swirrl/csv2rdf/commit/de1ec6fc761cd97c9031ae91be2b1dd495acf25f
I was surprised to discover that csv2rdf is tolerant of broken json in its metadata files, in particular missing commas etc.
Without wanting to open up the tolerant reader debate I think it would be better if we could also require that JSON always be valid, and raise an error when it's not.
Personally I find JSON a bit awkward to write (it's very easy to accidentally leave superfluous comma or colon delimiters in the json or accidentally omit them), and I worry that others crafting JSON metadata without extra tooling support will create JSON that is invalid and only we can parse.
I think it would be better to require strict parsing to ensure that people are conservative in what they create.
It looks like this is an unexpected behaviour we have inherited from
clojure.data.json/read-str
:I don't think the
data.json
parser supports a strict parsing flag though so we might not be able to support this without changing JSON parser in which case we may wish to drop support for tolerant reading of JSON; or pick up a dependency on another parser that supports strict mode. I'd very much favour not having the extra dependency, but it may depend on whether we expect to handle invalid JSON.Incidentally I checked the JSON spec, and it looks like strictly speaking it's valid for JSON processors to interpret broken syntax; they're just required to also handle valid JSON.