Loki-Astari / JsonBenchmark

Json benchmark based on nativejson-benchmark

Add UTF-8 conformance tests #3

Open nlohmann opened 4 years ago

nlohmann commented 4 years ago

When implementing nlohmann/json, I struggled a lot with the UTF-8 validation. I'm now pretty sure that I will only accept correctly encoded inputs and reject anything else. It would be great to add such tests to the conformance tests, because it makes a difference whether a parser just copies everything between quotes or actually checks whether the strings make sense. This also includes decoding of \uxxxx escapes.
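To illustrate the difference between copying bytes and validating them, here is a minimal Python sketch. It is not how nlohmann/json does it: Python's built-in strict UTF-8 codec and `json.loads` stand in for a validating parser, and the helper name `is_valid_utf8` and the sample inputs are made up for illustration.

```python
import json


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # decoding is strict by default
        return True
    except UnicodeDecodeError:
        return False


# Illustrative inputs; a parser that only copies bytes would accept all of them.
CASES = {
    "valid euro sign": "\u20ac".encode("utf-8"),
    "overlong encoding of '/'": b"\xc0\xaf",
    "lone continuation byte": b"\x80",
    "truncated 3-byte sequence": b"\xe2\x82",
    "UTF-8 encoded surrogate": b"\xed\xa0\x80",
}

for label, raw in CASES.items():
    verdict = "accept" if is_valid_utf8(raw) else "reject"
    print(f"{label}: {verdict}")

# \uxxxx escapes: a surrogate pair has to be combined into one code point.
decoded = json.loads('"\\ud83d\\ude00"')
print(len(decoded), hex(ord(decoded)))
```

Only the first case is well-formed; the other four are the kind of input a conformance test would expect a strict parser to reject.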

Loki-Astari commented 4 years ago

There is a specific test suite making sure \uxxxx is decoded correctly.

See: https://github.com/Loki-Astari/JsonBenchmark/tree/master/data/validate_string

Each test file has the format:

{Json Array Containing one String}<{whitespace}>{UTF-8 character string expected result}<

There is a separate set of tests that should fail parsing:

See: https://github.com/Loki-Astari/JsonBenchmark/tree/master/data/jsonchecker_fail

Each test here is a JSON object.
It can contain an invalid string, in which case parsing should fail.

nlohmann commented 4 years ago

I saw these tests - I think they are still rather cursory. I was more thinking about tests like those in http://seriot.ch/parsing_json.php (see Sect. 2.5).

Loki-Astari commented 4 years ago

That page links to this GitHub repository with 300 tests: https://github.com/nst/JSONTestSuite

If anybody wants to add a way to run these tests automatically, that would be great.
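As a starting point, here is a rough Python sketch of such a runner. It relies on the JSONTestSuite file naming convention (`y_` must be accepted, `n_` must be rejected, `i_` may go either way). Everything else is an assumption: the function names are hypothetical, `json.loads` is only a placeholder for the parser under test, and the directory you pass in (presumably the suite's `test_parsing` folder) depends on where you check the suite out.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def expected_outcome(filename: str) -> str:
    """Map the JSONTestSuite filename prefix to the expected parser behaviour."""
    return {"y_": "accept", "n_": "reject"}.get(filename[:2], "either")


def accepts(parse: Callable[[bytes], object], data: bytes) -> bool:
    """Return True if the parser under test accepts the raw bytes."""
    try:
        parse(data)
        return True
    except (ValueError, RecursionError):
        # json.JSONDecodeError and UnicodeDecodeError are both ValueErrors;
        # deeply nested inputs can exhaust the recursion limit instead.
        return False


def check_case(filename: str, data: bytes,
               parse: Callable[[bytes], object] = json.loads) -> str:
    """Classify one test file as 'pass', 'FAIL', or 'info'."""
    expected = expected_outcome(filename)
    accepted = accepts(parse, data)
    if expected == "either":
        return "info"  # implementation-defined: report, do not judge
    return "pass" if accepted == (expected == "accept") else "FAIL"


def run_suite(directory: str,
              parse: Callable[[bytes], object] = json.loads) -> Dict[str, int]:
    """Run every *.json file in the directory and tally the results."""
    counts = {"pass": 0, "FAIL": 0, "info": 0}
    for path in sorted(Path(directory).glob("*.json")):
        counts[check_case(path.name, path.read_bytes(), parse)] += 1
    return counts
```

Calling `run_suite("path/to/test_parsing")` returns the tallies. For this benchmark, `parse` would need to be replaced by something that invokes each C++ parser, for example through a small command-line wrapper.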