adobe / json-formula

Query language for JSON documents
http://opensource.adobe.com/json-formula/
Apache License 2.0
19 stars 8 forks source link

Current JSON_FRAGMENT lexer rule in ANTLR is too greedy #113

Closed Eswcvlad closed 6 months ago

Eswcvlad commented 7 months ago

Noticed an issue, while trying to pass the merge(`{"a": 1}`, `{"b": 2}`) test with my implementation, based on the JsonFormula.g4 grammar. Instead of getting two argument tokens, I've been getting the whole `{"a": 1}`, `{"b": 2}` string as a JSON_FRAGMENT token.

I am assuming, that it just lexed it like this:

  1. ~[\\`]+: {"a
  2. STRING: ": 1}`, `{"
  3. ~[\\`]+: b": 2}

The rule from the specification (JSON_LITERAL: '`' ('\\`' | ~ [`])* '`' ;) works as expected. IIRC the same rule was in JsonFormula.g4, but was changed later.

JohnBrinkman commented 7 months ago

You are correct, the JSON_FRAGMENT construction is too greedy. I will look for a fix. I'm not sure that reverting to

'`' ('\\`' | ~ [`])* '`'

is correct -- need to check on the handling of unicode escape sequences...

JohnBrinkman commented 7 months ago

I have a fix available: https://github.com/adobe/json-formula/pull/115

Will re-generate the spec once we get through the remaining issues.

Eswcvlad commented 7 months ago

Hmm, I remember trying the '`' (STRING | ~ [\\`"]+)* '`' variant myself, but it didn't seem ideal for error reporting, as something like `{"a": "1` is now caught during parsing/lexing in ANTLR, comparsed to something like JSON.parse.

Eswcvlad commented 7 months ago

Also, just tested now, with this rule also the foo[?bar==`["foo`bar"]`] test from syntax.json doesn't throw a SyntaxError in my case anymore. Guess the issue is similar:

  1. ~ [\\`"]+: [
  2. STRING: "foo`bar"
  3. ~ [\\`"]+: ]
JohnBrinkman commented 6 months ago

Interesting. Our test that flagged valid antlr grammar was checking that when the implementation encountered a syntax error, the grammar did as well -- but we didn't check vice-versa.

I've updated the grammar again -- should make things more consistent.