databricks / sjsonnet

Apache License 2.0
267 stars 55 forks source link

assert keyword parsed eagerly #152

Closed zapster closed 1 year ago

zapster commented 2 years ago

Contrary to the reference implementation, sjsonnet seems to parse the assert keyword eagerly:

echo "{ assertion: false }" > test.jsonnet
$ jsonnet test.jsonnet
{
   "assertion": false
}
$ java -jar sjsonnet.jar test.jsonnet
sjsonnet.StaticError: Unknown variable: ion
    at [Id ion].(test.jsonnet:1:9)

That is with the 0.4.3 jar.

szeiger commented 2 years ago

@lihaoyi-databricks What's the right way to fix this with fastparse? I suspect it's the assert parsing in def member which uses a string literal "assert" as a parser. The other two places where assert is recognized should always scan the whole identifier.

lihaoyi-databricks commented 2 years ago

I haven't looked at this in a while, but I would have assumed this would get handled correctly by https://github.com/databricks/sjsonnet/blob/a3516ad5bfd43c5b46bd87d5584b08123344d593/sjsonnet/src/sjsonnet/Parser.scala#L282, which first tries to parse an identifier and then tries to match it against the set of keywords. I would have expected it to parse the entire identifier assertion, then the match statement should see that it doesn't match "assert", and then fall back to treating it as an identifier

We could in theory follow up the assert parser with a (!CharIn("_a-zA-Z0-9") | End) to ensure that it only parses distinct identifiers, but looking at the code I thought that should not be necessary. @szeiger could you try sprinkling some .logs around the parser, e.g. around the body of expr2, around the CharsWhileIn("_a-zA-Z0-9", 0), so we can see what it's doing?

lihaoyi-databricks commented 2 years ago

Oh here is probably the offending line

https://github.com/databricks/sjsonnet/blob/a3516ad5bfd43c5b46bd87d5584b08123344d593/sjsonnet/src/sjsonnet/Parser.scala#L356

You just need to swap field and "assert", and it should do the right thing (you could also do the (!CharIn("_a-zA-Z0-9") | End) I mentioned earlier, but swapping the two parsers should be enough)