akshaymankar / jsonpath-hs

Haskell implementation of JSONPath
BSD 3-Clause "New" or "Revised" License
6 stars 5 forks source link

endParser and lookAhead rationale? #44

Open glyn opened 1 year ago

glyn commented 1 year ago

As of eb6f89c60a85e000bc353f4b8ca7b076d65013dd in PR https://github.com/akshaymankar/jsonpath-hs/pull/39, I am having difficulty making the following test pass:

  test/Spec/FunctionExprSpec.hs:29:9: 
  3) Spec.FunctionExpr, function expression, recovers from match of literal argument
       expected: FunctionExpr "foo" [ArgLogicalExpr (ComparisonExpr (CmpBool True) Equal (CmpBool True))]
       but parsing failed with error:
         1:10:
           |
         1 | foo(true == true)
           |          ^
         unexpected '='
         expecting ')' or white space

  To rerun use: --match "/Spec.FunctionExpr/function expression/recovers from match of literal argument/"

The debug output for this test is:

comparable> IN: "true)"
comparable> MATCH (COK): "true"
comparable> VALUE: CmpBool True
comparable> HINTS: []

comparable> IN: "true)"
comparable> MATCH (COK): "true"
comparable> VALUE: CmpBool True
comparable> HINTS: []

comparable> IN: "true)"
comparable> MATCH (COK): "true"
comparable> VALUE: CmpBool True
comparable> HINTS: []

comparable> IN: "true)"
comparable> MATCH (COK): "true"
comparable> VALUE: CmpBool True
comparable> HINTS: []

basicFilterExpr> IN: "true == true)"
basicFilterExpr> MATCH (EERR): <EMPTY>
basicFilterExpr> ERROR:
basicFilterExpr> offset=16:
basicFilterExpr> unexpected ')'
basicFilterExpr> expecting '!' or end of input

first arg> IN: "true == true)"
first arg> MATCH (COK): "true"
first arg> VALUE: ArgLiteral (CmpBool True)
first arg> HINTS: []

arg list> IN: "true == true)"
arg list> MATCH (COK): "true"
arg list> VALUE: [ArgLiteral (CmpBool True)]
arg list> HINTS: [',']

    recovers from match of literal argument [✘]

So it seems that the comparison true == true fails to parse because it is followed by the closing parenthesis of the function expression of which true == true is the first (and only) argument. I'm trying to wrap my head around this.

The lookAhead endParser line seems to be the cause of this:

comparisionFilterExpr :: Parser a -> Parser FilterExpr
comparisionFilterExpr endParser = do
  expr <-
    ComparisonExpr
      <$> comparable condition
      <*> condition
      <*> dbg "comparable" (comparable endParser)
  _ <- lookAhead endParser
  pure expr

When I comment this line out, the above test passes, but lots of other more basic tests fail.

To unblock me, please could you explain:

(Apologies for raising this discussion topic in the form of an issue, but I am on the verge of giving up on this codebase if I can't get my head around this pattern.)

akshaymankar commented 1 year ago

I introduced it in this commit: https://github.com/akshaymankar/jsonpath-hs/commit/faf9c0a61a9800b0699b46c1a05263e760346e13. The commit message has some details.

It was just to make it produce better errors. I found it especially useful whenever parsing a path inside a path, like in filters or for logical expressions. I think the mistake here is that the function expressions end in a ), but the endParser passed to the functionArgs parser is just the same endParser which was passed to the functionExpr parser. Here it should be char ')'. I changed the parser to be like this and the test works:

functionExpr :: Parser FunctionExpr
functionExpr = do
  FunctionExpr
    <$> functionName
    <*> inParens (dbg "arg list" (functionArgs (char ')')))
glyn commented 1 year ago

Thanks @akshaymankar! I'm sure that's a step in the right direction.

Please could you tell me the point of lookAhead endParser?

akshaymankar commented 1 year ago

After thinking a bit I think I remember why I did this. Maybe there is a better way to achieve what I am trying and I am not sure if I came up with the lookAhead endParser idea or if I took it from somewhere. But here is an example which might help. Consider a couple of parsers like this:

x :: Parser Text
x = do
 _ <- string "foo"
 pure "x"

xorx :: Parser Text
xorx = do
  _ <- x
  _ <- char '|'
  _ <- x
  pure "xorx"

y :: Parser Text
y = (try xorx <|> x) <* eof

Now running the parser on a wrong string like "foot" produces this error:

ghci> parseTest y "foot"
1:4:
  |
1 | foot
  |    ^
unexpected 't'
expecting end of input

Notice that it says expecting end of input, however a | character should also work here. This is because (try xorx <|> x) has already succeeded and the parser is now looking for an eof. One could try to parse the eof inside, but saying that x must end in an eof makes defining xorx in terms of x impossible. However, we could tell x to look for a | character when we're parsing the first x in xorx. Here is an example, but with x and y replaced with a and b:

a :: Parser end -> Parser Text
a endParser = do
 _ <- string "foo"
 _ <- endParser
 pure "a"

aora :: Parser Text
aora = do
  _ <- a (char '|')
  _ <- a eof
  pure "aora"

b :: Parser Text
b = try aora <|> a eof

Running this parser produces a more accurate error message:

ghci> parseTest b "foot"
1:4:
  |
1 | foot
  |    ^
unexpected 't'
expecting '|' or end of input

Here the result of endParser is always ignored, if we don't want it to be ignored, we can just use lookAhead. That said, I think the endParser is always ignored in the code so far, so it is not as important as I thought. There is also a weaker reason to use lookAhead: It allows using combinators like inParens and inSqBr instead of always parsing opening brackets separately from closing brackets.

In the code I removed the lookAhead from comparisionFilterExpr (on c11a840) and here is what I get before removing the lookAhead:

ghci> parseTest filterParser "[?true==5bla]"
1:10:
  |
1 | [?true==5bla]
  |          ^^
unexpected "bl"
expecting "&&", "||", '!', '.', 'E', ']', 'e', digit, or white space

And here is what I get after removing the lookAhead:

ghci> parseTest filterParser "[?true==5bla]"
1:10:
  |
1 | [?true==5bla]
  |          ^
unexpected 'b'
expecting '.', 'E', ']', 'e', or digit

Hope this makes sense. Please feel free to find another way to get this to work.

glyn commented 1 year ago

@akshaymankar That's very helpful. Thank you. I don't immediately have a better solution, but at least I now understand the problem lookAhead endParser is solving.