antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

Python: arguments value have incorrect position #2804

Open juli1 opened 2 years ago

juli1 commented 2 years ago

I am parsing the following code

import requests
r = requests.get(w, verify=False)

When I parse the list of arguments of the Expr requests.get, the position of the argument is incorrect.

In particular, for this expression, the trailerContext contains the argument and

trailerContext.arguments().arglist().argument().get(1).test(1).start.getCharPositionInLine()

is equal to

trailerContext.arguments().arglist().argument().get(1).test(1).stop.getCharPositionInLine()

In other words, the start and stop positions are the same, so the position of the token False appears to be invalid.

I am using the Java version of the grammar. I tried to see if this comes from the Java code or the grammar and I was unable to really see what was going on and troubleshoot further.

kaby76 commented 2 years ago

The code is completely correct. You are misunderstanding the data structure, but I agree with you, it's superbly confusing. I only know because I had to implement parse tree editing, where the token and char streams have to stay consistent with the parse tree edits. start and stop are tokens in the token stream. [getCharPositionInLine](https://www.antlr.org/api/Java/org/antlr/v4/runtime/Token.html#getCharPositionInLine()) is the column number of the first character of the token. For leaf nodes in the parse tree, "start" and "stop" will always refer to exactly the same token. For internal nodes, "start" and "stop" may refer to different tokens because the node may span multiple tokens, so the column number of the first character can differ between the two. But getCharPositionInLine() is always the column of the first character of the token, never of its last character.
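To make this concrete, here is a minimal sketch of how an end column could be derived from a start column. It does not use the ANTLR runtime: `Tok` is a hypothetical stand-in that mirrors only `getText()` and `getCharPositionInLine()` from `org.antlr.v4.runtime.Token`, and `endColumn` is an illustrative helper, not part of ANTLR. It also assumes the token does not span a line break.

```java
public class TokenSpan {
    // Hypothetical stand-in for the two org.antlr.v4.runtime.Token accessors
    // used below; a real parser would hand you Token objects instead.
    static final class Tok {
        final String text;
        final int charPositionInLine; // 0-based column of the FIRST character

        Tok(String text, int charPositionInLine) {
            this.text = text;
            this.charPositionInLine = charPositionInLine;
        }
    }

    // Exclusive end column: start column plus text length.
    // Assumption: the token text contains no newline.
    static int endColumn(Tok t) {
        return t.charPositionInLine + t.text.length();
    }

    public static void main(String[] args) {
        String src = "r = requests.get(w, verify=False)";

        // The node for `False` covers a single token, so its start and stop
        // are the same token and report the same column.
        Tok start = new Tok("False", src.indexOf("False"));
        Tok stop = start;

        System.out.println("start column = " + start.charPositionInLine);
        System.out.println("stop column  = " + stop.charPositionInLine);
        System.out.println("end column   = " + endColumn(stop));
    }
}
```

With the real runtime, the same idea would apply to `ctx.stop`. `Token` also exposes `getStartIndex()` and `getStopIndex()`, which give character offsets into the whole input rather than columns and may be more convenient when a token can cross lines.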