Open nikhilvaishnavi opened 5 years ago
Appreciate any leads on this from the community or antlr4 team.
This rule looks suspect to me:
STRING : DOUBLE_QUOTE ( '\"' | . )+? DOUBLE_QUOTE //"example@com","tony shark", "foo \"1\" bar"
Periods are usually used to match anything in a regex and so that's where it might be picking up the garbage characters. When I want everything except a set of characters I use an inverted character class such as:
[^"]+
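As a rough illustration of the inverted-character-class idea, here is a minimal sketch using `java.util.regex` rather than ANTLR lexer syntax (in an ANTLR 4 grammar a negated set is written with `~`, for example `~["\\]`). The class name `QuotedStringDemo` and the helper `findAll` are hypothetical names chosen for this example. The pattern accepts either an escaped character or any character that is neither a quote nor a backslash, so escaped quotes stay inside the string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the project discussed in this thread.
class QuotedStringDemo {
    // A double-quoted string whose body is a run of either:
    //   - a backslash followed by any character (an escape such as \"), or
    //   - any character that is neither a double quote nor a backslash.
    static final Pattern QUOTED = Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");

    // Returns every quoted string found in the input, in order.
    static List<String> findAll(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }
}
```

Applied to the three samples from the rule's comment, `findAll` should return three separate strings, with the escaped quotes kept inside the last one.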
Your `start` rule does not end with an explicit EOF. Therefore, ANTLR is allowed to stop parsing before the end of the token stream in an attempt to successfully match a subset of the input. There is a good chance that all of your example negative cases involve trailing tokens that can be dropped to make a successful match, but they weren't being caught due to #118.
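To make the prefix-match behavior concrete, here is a toy sketch. It is a hand-rolled matcher, not ANTLR-generated code, and the class `PrefixMatchDemo`, its token pattern, and the single `eq` operator are all assumptions made for illustration. It only shows the difference between accepting a valid prefix and requiring the whole input to be consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy matcher; it only imitates the effect of a trailing EOF.
class PrefixMatchDemo {
    // Tokens: a parenthesis, a double-quoted string, or a run of other characters.
    private static final Pattern TOKEN =
            Pattern.compile("[()]|\"[^\"]*\"|[^\\s()\"]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Matches `key eq "value"` at the start of the token list.
    // requireEof = false: trailing tokens are ignored (like `start : appfilter ;`).
    // requireEof = true:  trailing tokens are an error (like `start : appfilter EOF ;`).
    static boolean accepts(String input, boolean requireEof) {
        List<String> t = tokenize(input);
        boolean prefixOk = t.size() >= 3
                && t.get(0).matches("\\w+")
                && t.get(1).equals("eq")
                && t.get(2).startsWith("\"");
        if (!prefixOk) {
            return false;
        }
        return !requireEof || t.size() == 3;
    }
}
```

With `requireEof` set to `false`, the input `key eq "value")` is accepted because the stray `)` is never looked at. With `requireEof` set to `true`, the same input is rejected while `key eq "value"` still passes.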
@sharwell - thanks for your input.
However, adding `start : appfilter EOF ;` causes all the matched values to be null.
Any tips on how to do this correctly?
@nikhilvaishnavi I'm not sure why that is occurring. Is the affected project open source where I can see the complete example?
@sharwell The example I shared above is almost complete and reproduces the issue. My project is not open source.
We are converting input text that arrives at our REST API in query parameters into a Java class, Filter.
If I give good source data that should match:
key eq "value"
--> this is leading to the Parser matching to null
ParseTree tree = parser.start();
While in the case of bad source data
key eq "value") "value2"
--> this leads to an error after adding EOF, as you suggested.
Earlier, without EOF, the good input worked fine, but the bad input also produced no error, which is the negative scenario. We want to catch such bad input and fail.
This worked in the 4.6 release but started failing when we tried to move to 4.7+.
.. leading to the Parser matching to null ...
I do not understand what this particular statement means. Can you provide an example to help explain it?
So, we have written deserializer code that converts this text, `key op value`, to a Java class, Filter, using ANTLR 4.7.2.
When we have the following in our ANTLR grammar:
start : appfilter ;
the following code (with a good filter):
Filter filter = FilterTextDeserializer.readFilter ("keyA eq \"value\"");
returns a filter with the following output (toString):
FilterImpl{propertyName='keyA', op=eq, propertyVal='value', logicalOp=null, left=null, right=null}
While with a bad filter (notice the right parenthesis is missing):
`Filter filter = FilterTextDeserializer.readFilter ("(keyA eq \"value\"");`
this fails with the expected error:
Invalid filter, Syntax error at position 16
But if the left parenthesis is missing, no such error is given. This behavior started in 4.7.2; it worked correctly in 4.6.
Filter filter = FilterTextDeserializer.readFilter ("keyA eq \"value\")");
output :
FilterImpl{propertyName='keyA', op=eq, propertyVal='value', logicalOp=null, left=null, right=null}
This is where I changed the grammar to add EOF, as you suggested:
start : appfilter EOF;
And now both of the bad filters above are caught correctly and fail, but the good filter returns a null value:
`Filter filter = FilterTextDeserializer.readFilter ("keyA eq \"value\"");`
returns:
filter.toString() --> "null"
We are using `-no-listener` and `-visitor` as options when generating the ANTLR code from the grammar. Let me know if more details are required; I can go over this on Zoom if needed. I'm not sure sending the whole code is feasible here.
The next step would be seeing the implementations of:
FilterTextDeserializer.readFilter
Filter.toString()
@sharwell please see below. Apologies for the delay.
public static Filter readFilter(String filterStr) {
    if (filterStr == null || filterStr.isEmpty()) {
        return null;
    }
    CharStream inputStream = CharStreams.fromString(filterStr);
    FilterLexer lexer = new FilterLexer(inputStream);
    injectErrorHandler(lexer);
    CommonTokenStream tokenStream = new CommonTokenStream(lexer);
    FilterParser parser = new FilterParser(tokenStream);
    injectErrorHandler(parser);
    ParseTree tree = parser.start();
    FilterBaseVisitorImpl filterBaseVisitor = new FilterBaseVisitorImpl();
    return filterBaseVisitor.visit(tree);
}
public String toString() {
    return "FilterImpl{" +
            "propertyName='" + propertyName + '\'' +
            ", op=" + op +
            ", propertyVal='" + propertyVal + '\'' +
            ", logicalOp=" + logicalOp +
            ", left=" + left +
            ", right=" + right +
            '}';
}
I'm fairly certain this is the result of `FilterBaseVisitorImpl` not providing an implementation of `visitStart`. The default implementation returns the result of visiting the last child of a rule. Prior to the change, the last child was an `AppFilterSyntax`, but now it's a token. See `visitChildren` and `aggregateResult` for details.
You are right, we do not have an implementation for `visitStart`, so are you suggesting we override it?
Can you suggest how this override should differ from the default? We don't have any application-specific logic at this point, and it should only need to visit all the available child nodes.
It should just return the result of visiting `appFilter`.
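The effect described above can be reproduced with a small self-contained model. This is a sketch only: `Node`, `DefaultVisitor`, and `FixedVisitor` are hypothetical stand-ins, not ANTLR runtime types. The `visitChildren`, `aggregateResult`, and `defaultResult` methods merely imitate the "last child wins" behavior mentioned in the thread, where terminals such as EOF produce a null result.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the visitor behavior; none of these are ANTLR classes.
class VisitStartDemo {
    // Minimal parse-tree node: a kind ("start", "appfilter", "EOF"), text, children.
    static class Node {
        final String kind;
        final String text;
        final List<Node> children;

        Node(String kind, String text, Node... children) {
            this.kind = kind;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // Imitates a generated base visitor with no visitStart override.
    static class DefaultVisitor {
        String visit(Node n) {
            switch (n.kind) {
                case "start":     return visitStart(n);
                case "appfilter": return visitAppfilter(n);
                default:          return visitTerminal(n);
            }
        }

        String visitStart(Node n)     { return visitChildren(n); }
        String visitAppfilter(Node n) { return "Filter(" + n.text + ")"; }
        String visitTerminal(Node n)  { return defaultResult(); }
        String defaultResult()        { return null; }

        // The most recent child result replaces whatever came before it.
        String aggregateResult(String aggregate, String next) { return next; }

        String visitChildren(Node n) {
            String result = defaultResult();
            for (Node child : n.children) {
                result = aggregateResult(result, visit(child));
            }
            return result;
        }
    }

    // The suggested fix: return the result of visiting the appfilter child only.
    static class FixedVisitor extends DefaultVisitor {
        @Override
        String visitStart(Node n) { return visit(n.children.get(0)); }
    }

    // Builds `start : appfilter ;` or `start : appfilter EOF ;` as a tree.
    static Node tree(boolean withEof) {
        Node filter = new Node("appfilter", "keyA eq \"value\"");
        return withEof
                ? new Node("start", null, filter, new Node("EOF", "<EOF>"))
                : new Node("start", null, filter);
    }
}
```

In this model, `DefaultVisitor` returns the filter for the tree without EOF and null for the tree with EOF, while `FixedVisitor` returns the filter in both cases. In the real visitor the override would presumably look like `return visit(ctx.appfilter());`, assuming the generated context exposes an accessor with that name.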
Will work on this and get back to you if more support is required. Thanks for your time and tips.
While upgrading from ANTLR 4.6 to 4.7.2, we noticed a change in parser behavior: some text that previously was not matched now gets matched, with no change in the grammar.
I am attaching a simplified version of the grammar, which parses the "Filter" object our application uses in search queries. The usual format of this filter is key operator value, with or without brackets.
After upgrading to 4.7.2, we noticed quite a few discrepancies. The following negative cases are now accepted by the grammar, while they all failed in 4.6 with no change in the grammar or the generated visitor implementation.
key eq "value") - notice only the right parenthesis is given here.
key eq "value") anotherValue
title eq "Emplyee" 1234 true
key1 eq "val1"key2 eq "val2"
keyA eq "value" and keyB pr keyC eq "valc"
Note - we used the generated visitor implementation and not the listener. On debugging, we noticed that the exception used to come from the ANTLR parser, but now the parser just ignores the bad part of the text. For example,
key eq "value")
gets converted to key eq "value"
We use Java for this implementation. Below is the Filter grammar file. Filter.g4.txt