Open marcospassos opened 7 years ago
:memo: The predicate in suffix
is a context-free predicate (it doesn't reference $ctx
, or any argument, return value, or reference within the rule). It is therefore allowed to execute in any context with the only restriction being that the token stream is an a position where it could theoretically cross the predicate. If you are testing for the current rule element within the predicate then there is no way it can be expected to return a consistent result.
You would need to rewrite the predicate as follows for it to be valid:
{!isParentRule($ctx, ContextualRuleContext.class)}?
However, at this point the evaluation will not be hoisted into another context. Since suffix
does not contain any decisions, this predicate will not be part of the prediction process.
@sharwell I'm not sure if I understood the problem. In fact, isParentRule
looks like this:
class CustomParser extends GeneratedParser {
// ...
protected final Boolean isAncestor(Class<? extends ParserRuleContext> type) {
ParserRuleContext ctx = this._ctx;
while (ctx.parent != null) {
ctx = (ParserRuleContext) ctx.parent;
if (type.isInstance(ctx)) {
return true;
}
}
return false;
}
}
It should do the job, right?
However, at this point the evaluation will not be hoisted into another context. Since suffix does not contain any decisions, this predicate will not be part of the prediction process.
The following example does not work as well:
start: rule EOF;
rule: rule suffix | value;
suffix:
{!isParentRule(ContextualRuleContext.class)}? 'bar' contextualRule |
{!isParentRule(ContextualRuleContext.class)}? 'bar2' contextualRule
contextualRule: rule;
value: 'foo';
@marcospassos Can you give a more complete example with input that cause something to fail during parsing? I think I'm having a bit of trouble unraveling this in my head so far.
Sure! We really need help here.
grammar Test;
options {
superClass=CustomParser;
}
start: expression EOF;
expression: string modifier | string;
modifier:
{!isParentRule(ModifierNonStringContext.class)}? modifierString |
{!isParentRule(ModifierNonNumberContext.class)}? modifierNumber;
modifierNonNumber : expression;
modifierNonString : expression;
modifierString: 'ms' modifierNonNumber;
modifierNumber: 'mn' modifierNonString;
string: 'foo';
WS
: (' ' | '\r' | '\t')+ -> channel(HIDDEN)
;
package com.test
import com.test.TestLexer;
import com.testTestParser;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.BailErrorStrategy;
import org.antlr.v4.runtime.CommonTokenStream;
import org.junit.Test;
public class TestFoo {
@Test
public void testUnexpectedNoViableAltException() throws Exception {
String source = "foo ms foo mn foo";
ANTLRInputStream inputStream = new ANTLRInputStream(source);
TestLexer lexer = new TestLexer(inputStream);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokenStream);
parser.setErrorHandler(new BailErrorStrategy());
// Throws org.antlr.v4.runtime.NoViableAltException
parser.start();
}
}
package com.test
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.TokenStream;
abstract public class CustomParser extends Parser {
public CustomParser(TokenStream input) {
super(input);
}
protected final Boolean isParentRule(Class<? extends ParserRuleContext> type) {
ParserRuleContext ctx = this._ctx;
while (ctx.parent != null) {
ctx = (ParserRuleContext) ctx.parent;
if (type.isInstance(ctx)) {
return true;
}
}
return false;
}
}
Is it enought?
Hi, @sharwell!
I believe you haven't had time to take a look at this issue, but if you have any tip for us, it will be very valuable. We are completely stuck on this and we can not see any workaround for the issue.
Thank you in advance!
@marcospassos Why not just remove the predicate? Then after the parse is complete, use a listener or visitor to verify that modifier
did not appear with an incorrect parent.
@sharwell it's not that simple, once we've 50+ modifiers. To accomplish that, I would have to write a complex logic for every modifier (we're generating AST's, so I would have to know whether a node is on the boundary or not). Modifiers can be chained at start and end:
// (("foo" in lower case) in upper case)
"foo" in lower case in upper case
// ("foo" replacing from offset (2.5 rounded))
"foo" replacing from offset 2.5 rounded
If the predicate had worked as expected, this kind of rules would be simple to implement as previously shown; the problem is that we never considered that it would not work when considering ANTLR for the job 18 months ago :(
Unfortunately, since ANTLR doesn't create context objects during lookahead and also doesn't have any form of backtracking, it's nearly impossible to implement predicates of that form. My recommendation would be implementing it without predicates, and then based on the results consider using token lookahead or lookbehind to implement the fewest number of predicates which results in correct trees.
You can catch mistakes by modifying your isParentRule
method to be static, and pass $ctx
as an argument instead of referencing _ctx
as a field. During predicate evaluation, $ctx
is guaranteed to be correct while there are no guarantees at all on the value of _ctx
.
@sharwell such a bad news for us :(
I was wondering what would happen if I duplicate the contextual part of the grammar, so I tried it and it surprisingly worked as expected. However, the error reporting gets completely crazy. Why does it work for this case, but not for the previous one? Is it a bug or lack of support?
grammar Test;
options {
superClass=CustomParser;
}
start: expression EOF;
expression: string modifier | string;
expressionContextual: string modifierContextual | string;
modifier:
modifierString |
modifierNumber;
modifierContextual:
{!isParentRule(ModifierNonStringContext.class)}? modifierString |
{!isParentRule(ModifierNonNumberContext.class)}? modifierNumber;
modifierNonNumber : expressionContextual;
modifierNonString : expressionContextual;
modifierString: 'ms' modifierNonNumber;
modifierNumber: 'mn' modifierNonString;
string: 'foo';
WS
: (' ' | '\r' | '\t')+ -> channel(HIDDEN)
;
Why does it work for this case, but not for the previous one?
There are no guarantees regarding the value of _ctx
during the execution of a predicate. It may be right. Or "wrong". Or right with one input and "wrong" with another. It may change behavior with different versions of ANTLR, different runtime settings, or with changes to the grammar. I put "wrong" in quotes because in reality there is no wrong value; if you are depending on it then ANTLR is allowed to evaluate it in a form such that the results are not what you expect.
The only way to get a deterministic context for predicate evaluation is to pass $ctx
as an argument (and it must be $ctx
with a $
character, not _ctx
).
Yeah, I just did it. I've modified the method as you suggested. Duplicating that part of the grammar makes it work, but there are collateral effects. My guess is that it gets confused about which rule it is in, so it waits until match the entire sentence before deciding the rule. However, when an error appears, it cannot determine which one is, so the feedback is useless.
Just for clarification, your suffix rule would work if tested as the start rule. When using rule
as the start, it fails because it evaluates the predicates from suffix
before actually getting to that rule. So if I understand the issue, then Sam is right. We don't update the context during lookahead, just during parsing. From rule
, the current context of the parser is rule
. In principle the lookahead mechanism could be augmented but I'm not sure I would add a general mechanism at this point. I might be able to help you use an implementation detail of lookahead to make the predicate work for your case.
I have the following grammar:
Where
isParentRule
checks if the current context has a parent context which extends from a given class.The problem is that an input like
foobarfoobarfoo
always results in "no viable alternative". I expected that whenever the semantic predicate returns false, the immediate parent suffix would match.