antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
17.3k stars 3.3k forks source link

Semantic predicates does not work as expected with left recursion #1968

Open marcospassos opened 7 years ago

marcospassos commented 7 years ago

I have the following grammar:

start: rule EOF;
rule: rule suffix | value;
suffix: {!isParentRule(ContextualRuleContext.class)}? 'bar' contextualRule;
contextualRule: rule;
value: 'foo';

Where isParentRule checks if the current context has a parent context which extends from a given class.

The problem is that an input like foobarfoobarfoo always results in "no viable alternative". I expected that whenever the semantic predicate returns false, the immediate parent suffix would match.

sharwell commented 7 years ago

:memo: The predicate in suffix is a context-free predicate (it doesn't reference $ctx, or any argument, return value, or reference within the rule). It is therefore allowed to execute in any context with the only restriction being that the token stream is an a position where it could theoretically cross the predicate. If you are testing for the current rule element within the predicate then there is no way it can be expected to return a consistent result.

You would need to rewrite the predicate as follows for it to be valid:

{!isParentRule($ctx, ContextualRuleContext.class)}?

However, at this point the evaluation will not be hoisted into another context. Since suffix does not contain any decisions, this predicate will not be part of the prediction process.

marcospassos commented 7 years ago

@sharwell I'm not sure if I understood the problem. In fact, isParentRule looks like this:

class CustomParser extends GeneratedParser {
    // ...
    protected final Boolean isAncestor(Class<? extends ParserRuleContext> type) {
        ParserRuleContext ctx = this._ctx;

        while (ctx.parent != null) {
            ctx = (ParserRuleContext) ctx.parent;

            if (type.isInstance(ctx)) {
                return true;
            }
        }

        return false;
    }
}

It should do the job, right?

marcospassos commented 7 years ago

However, at this point the evaluation will not be hoisted into another context. Since suffix does not contain any decisions, this predicate will not be part of the prediction process.

The following example does not work as well:

start: rule EOF;
rule: rule suffix | value;
suffix:
     {!isParentRule(ContextualRuleContext.class)}? 'bar' contextualRule |
     {!isParentRule(ContextualRuleContext.class)}? 'bar2' contextualRule
contextualRule: rule;
value: 'foo';
sharwell commented 7 years ago

@marcospassos Can you give a more complete example with input that cause something to fail during parsing? I think I'm having a bit of trouble unraveling this in my head so far.

marcospassos commented 7 years ago

Sure! We really need help here.

grammar Test;

options {
    superClass=CustomParser;
}

start: expression EOF;

expression: string modifier | string;

modifier:
     {!isParentRule(ModifierNonStringContext.class)}? modifierString |
     {!isParentRule(ModifierNonNumberContext.class)}? modifierNumber;

modifierNonNumber : expression;
modifierNonString : expression;

modifierString: 'ms' modifierNonNumber;
modifierNumber: 'mn' modifierNonString;

string: 'foo';

WS
    : (' ' | '\r' | '\t')+ -> channel(HIDDEN)
;
package com.test

import com.test.TestLexer;
import com.testTestParser;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.BailErrorStrategy;
import org.antlr.v4.runtime.CommonTokenStream;
import org.junit.Test;

public class TestFoo {
    @Test
    public void testUnexpectedNoViableAltException() throws Exception {
        String source = "foo ms foo mn foo";
        ANTLRInputStream inputStream = new ANTLRInputStream(source);
        TestLexer lexer = new TestLexer(inputStream);
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokenStream);
        parser.setErrorHandler(new BailErrorStrategy());

        // Throws org.antlr.v4.runtime.NoViableAltException
        parser.start();
    }
}
package com.test

import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.TokenStream;

abstract public class CustomParser extends Parser {
    public CustomParser(TokenStream input) {
        super(input);
    }

    protected final Boolean isParentRule(Class<? extends ParserRuleContext> type) {
        ParserRuleContext ctx = this._ctx;

        while (ctx.parent != null) {
            ctx = (ParserRuleContext) ctx.parent;

            if (type.isInstance(ctx)) {
                return true;
            }
        }

        return false;
    }
}

Is it enought?

amorimjuliana commented 7 years ago

Hi, @sharwell!

I believe you haven't had time to take a look at this issue, but if you have any tip for us, it will be very valuable. We are completely stuck on this and we can not see any workaround for the issue.

Thank you in advance!

sharwell commented 7 years ago

@marcospassos Why not just remove the predicate? Then after the parse is complete, use a listener or visitor to verify that modifier did not appear with an incorrect parent.

marcospassos commented 7 years ago

@sharwell it's not that simple, once we've 50+ modifiers. To accomplish that, I would have to write a complex logic for every modifier (we're generating AST's, so I would have to know whether a node is on the boundary or not). Modifiers can be chained at start and end:

// (("foo" in lower case) in upper case) 
"foo" in lower case in upper case
// ("foo" replacing from offset (2.5 rounded))
"foo" replacing from offset 2.5 rounded

If the predicate had worked as expected, this kind of rules would be simple to implement as previously shown; the problem is that we never considered that it would not work when considering ANTLR for the job 18 months ago :(

sharwell commented 7 years ago

Unfortunately, since ANTLR doesn't create context objects during lookahead and also doesn't have any form of backtracking, it's nearly impossible to implement predicates of that form. My recommendation would be implementing it without predicates, and then based on the results consider using token lookahead or lookbehind to implement the fewest number of predicates which results in correct trees.

You can catch mistakes by modifying your isParentRule method to be static, and pass $ctx as an argument instead of referencing _ctx as a field. During predicate evaluation, $ctx is guaranteed to be correct while there are no guarantees at all on the value of _ctx.

marcospassos commented 7 years ago

@sharwell such a bad news for us :(

I was wondering what would happen if I duplicate the contextual part of the grammar, so I tried it and it surprisingly worked as expected. However, the error reporting gets completely crazy. Why does it work for this case, but not for the previous one? Is it a bug or lack of support?

grammar Test;

options {
    superClass=CustomParser;
}

start: expression EOF;

expression: string modifier | string;

expressionContextual: string modifierContextual | string;

modifier:
     modifierString |
     modifierNumber;

modifierContextual:
     {!isParentRule(ModifierNonStringContext.class)}? modifierString |
     {!isParentRule(ModifierNonNumberContext.class)}? modifierNumber;

modifierNonNumber : expressionContextual;
modifierNonString : expressionContextual;

modifierString: 'ms' modifierNonNumber;
modifierNumber: 'mn' modifierNonString;

string: 'foo';

WS
    : (' ' | '\r' | '\t')+ -> channel(HIDDEN)
;
sharwell commented 7 years ago

Why does it work for this case, but not for the previous one?

There are no guarantees regarding the value of _ctx during the execution of a predicate. It may be right. Or "wrong". Or right with one input and "wrong" with another. It may change behavior with different versions of ANTLR, different runtime settings, or with changes to the grammar. I put "wrong" in quotes because in reality there is no wrong value; if you are depending on it then ANTLR is allowed to evaluate it in a form such that the results are not what you expect.

The only way to get a deterministic context for predicate evaluation is to pass $ctx as an argument (and it must be $ctx with a $ character, not _ctx).

marcospassos commented 7 years ago

Yeah, I just did it. I've modified the method as you suggested. Duplicating that part of the grammar makes it work, but there are collateral effects. My guess is that it gets confused about which rule it is in, so it waits until match the entire sentence before deciding the rule. However, when an error appears, it cannot determine which one is, so the feedback is useless.

parrt commented 7 years ago

Just for clarification, your suffix rule would work if tested as the start rule. When using rule as the start, it fails because it evaluates the predicates from suffix before actually getting to that rule. So if I understand the issue, then Sam is right. We don't update the context during lookahead, just during parsing. From rule, the current context of the parser is rule. In principle the lookahead mechanism could be augmented but I'm not sure I would add a general mechanism at this point. I might be able to help you use an implementation detail of lookahead to make the predicate work for your case.