Explicitly resolving if/if/else ambiguity in a grammar

antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

http://antlr.org

BSD 3-Clause "New" or "Revised" License

17.17k stars 3.29k forks source link

Explicitly resolving if/if/else ambiguity in a grammar #42

Open sharwell opened 12 years ago

sharwell commented 12 years ago

I'm working with an ANTLR 4 grammar to correct any construct which results in a call to reportAmbiguity. In many cases this is done by refactoring or combining rules, but the case for an if/if/else construct is proving more challenging.

Starting with the following grammar:

grammar T;
stmt : ifStmt | ID;
ifStmt : 'if' ID stmt ('else' stmt)?;

ID : [a-zA-Z]+;
WS : ' '+ -> skip;

Given the input if x if x a else b, I'm looking for a way to modify the grammar such that no call to reportAmbiguity is needed, while following the standard if/else rule of binding to the closest unmatched if statement.

In ANTLR 3, a syntactic predicate allowed the user to explicitly handle this situation. In ANTLR 4, the correct decision is made through min-alt resolution, but it always wants to emit a "warning" regarding the situation.

sharwell commented 12 years ago

One solution that "works" is rewriting the ifStmt rule to:

ifStmt : 'if' ID stmt ('else' stmt | {_input.LA(1) != ELSE}?);
ELSE : 'else';

Unfortunately this solution has substantial drawbacks:

The predicate is always evaluated, which means it could have unintended consequences if the keyword else is used in allowed in other contexts. To work properly, the predicate would need to only be used for disambiguating alternatives 1 and 2.
The grammar becomes tied to a particular target language.

parrt commented 11 years ago

need to add this to execATN and execDFA to make it work. is it worth complexity?

// IF PREDS, MIGHT RESOLVE TO SINGLE ALT => SLL (or syntax error)
if ( D.configs.hasSemanticContext ) {
    int nalts = decState.getNumberOfTransitions();
    // Update DFA so reach becomes accept state with (predicate,alt)
    // pairs if preds found for conflicting alts
    BitSet altsToCollectPredsFrom = getConflictingAltsOrUniqueAlt(D.configs);
    SemanticContext[] altToPred = getPredsForAmbigAlts(altsToCollectPredsFrom, D.configs, nalts);
    if ( altToPred!=null ) {
        List<DFAState.PredPrediction> predicates =
            getPredicatePredictions(altsToCollectPredsFrom, altToPred);
        input.seek(startIndex);
        BitSet alts = evalSemanticContext(predicates, outerContext, true);
        if ( alts.cardinality()==1 ) {
            System.out.println("Full LL avoided");
            return alts.nextSetBit(0);
        }
    }
}

and

// IF PREDS, MIGHT RESOLVE TO SINGLE ALT => SLL (or syntax error)
if ( s.predicates!=null ) {
    System.out.println("DFA state has preds in DFA sim LL failover");
    input.seek(startIndex);
    BitSet alts = evalSemanticContext(s.predicates, outerContext, true);
    if ( alts.cardinality()==1 ) {
        System.out.println("Full LL avoided");
        return alts.nextSetBit(0);
    }
}

only 2 failovers to full avoided in execATN for entire parse of JavaLRParser.java, none in execDFA

sharwell commented 11 years ago

I was able to simplify the change through a basic refactoring (extract method), and submitted pull request parrt/antlr4#126 to full address this.

sharwell commented 11 years ago

Pull request #124 introduces an sll block option which is a much cleaner solution to this issue. Rather than use a predicate, the following rule may be used:

ifStmt : 'if' ID stmt (options{sll=true;} : 'else' stmt | );
ELSE : 'else';

gkudva commented 7 years ago

Hi, I have a grammar with if/elseif/else statements (C# code generation), and I've run into an issue whereby a deeply nested if/else block (24 levels deep) that parses successfully in ANTLR 4.0 (albeit very slowly), now hangs on ANTLR 4.6.1 - it ran for over 2 hours without completing before I killed it. While trying to debug this issue, I ran across this thread and tried the sll = true option. However, the parsers generated with and without the sll=true option are identical. Is this option supported for C# code generation?

ericvergnaud commented 7 years ago

Hi, The place for support is the google discussion group Eric Envoyé de mon iPhone

Le 30 juin 2017 à 23:34, gkudva notifications@github.com a écrit :

Hi, I have a grammar with if/elseif/else statements (C# code generation), and I've run into an issue whereby a deeply nested if/else block (24 levels deep) that parses successfully in ANTLR 4.0 (albeit very slowly), now hangs on ANTLR 4.6.1 - it ran for over 2 hours without completing before I killed it. While trying to debug this issue, I ran across this thread and tried the sll = true option. However, the parsers generated with and without the sll=true option are identical. Is this option supported for C# code generation?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

parrt commented 7 years ago

try 4.7?

gkudva commented 7 years ago

OK, will post to google groups - thanks.

@parrt - Latest NuGet package for code-generator is 4.6.1 so figured I'd use 4.6.1 for runtime as well to be in sync. Also ANTLR Runtime 4.6.1 package was later dated 5/30/2017 than ANTLR Standard Runtime 4.7.1 published on 5/28/2017. Will try 4.7.1 and see if it works.

sharwell commented 7 years ago

@gkudva The NuGet package (4.6.1) you referred to is part of a different repository. For issues with that (bugs or anything else) you should file an issue on GitHub here: https://github.com/tunnelvisionlabs/antlr4cs/