antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
16.9k stars 3.25k forks source link

Get expected rules after error #1763

Open michaelpj opened 7 years ago

michaelpj commented 7 years ago

Currently it's possible to get the expected tokens at the given state when a parse error occurs. This is great, but for the purposes of reporting to the user it would be better to know the expected rules that could follow that state. For example, it's much nicer to say expected: typename than expected one of: 'int', 'string', Upperid, and this is only magnified when you have multiple rules that could follow.

I don't know how possible this is given the way the ATN system is designed, but it would make the errors a lot friendlier.

michaelpj commented 7 years ago

Actually, what I'd really like is the set of productions that might happen next, be those rules or tokens.

parrt commented 7 years ago

I think this is sort of related to "what tokens can happen next". As for rule, each state knows the rule surrounding it so it'd be possible if we know the state. We can ask the parser for current state.

michaelpj commented 7 years ago

Yes, it's quite like the set of expected tokens. I have a hacked up version of this that I'll post when I'm back at work, basically: follow transitions until you hit either something that could consume input (and report a token), or a rule start (and report a rule).

So you don't report following tokens "inside" another rule invocation. When it works it's very nice, but I'm pretty sure it's incorrect wrt tokens following the current rule, etc. So I'd much prefer an implementation from someone who knew what they were doing!

parrt commented 7 years ago

:) I'll look at this when I do "expected tokens".

michaelpj commented 7 years ago

FWIW here's my (probably wrong) attempt, but hopefully it shows what I'm after:

    public Set<String> nextRulesOrTokens(Parser recognizer, ATNState state, Set<ATNState> seen) {
        Set<String> result = new LinkedHashSet<>();

        if (!seen.add(state))
            return result;

        for (Transition t : state.getTransitions()) {
            if (t instanceof RuleTransition) {
                result.add(recognizer.getRuleNames()[((RuleTransition)t).ruleIndex]);
            } else if (t.isEpsilon()) {
                result.addAll(nextRulesOrTokens(recognizer, t.target, seen));
            } else {
                IntervalSet label = t.label();
                if (label != null)
                    result.add(label.toString(recognizer.getVocabulary()));
            }
        }
        return result;
    }
parrt commented 7 years ago

looks pretty close to me. does it work? ;)

michaelpj commented 7 years ago

More or less. But I think I should probably be keeping track of the rule stack so that I don't pass through rule end transitions for rules I haven't entered, etc. LL1Analyzer._LOOK is doing a bunch of stuff that seems tricky!

Also, I'm returning strings, but it could be nice to return a richer data structure. That might let you combine any expected rule/token methods, and maybe get rid of things like Toke.EPSILON.

mike-lischke commented 7 years ago

You probably also need extra handling when one of the following states is in the same rule as the passed in state, since you don't want the current rule be suggested as next possible rule.

michaelpj commented 7 years ago

Mmm, I think I do if it's valid. e.g.

expr : expr '+' expr | lit ;

At the start of an expr rule, I'd like the expected rules to be expr and lit.