Parser should identify all syntax errors in an Alf text

seidewitz commented 5 years ago

The Reference Implementation uses the JavaCC parser technology (https://javacc.org/). The default error handling strategy provided by this framework is to throw an exception (org.modeldriven.alf.parser.ParseException) when the first syntax error is found. What should happen instead is for some local error recovery to be done so that parsing can continue. In this way, all the syntax errors in a text can be reported.

The basic approach for implementing error recovery with JavaCC is described at https://javacc.org/tutorials/errorrecovery. However, we need something more sophisticated than just the “skip to the next semicolon” strategy used in their example. Also, the errors should be returned in a collection from the calls to the parser methods (instead of them throwing ParseExceptions, as they do currently), similarly to the way ConstraintViolations are handled by the checkConstraints method from org.modeldriven.alf.syntax.SyntaxElement.

abstratt commented 5 years ago

Ed, in terms of exposing the parsing results via API (say, via Parser interface), is it accurate to assume that we would either end up with a UnitDefinition/Block/Expression, or we would want a list of parsing exceptions obtained during the parsing, but never both (for instance, do we want to also be able to return partially built parsed elements)?

seidewitz commented 5 years ago

There is not an immediate need for a partially built parse tree. However, I think it is likely that it will be necessary to insert partially-built elements into the parse tree in order to continue parsing in the presence of errors. In this case, you may naturally end up with a complete parse tree with partially built or "dummy" elements in it. If so, it is probably worth providing a way to return this. But, if it does not turn out to be straightforward to do in the design, it is not a priority.

abstratt commented 5 years ago

Thanks. If it was a either/or situation, I was going to go with a an exception that contained a list of "Problem" objects (I retrofitted ConstraintViolation to be a subclass of Problem). So you either got the parsed UnitDefinition successfully, or an exception containing the various reasons why it was not possible.

An alternative would be to accept a Problem -> Boolean function, and in that way let the caller process errors and decide whether to stop or not. Something like this:

public interface Parser {
    ...
    /** 
       Sets a problem collector. A problem collector is given a chance to decide whether 
       a problem is a show stopper or not. If a problem collector is configured, parsing will
       only throw an exception if the the problem collector function returns false for a problem
       it is given.
   */
    void setProblemCollector(Function<Problem, Boolean> problemCollector);
}

What do you think? I am not married to the idea of taking a function, we could declare a ProblemCollector interface as well to make its role more explicit.

seidewitz commented 5 years ago

While the idea of some sort of callback mechanism is appealing, I don't think that would be the most useful approach right now. It would be more helpful to get a complete list of problems back after parsing the entire text.

I was thinking of something like passing in a collection into which problems could be added, and then returning the parsed element, which would be considered incomplete if there were any problems. Perhaps there could be two forms of each method, one in which the collection is passed in and one that throws an exception if there are problems (probably implemented by calling the former).

Another possibility would be to have the parser collect the problems in its state, and simply have a "getProblems" method on the parser.

abstratt commented 5 years ago

@seidewitz These are some of the situations I am looking into improving the parser's resilience:

an error in a member aborts the current member but continues on the next one
an error in a tuple expression aborts the current expression but continues on the next one
an error in a statement aborts the statement but continues on the next one
about specific statements:
1. an error in an if clause aborts the current clause but continues on the next one
2. an error in a do/while/for condition does not prevent the do/while body from being parsed, and vice-versa

Do you have other cases in mind?

seidewitz commented 5 years ago

For an if statement, an error in the condition of a clause does not prevent the parsing of the clause body, and vice versa.

For a switch statement:

An error in the switch expression does not prevent the parsing of the switch clauses (and vice versa).
An error in a case expression aborts the expression but continues with the next case expression (if any).
An error in a case expression does not prevent the parsing of the switch clause body, and vice versa.
An error in a switch clause does not prevent the parsing of subsequent clauses.

For an accept statement:

An error in an accept clause does not prevent the parsing of the associated accept block (if any), and vice versa.
An error in an accept clause does not prevent the parsing of subsequent clauses.

In general, for a namespace or classifier definition, an error in the declaration/header does not prevent the parsing of the body of the definition, and vice versa.

It is also probably possible to do more fine-grained error recovery within expressions, but I don't think this needs to be a priority to start with.

seidewitz commented 5 years ago

I was also thinking further about returning a partially built parse tree. On reflection, I think we will need to have this in order to eventually do content assist/quick fix proposals on a text with syntactic errors. So we should implement the error recovery so that it builds as much of the parse tree as possible, even in the presence of syntax errors.

abstratt commented 5 years ago

From: https://github.com/ModelDriven/Alf-Reference-Implementation/pull/84#pullrequestreview-273434027

The method org.modeldriven.alf.fuml.units.ModelNamespaceImpl.resolveModelFile needs to be updated to report parsing errors based on the new mechanism. (However, TokenMgrError exceptions for lexical errors still need to be caught at this point.)
There needs to be an exception handler in UnitDefinitionEOF, too. If the first token of the unit is bad, then this causes an exception in UnitDefinition, which is caught. However, it seems that trying to then consume the next token, to check for the EOF, causes an exception to be thrown in UnitDefinitionEOF, which is not caught. For example, trying to parse activit A() { } using UnitDefinitionEOF results in a ParseException being thrown. (I have not checked to see if the same problem occurs with the other "EOF" productions.)

seidewitz commented 5 years ago

The implementation in v1.1.0h provides basic error recovery. Multiple parsing errors will be identified across statements, but generally not within a single expression or statement (unless it is a compound statement). The satisfies the basic goal for this enhancement of continuing parsing after the first syntax error. Future release will incorporate continued evolution of the parser error recover strategy.

ModelDriven / Alf-Reference-Implementation

Parser should identify all syntax errors in an Alf text #75