fsprojects / FsLexYacc

Lexer and parser generators for F#
http://fsprojects.github.io/FsLexYacc/
MIT License
209 stars 69 forks source link

%nonassoc not handled correctly #39

Open TorbenMogensen opened 9 years ago

TorbenMogensen commented 9 years ago

%nonassoc is not handled correctly: When an operator (say, NEQ) is declared nonassociative, there should be no shift/reduce conflict on a state containing items

Exp -> Exp . 'NEQ' Exp
Exp -> Exp 'NEQ' Exp .

Instead, BOTH the shift and the reduce actions should be eliminated if the next symbol is NEQ. This will cause (correctly) a parse error on, say, Exp NEQ Exp NEQ Exp. As it is, it parses equivalently to Exp NEQ (Exp NEQ Exp).

A very simple grammar file that shows the problem is

%token ONE NEQ %nonassoc NEQ %start Exp %type Exp

%% Exp : ONE {1} | Exp NEQ Exp {if $1 = $3 then 0 else 1} %%

Peter Sestoft tells me that he corrected this error (and one having to do with reduce/reduce conflicts) in mosmlyac (which is also based on ocamlyacc) about 10 years ago.

kkm000 commented 9 years ago

@dsyme, I do not understand the code enough to fix the problem. I am assigning this and #40 to you -- hope you could have a look.

kkm000 commented 8 years ago

Quoting @mrgleba in #47:

Using a nonnasoc in this way is rather bogus. Generally arithmetics and logics are done in their own rules. IMHO accomodating FsLexyacc to accept this beahviour would be doing a lot of work (and I mean wild changes in the construction of the automaton) just to allow bad beahviour. Unless there is a really compelling exaplce to the contrary I believe this is a clear WontFix.

So I am closing this issue meanwhile; if you think this is not right, please feel free to reopen and comment.

TorbenMogensen commented 8 years ago

I don't agree with @mrgleba. The definition of non-associative operators is exactly that it is illegal to combine two or more of these without explicit parentheses. See https://en.wikipedia.org/wiki/Operator_associativity#Non-associative_operators

So there is nothing bogus about removing both the shift and the reduce action at the conflict. Nor does it require "wild changes in the construction of the automaton". You simply remove both actions after generating the automaton, just as you would remove the shift action in conflicts with left-associative operators and the reduce action in conflicts with right-associative operators.

Besides, as it is implemented now, use of %nonassoc will only resolve precedence conflicts and not associativity conflicts, so it is useful only for prefix or suffix operators and not for infix operators. And for prefix or suffix operators, using %left or %right works equally well. So %nonassoc is (in the current implementation) useless.

mrgleba commented 8 years ago

What I meant is:

%nonassoc OP %token ONE

expr: | ONE | expr OP expr

is bogus: an operator declared explicitely as non-assoc is uassed as an associative operator. The correct (and current behaviour) is to warn and do shift (treat as right-associative)

Accepting this and reporting this only at runtime would require an extra states to track if OP has already been used in expr.

In cases when you want this you do:

%nonassoc OP %token ONE

expr: | ONE

expr1: | expr OP expr

I cannot see how this transformation could be addd easily in the current implementation.

Generally %nonassoc is used for explicit precedence and as a sanity-check to give a warning when you use it as an associative operator. IMHO no more should be expected of it.

TorbenMogensen commented 8 years ago

The example was deliberately reduced to a minimum. I agree that for this particular case, it would be better to rewrite the grammar. But if you have a large number of infix operators, some of which are (left or right) associative and some of which are non-associative, it becomes cumbersome. As I said, the definition of non-associative is clear enough: It does not prohibit using the operator as an infix operator, it just forbids self-combination without explicit parentheses.

Your argument for why you should rewrite the grammar instead of using precedence declarations applies equally well to all other cases of precedence and associativity conflicts: The "safest" thing to do is to explicitly rewrite the grammar. But precedence declarations are much more convenient (and gives smaller and faster parsers). As long as it is clear how conflicts are handled by precedence declarations, I can not see any problem.

TorbenMogensen commented 8 years ago

One thing that could be considered is adding two new declarations: %prefix and %suffix. These declare the operators to be prefix or suffix operators, so they essentially only provide precedence information. If there are any conflicts involving two operators with the same precedence, a warning should be given.

mrgleba commented 8 years ago

I understand it might be nice to be able to use %nonasoc this way but it would be a hard task requiring a lot of work. You'll have to provide a real-world compelling example to warrant even considering this as a feature.

TorbenMogensen commented 8 years ago

Very well. Troll (http://topps.diku.dk/~torbenm/troll.msp) is a language for randomly rolling or calculating probabilities for dice rolls. It has a multitude of infix operators, including ".." that given two numbers give a collection of all numbers in the interval. For example, "2 .. 5" gives the collection "2 3 4 5". It is non-associative because, for example, "2 .. 5 .. 7" does not make sense.

I use, in mosmlyac, the %nonassoc operator to handle this. This makes it possible to use only one nonterminal for expressions. Otherwise, I would need three: One for operators that bind stronger than "..", one for ".." and one for operators and control structures that bind weaker than "..".

You can download the code (in Moscow ML) for Troll including the parser specification at http://www.diku.dk/~torbenm/Troll/

Note that mosmlyac generates the parser without any conflicts. When porting Troll to F#, the same grammar specification generates 195 reduce/reduce conflicts (related to issue #40) and one shift-reduce conflict for the ".." operator.

Also, I can't see why it should require a lot of work to make %nonassoc work as I suggest: You simply remove both actions when there is a shift/reduce conflict involving a non-associative operator. This is only marginally more work than removing one of these when there is a shift/reduce conflict involving a left- or right-associative operator. If you are worried about users not expecting this behaviour, you can issue a warning that these actions are removed.

mrgleba commented 8 years ago

Could you provide .output file from momlyacc? I'm curious how is the state machine generated.

TorbenMogensen commented 8 years ago

Sure.

TrollParser.zip

mrgleba commented 8 years ago

I've done a bit of further digging and it seems you were right. The state machine that is currently being constructed is good enough and there is an easy way to postpone error till actual runtime (the Error action) and it makes more sense to Error rhather than Shift on an equal precedence nonassoc shift/reduce collision. Thanks for the data and showing me wrong on that one :)

I overestimated the amount of changes this would require. Mostly perhaps due to my earlier "failed" experiments with this. Now I think I found the reason for it. Strangely enough there is some code in fsyacc that tries to undo parsing errors by forcing them to be reductions (or accepts). It seems either wrong or it's a hack around something or I'm missing a point. I'll investigate that a bit more.

I'll follow on this issue in https://github.com/fsprojects/FsLexYacc/pull/51

TorbenMogensen commented 8 years ago

Thanks.

The odd behaviour at parsing errors might be a partial attempt to add error recovery (which will allow multiple syntax errors to be reported in one parse). A common strategy for this is to reduce and then skip input to the next symbol that is in the follow set of the reduced-to nonterminal.

mrgleba commented 8 years ago

Makes sense I guess. The question is should this behaviour be preserverd.

Firstly we cannot throw an exception on a wrongly used nonassoc if we try to recuperate. Secodnly if there is something wrong we should yell about it. Hiding the error form the user should't be the thing to do.

I'd say we should avoid doing that. In the https://github.com/fsprojects/FsLexYacc/pull/51 I've disabled this if the --oldprec option is not given.

@kkm000 @dsyme Your thoughts ?

TorbenMogensen commented 8 years ago

Even with error recovery, an error should be reported, but afterwards the parsing can continue. So there is no case of hiding the error. AFAIK, fsyacc does not allow parsing to continue after an error, so something needs to be added to support error recovery. In any case, error recovery should be optional. You may want to revert to an editor/IDE after the first error, and many people do not like the "false errors" that follow a recovery (since recovery is seldom perfect).

As for %nonassoc (as well as %left and %right), a removed action should (at run time) work like it was never there, i.e., give an error. Recovery is a separate issue.