BenHanson / parsertl14

C++14 version of parsertl
32 stars 4 forks source link

ParserTL reports no reduce/reduce conflicts but bison/byacc/kmyacc repors 35 #15

Closed mingodad closed 8 months ago

mingodad commented 8 months ago

While converting this grammar https://github.com/diku-dk/futhark/blob/master/src/Language/Futhark/Parser/Parser.y to be used in https://mingodad.github.io/parsertl-playground/playground/ I found that bison/byacc/kmyacc reports 35 reduce/reduce conflicts but parsertl reports then as resolved.

On https://mingodad.github.io/parsertl-playground/playground/ select Futhark parser (partially working) then click Parse to see that it parses and produces a parser tree and reports 35 reduce/reduce conflicts as resolved:

...
Shift/Reduce conflicts resolved 1738.
Reduce/Reduce conflicts resolved 35.

kmyacc:

kmyacc futhark.g.y 
futhark.g.y: there are 35 reduce/reduce conflicts
futhark.g.y: there are 35 reduce/reduce conflicts

Statistics for futhark.g.y:
    111 terminal symbols
    96 nonterminal symbols
    365 productions
    732 states
    0 shift/reduce, 35 reduce/reduce conflicts
    3360 items
    1114 lookahead sets used
    10174+796=10970 action entries
    1066136 bytes used

bison:

bison-nb -v futhark.g.y
futhark.g.y: warning: 35 reduce/reduce conflicts [-Wconflicts-rr]
futhark.g.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
futhark.g.y:354.11-24: warning: rule useless in parser due to conflicts [-Wother]
  354 |     | Atom ".." Exp2
      |           ^~~~~~~~~~~~~~

byacc:

byacc -v futhark.g.y
byacc: 35 reduce/reduce conflicts.

Bellow is the grammar that bison/byacc/kmyacc reports 35 reduce/reduce conflicts:

%token assert charlit def do doc else entry f16lit f32lit f64lit false floatlit for hole
%token i16lit i32lit i64lit i8lit if import in include intlit let local loop match module
%token natlit open stringlit then true type u16lit u32lit u64lit u8lit val while
//%token "#[" ']' '}' ')' "\\" '~' "'" "'~" "'^" '!' '?' '.' '_'

%left bottom
%left ifprec letprec caseprec typeprec sumprec
%left ',' case id constructor '(' '{'
%right ':' ":>"
%right "..." TWO_DOTS_LT TWO_DOTS_GT ".."
%left '`'
%right "->"
%left with
%left '='
%left PipeRight
%right PipeLeft
%left LogOr
%left LogAnd
%left Leq Geq Greater '<' Less Equal NotEqual Bang Equ
%left Band Xor '^' Bor '|'
%left ShiftL ShiftR
%left Plus Minus '-'
%left Times '*' Divide Mod Quot Rem
%left Pow
%left juxtprec
%left '[' INDEXING
%left top

%start Prog

%%

Doc:
      doc
    ;

Prog:
      Doc Doc Dec_ Decs
    | Doc Dec_ Decs
    | Dec_ Decs
    | /*empty*/
    ;

Dec:
      Dec_
    | Doc Dec_
    ;

Decs:
      Decs_
    ;

Decs_:
      /*empty*/
    | Decs_ Dec
    ;

Dec_:
      Val
    | TypeAbbr
    | SigBind
    | ModBind
    | open ModExp
    | import stringlit
    | local Dec
    | "#[" AttrInfo ']' Dec_
    ;

SigExp:
      QualName
    | '{' Specs '}'
    | SigExp with TypeRef
    | '(' SigExp ')'
    | '(' id ':' SigExp ')' "->" SigExp
    | SigExp "->" SigExp
    ;

TypeRef:
      QualName TypeParams '=' TypeExpTerm
    ;

SigBind:
      module type id '=' SigExp
    ;

ModExp:
      ModExp ':' SigExp
    | "\\" ModParam maybeAscription_SimpleSigExp "->" ModExp
    | import stringlit
    | ModExpApply
    | ModExpAtom
    ;

ModExpApply:
      ModExpAtom ModExpAtom %prec juxtprec
    | ModExpApply ModExpAtom %prec juxtprec
    ;

ModExpAtom:
      '(' ModExp ')'
    | QualName
    | '{' Decs '}'
    ;

SimpleSigExp:
      QualName
    | '(' SigExp ')'
    ;

ModBind:
      module id ModParams maybeAscription_SigExp '=' ModExp
    ;

ModParam:
      '(' id ':' SigExp ')'
    ;

ModParams:
      ModParam ModParams
    | /*empty*/
    ;

Liftedness:
      /*empty*/
    | '~'
    | '^'
    ;

Spec:
      val id TypeParams ':' TypeExp
    | val BindingBinOp TypeParams ':' TypeExp
    | TypeAbbr
    | type Liftedness id TypeParams
    | module id ':' SigExp
    | include SigExp
    | Doc Spec
    | "#[" AttrInfo ']' Spec
    ;

Specs:
      Specs_
    ;

Specs_:
      Specs_ Spec
    | /*empty*/
    ;

SizeBinder:
      '[' id ']'
    | INDEXING id ']'
    ;

SizeBinders1:
      SizeBinder SizeBinders1
    | SizeBinder
    ;

TypeTypeParam:
      "'" id
    | "'~" id
    | "'^" id
    ;

TypeParam:
      '[' id ']'
    | INDEXING id ']'
    | TypeTypeParam
    ;

TypeParams:
      TypeParam TypeParams
    | /*empty*/
    ;

LocalFunTypeParams:
      '[' id ']' TypeParams
    | TypeTypeParam TypeParams
    | /*empty*/
    ;

BinOp:
      Plus
    | Minus
    | Times
    | '*'
    | Divide
    | Mod
    | Quot
    | Rem
    | Equal
    | NotEqual
    | Less
    | Leq
    | Greater
    | Geq
    | LogAnd
    | LogOr
    | Pow
    | Xor
    | '^'
    | Band
    | Bor
    | '|'
    | ShiftR
    | ShiftL
    | PipeLeft
    | PipeRight
    | '<'
    | Bang
    | Equ
    | '`' QualName '`'
    ;

BindingBinOp:
      BinOp
    | '-'
    | '!'
    ;

BindingId:
      id
    | '(' BindingBinOp ')'
    ;

Val:
      def BindingId TypeParams FunParams maybeAscription_TypeExp '=' Exp
    | entry BindingId TypeParams FunParams maybeAscription_TypeExp '=' Exp
    | def FunParam BindingBinOp FunParam maybeAscription_TypeExp '=' Exp
    | let BindingId TypeParams FunParams maybeAscription_TypeExp '=' Exp
    | let FunParam BindingBinOp FunParam maybeAscription_TypeExp '=' Exp
    | def '(' Pat ',' Pats1 ')' '=' Exp
    | let '(' Pat ',' Pats1 ')' '=' Exp
    ;

TypeAbbr:
      type Liftedness id TypeParams '=' TypeExp
    ;

TypeExp:
      '(' id ':' TypeExp ')' "->" TypeExp
    | TypeExpTerm "->" TypeExp
    | '?' TypeExpDims '.' TypeExp
    | TypeExpTerm %prec typeprec
    ;

TypeExpDims:
      '[' id ']'
    | '[' id ']' TypeExpDims
    | INDEXING id ']'
    | INDEXING id ']' TypeExpDims
    ;

TypeExpTerm:
      '*' TypeExpTerm
    | TypeExpApply %prec typeprec
    | SumClauses %prec sumprec
    ;

SumClauses:
      SumClauses '|' SumClause %prec sumprec
    | SumClause %prec sumprec
    ;

SumPayload:
      /*empty*/ %prec bottom
    | TypeExpAtom SumPayload
    ;

SumClause:
      Constr SumPayload
    ;

TypeExpApply:
      TypeExpApply TypeArg
    | TypeExpAtom
    ;

TypeExpAtom:
      '(' TypeExp ')'
    | '(' ')'
    | '(' TypeExp ',' TupleTypes ')'
    | '{' '}'
    | '{' FieldTypes1 '}'
    | SizeExp TypeExpTerm
    | QualName
    ;

Constr:
      constructor
    ;

TypeArg:
      SizeExp %prec top
    | TypeExpAtom
    ;

FieldType:
      FieldId ':' TypeExp
    ;

FieldTypes1:
      FieldType
    | FieldType ',' FieldTypes1
    ;

TupleTypes:
      TypeExp
    | TypeExp ',' TupleTypes
    ;

SizeExp:
      '[' Exp ']'
    | '[' ']'
    | INDEXING Exp ']'
    | INDEXING ']'
    ;

FunParam:
      ParamPat
    ;

FunParams1:
      FunParam
    | FunParam FunParams1
    ;

FunParams:
      /*empty*/
    | FunParam FunParams
    ;

QualName:
      id
    | QualName '.' id
    ;

Exp:
      Exp ':' TypeExp
    | Exp ":>" TypeExp
    | Exp2 %prec ':'
    ;

Exp2:
      IfExp
    | LoopExp
    | LetExp %prec letprec
    | MatchExp
    | assert Atom Atom
    | "#[" AttrInfo ']' Exp %prec bottom
    | BinOpExp
    | RangeExp
    | Exp2 ".." Atom
    | Atom ".." Exp2
    | '-' Exp2 %prec juxtprec
    | '!' Exp2 %prec juxtprec
    | Exp2 with '[' DimIndices ']' '=' Exp2
    | Exp2 with INDEXING DimIndices ']' '=' Exp2
    | Exp2 with FieldAccesses_ '=' Exp2
    | "\\" FunParams1 maybeAscription_TypeExpTerm "->" Exp %prec letprec
    | ApplyList
    ;

ApplyList:
      Atom ApplyList %prec juxtprec
    | Atom %prec juxtprec
    ;

Atom:
      PrimLit
    | Constr
    | charlit
    | intlit
    | natlit
    | floatlit
    | stringlit
    | hole
    | '(' Exp ')'
    | '(' Exp ',' Exps1 ')'
    | '(' ')'
    | '[' Exps1 ']'
    | '[' ']'
    | id
    | Atom '.' id
    | Atom '.' natlit
    | Atom '.' '(' Exp ')'
    | Atom INDEXING DimIndices ']'
    | '{' Fields '}'
    | SectionExp
    ;

NumLit:
      i8lit
    | i16lit
    | i32lit
    | i64lit
    | u8lit
    | u16lit
    | u32lit
    | u64lit
    | f16lit
    | f32lit
    | f64lit
    ;

PrimLit:
      true
    | false
    | NumLit
    ;

Exps1:
      Exps1_
    ;

Exps1_:
      Exps1_ ',' Exp
    | Exp
    ;

FieldAccesses:
      '.' FieldId FieldAccesses
    | /*empty*/
    ;

FieldAccesses_:
      FieldId FieldAccesses
    ;

Field:
      FieldId '=' Exp
    | id
    ;

Fields:
      Fields1
    | /*empty*/
    ;

Fields1:
      Field ',' Fields1
    | Field
    ;

LetExp:
      let SizeBinders1 Pat '=' Exp LetBody
    | let Pat '=' Exp LetBody
    | let id LocalFunTypeParams FunParams1 maybeAscription_TypeExp '=' Exp LetBody
    | let id INDEXING DimIndices ']' '=' Exp LetBody
    ;

LetBody:
      in Exp %prec letprec
    | LetExp %prec letprec
    | def
    | type
    | module
    ;

BinOpExp:
      Exp2 Plus Exp2
    | Exp2 Minus Exp2
    | Exp2 '-' Exp2
    | Exp2 Times Exp2
    | Exp2 '*' Exp2
    | Exp2 Divide Exp2
    | Exp2 Mod Exp2
    | Exp2 Quot Exp2
    | Exp2 Rem Exp2
    | Exp2 Pow Exp2
    | Exp2 ShiftR Exp2
    | Exp2 ShiftL Exp2
    | Exp2 Band Exp2
    | Exp2 Bor Exp2
    | Exp2 '|' Exp2
    | Exp2 LogAnd Exp2
    | Exp2 LogOr Exp2
    | Exp2 Xor Exp2
    | Exp2 '^' Exp2
    | Exp2 Equal Exp2
    | Exp2 NotEqual Exp2
    | Exp2 Less Exp2
    | Exp2 Leq Exp2
    | Exp2 Greater Exp2
    | Exp2 Geq Exp2
    | Exp2 PipeRight Exp2
    | Exp2 PipeLeft Exp2
    | Exp2 '<' Exp2
    | Exp2 Bang Exp2
    | Exp2 Equ Exp2
    | Exp2 '`' QualName '`' Exp2
    ;

SectionExp:
      '(' '-' ')'
    | '(' Exp2 '-' ')'
    | '(' BinOp Exp2 ')'
    | '(' Exp2 BinOp ')'
    | '(' BinOp ')'
    | '(' '.' FieldAccesses_ ')'
    | '(' '.' '[' DimIndices ']' ')'
    ;

RangeExp:
      Exp2 "..." Exp2
    | Exp2 TWO_DOTS_LT Exp2
    | Exp2 TWO_DOTS_GT Exp2
    | Exp2 ".." Exp2 "..." Exp2
    | Exp2 ".." Exp2 TWO_DOTS_LT Exp2
    | Exp2 ".." Exp2 TWO_DOTS_GT Exp2
    ;

IfExp:
      if Exp then Exp else Exp %prec ifprec
    ;

LoopExp:
      loop Pat LoopForm do Exp %prec ifprec
    | loop Pat '=' Exp LoopForm do Exp %prec ifprec
    ;

MatchExp:
      match Exp Cases
    ;

Cases:
      Case %prec caseprec
    | Case Cases
    ;

Case:
      case Pat "->" Exp
    ;

Pat:
      "#[" AttrInfo ']' Pat
    | InnerPat ':' TypeExp
    | InnerPat
    | Constr ConstrFields
    ;

ParamPat:
      id
    | '(' BindingBinOp ')'
    | '_'
    | '(' ')'
    | '(' Pat ')'
    | '(' Pat ',' Pats1 ')'
    | '{' CFieldPats '}'
    | PatLiteralNoNeg
    | Constr
    ;

Pats1:
      Pat
    | Pat ',' Pats1
    ;

InnerPat:
      id
    | '(' BindingBinOp ')'
    | '_'
    | '(' ')'
    | '(' Pat ')'
    | '(' Pat ',' Pats1 ')'
    | '{' CFieldPats '}'
    | PatLiteral
    | Constr
    ;

ConstrFields:
      InnerPat
    | ConstrFields InnerPat
    ;

CFieldPat:
      FieldId '=' Pat
    | FieldId ':' TypeExp
    | FieldId
    ;

CFieldPats:
      CFieldPats1
    | /*empty*/
    ;

CFieldPats1:
      CFieldPat ',' CFieldPats1
    | CFieldPat
    ;

PatLiteralNoNeg:
      charlit
    | PrimLit
    | intlit
    | natlit
    | floatlit
    ;

PatLiteral:
      PatLiteralNoNeg
    | '-' NumLit %prec bottom
    | '-' intlit %prec bottom
    | '-' natlit %prec bottom
    | '-' floatlit
    ;

LoopForm:
      for VarId '<' Exp
    | for Pat in Exp
    | while Exp
    ;

DimIndex:
      Exp2
    | Exp2 ':' Exp2
    | Exp2 ':'
    | ':' Exp2
    | ':'
    | Exp2 ':' Exp2 ':' Exp2
    | ':' Exp2 ':' Exp2
    | Exp2 ':' ':' Exp2
    | ':' ':' Exp2
    ;

DimIndices:
      /*empty*/
    | DimIndices1
    ;

DimIndices1:
      DimIndex
    | DimIndex ',' DimIndices1
    ;

VarId:
      id
    ;

FieldId:
      id
    | natlit
    ;

maybeAscription_SimpleSigExp:
      ':' SimpleSigExp
    | /*empty*/
    ;

maybeAscription_SigExp:
      ':' SigExp
    | /*empty*/
    ;

maybeAscription_TypeExp:
      ':' TypeExp
    | /*empty*/
    ;

maybeAscription_TypeExpTerm:
      ':' TypeExpTerm
    | /*empty*/
    ;

AttrAtom:
      id
    | intlit
    | natlit
    ;

AttrInfo:
      AttrAtom
    | id '(' ')'
    | id '(' Attrs ')'
    ;

Attrs:
      AttrInfo
    | AttrInfo ',' Attrs
    ;

%%
mingodad commented 8 months ago

See also https://github.com/haskell/happy/issues/260

BenHanson commented 8 months ago

In the end I had to consult the bison source directly, as the docs don't seem to spell this out.

From AnnotationList.c, function AnnotationList__computeDominantContribution():

/ R/R conflict, so the reduction with the lowest rule number dominates. Fortunately, contributions are sorted by rule number. /

and there is no consulting of precedence in this block.

I'm therefore assuming this is the valid fix.

mingodad commented 8 months ago

Thank you for replying and fixing it ! I've also applied your fixes and I will test with the grammars I have so far to see if there is any non intended side effects.

To start with some grammars now show reduce/reduce conflicts that were hidden before as resolved (sqlite3.g) but it seems that bison and company also report those conflicts so I need to investigate my conversion from lemon to yacc and maybe lemon itself.

BenHanson commented 8 months ago

There is also at least one subtlety. Bison will still generate code whether there are shift/reduce or reduce/reduce conflicts or not.

This mode of operation is available in parsertl if you pass in the address of a warning string to generator::build(), but if not then any conflicts raise an exception.

Be aware that as you are scraping grammars in the wild that it may well be the case that those grammars expect a certain level of conflicts.

Note that in the case of a reduce/reduce conflict then bison will favour the earlier rule in the conflict. This is currently unimplemented in parsertl (but would be easy to add).

mingodad commented 8 months ago

I'm looking through the lemon code and it does use precedence to omit/solve reduce/reduce conflicts:

}else if( apx->type==REDUCE && apy->type==REDUCE ){
    spx = apx->x.rp->precsym;
    spy = apy->x.rp->precsym;
    if( spx==0 || spy==0 || spx->prec<0 ||
    spy->prec<0 || spx->prec==spy->prec ){
      apy->type = RRCONFLICT;
      errcnt++;
    }else if( spx->prec>spy->prec ){
      apy->type = RD_RESOLVED;
    }else if( spx->prec<spy->prec ){
      apx->type = RD_RESOLVED;
    }
  }else{

I'll discuss this in the sqlite forum to see it's author point of view about this, but I do think that the bison way is better because a reduce/reduce conflict is telling that a leaf of the grammar will never be executed.

BenHanson commented 8 months ago

This is all good stuff. Just clarifying what the rules are really helps!

I'm going to look at the Levine book again for the precedence rules when it comes to associativity, as I'm suspicious my handling is incomplete there for shift/reduce errors.

mingodad commented 8 months ago

I posted this https://sqlite.org/forum/forumpost/9cb2fc0e6e11768c but it seems that I could not articulate properly the issue.

BenHanson commented 8 months ago

I posted this https://sqlite.org/forum/forumpost/9cb2fc0e6e11768c but it seems that I could not articulate properly the issue.

The point made by Richard Hipp is fair enough. Remember that the reason for supporting precedence in the first place is indeed to allow smaller (and faster) state machines to be constructed from a grammar.

When it comes to parsertl, I am happy to do the same as Bison.

BenHanson commented 8 months ago

Thinking about it, maybe we could have a flag for lemon mode.

That way you could have a directive that you could set and you wouldn't have to worry.

andreasabel commented 8 months ago

@BenHanson wrote:

I'm therefore assuming this is the valid fix.

Which fix does "this" refer here to?

BenHanson commented 8 months ago

Which fix does "this" refer here to?

Seemingly bison always treats reduce/reduce as an error regardless of what the precedence of the rules/tokens are.

parsertl was previously considering precedence in the reduce/reduce case, which appears to be what the lemon parser generator does also.

I'm wondering whether to default to the bison behaviour, but allow the lemon style behaviour with a flag. This reminds me of https://en.cppreference.com/w/cpp/regex/syntax_option_type BTW.

andreasabel commented 8 months ago

@BenHanson Thanks for the explanation! I am here because of the issue report for happy, the Haskell parser generator.

However, I find the counterexample given by bison unconvincing, so maybe bison does not implement the specs correctly (or happy has an improved implementation using the maximally consistent interpretation of the specs).

(Or, I made a mistake in my analysis.)