IUCompilerCourse / Essentials-of-Compilation

A book about compiling Racket and Python to x86-64 assembly
1.27k stars 137 forks source link

`%ignore WS` in p.35 of the Python version makes the grammar ambiguous #153

Closed Ray-Eldath closed 1 year ago

Ray-Eldath commented 1 year ago

In p.35 of the Python version of EoC, it is said that to set the parser ignoring all white spaces, one should use the %ignore directive:

WS: /[ \t\f\r\n]/+
%ignore WS

note that \r\n has been counted as WS and be ignored as well. When set ambiguity='explicit', this grammar is ambiguous w.r.t. the program:

10
-10
_ambig
  module
    exp_stmt
      sub
        int 10
        int 10
  module
    exp_stmt
      int   10
    exp_stmt
      int   -10

This is because according to this issue of lark-parser, when a terminal get %ignore-ed, it never reaches the parser. The definition of WS included EOL as well, but the grammar actually needs EOL to demarcate lines, i.e., $L_{var}$ is an EOL-sensitive language while %ignore WS ignores EOLs.

To fix this ambiguity, one should define WS as "white spaces except EOLs":

%import common.WS_INLINE
%ignore WS_INLINE

This makes both Earley with ambiguity='explicit' and LALR happy. I suggest maybe the book needs to be fixed as well provided such "trap" is not set deliberately.

jsiek commented 1 year ago

I'm having trouble reproducing this problem. I wonder if Lark has changed... not sure what version I'm using... what version are you using?

jsiek commented 1 year ago

I'm going to go ahead with your suggested change regardless of being able to reproduce the problem.

jsiek commented 1 year ago

Thanks Ray!