erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License
1.79k stars 126 forks

want to skip escaped newlines in input #242

Closed pjlbyrne closed 6 months ago

pjlbyrne commented 6 months ago

I am trying to write a grammar to parse makefiles. I am having a problem working out how to parse escaped line continuations. For example:

"X = 1\ 0"

should reduce to the line "X = 10" (the newline being escaped).

Here is my grammar:

    rules = r"""
    line = assignment
    assignment = lval wspace assign_op wspace rval tail
    tail = wspace comment?
    lval = identifier
    rval = value
    identifier = ~"[A-Z]+[A-Z0-9]"i
    value = anum+
    assign_op = "=" / "+="
    wspace = (" " / "\\n")+
    comment = "#" char*
    anum = ("\\n" _anum) / _anum
    char = ("\\n" _char) / _char
    _anum = ~"[A-X0-9]"i
    _char = ~"[A-X0-9 ]"i
    """

On the above input string this generates:

    <Node called "assignment" matching "X1 = 1\ 0">
        <RegexNode called "identifier" matching "X1">
        <Node called "wspace" matching " ">
            <Node matching " ">
                <Node matching " ">
        <Node called "assign_op" matching "=">
            <Node matching "=">
        <Node called "wspace" matching " ">
            <Node matching " ">
                <Node matching " ">
        <Node called "value" matching "1\ 0">
            <Node called "anum" matching "1">
                <RegexNode called "_anum" matching "1">
            <Node called "anum" matching "\ 0">
                <Node matching "\ 0">
                    <Node matching "\ ">
                    <RegexNode called "_anum" matching "0">
        <Node called "tail" matching "">
            <Node matching "">
            <Node matching "">

I want the "value" node to be "10" and not "1\n0". Is there some way to discard the "\\n" inside the grammar?

I could preprocess the input file to remove the escaped newlines, but this seems like a weak solution.
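For reference, the preprocessing fallback could look roughly like the sketch below. It assumes a continuation is a backslash immediately followed by a newline, and it simply deletes that pair, as in the "X = 10" example. The helper name `join_continuations` is made up for illustration and is not part of parsimonious.

```python
import re

# Assumption: a line continuation is a backslash immediately followed by a
# newline (an optional "\r" is tolerated for Windows line endings).
_CONTINUATION = re.compile(r"\\\r?\n")


def join_continuations(text: str) -> str:
    """Hypothetical helper: delete every backslash-newline pair.

    Continued lines are glued together with nothing in between,
    so "1\\<newline>0" becomes "10".
    """
    return _CONTINUATION.sub("", text)


# A backslash followed by a real newline, written as a normal Python string.
source = "X = 1\\\n0"
joined = join_continuations(source)
print(joined)  # X = 10
```

The joined text could then be handed to the grammar, which would no longer need the escaped-newline alternatives in `anum` and `char`. The cost is that positions in the parse tree and in any parse errors refer to the joined text rather than the original file.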

Thanks