arr-ai / wbnf

ωBNF implementation
Apache License 2.0

Support insignificant whitespace in regexps #8

Closed marcelocantos closed 4 years ago

marcelocantos commented 4 years ago

If whitespace in regexps is ignored, literal spaces need an escape: ab c could be expressed as ab\_c.

marcelocantos commented 4 years ago

It doesn't work to replace " " with "" and then replace "\_" with " ". That breaks for a\\_b, which should match a\_b (an escaped backslash followed by _) but is rewritten to a\ b and so matches a b instead.

It almost works to only treat \_ as a space when its backslash is unescaped, i.e. when the _ follows an odd-length chain of backslashes: s/((?:\A|[^\\])(?:\\\\)*)\\_/\1 /. This has the problem that in \_\_ the second \_ won't match, because [^\\] needs to match on the first _, which has already been consumed by the previous match. This could be solved in one pass if re2 supported look-behind assertions, (?<=...). Without it, two passes of the preceding substitution should suffice, because the first pass eliminates all adjacencies.
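The two-pass idea can be sketched in Go as follows. This is an illustration under stated assumptions, not the code from the fix: the names escapedSpace and expandSpaces are hypothetical, and stripping insignificant whitespace is assumed to mean removing plain spaces only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// escapedSpace is the substitution pattern from the comment above: a `\_`
// preceded by start-of-text or a non-backslash, then zero or more
// escaped-backslash pairs. Group 1 captures the prefix to keep.
var escapedSpace = regexp.MustCompile(`((?:\A|[^\\])(?:\\\\)*)\\_`)

// expandSpaces (hypothetical name) removes plain spaces, then applies the
// substitution the given number of times.
func expandSpaces(pattern string, passes int) string {
	out := strings.ReplaceAll(pattern, " ", "")
	for i := 0; i < passes; i++ {
		out = escapedSpace.ReplaceAllString(out, "${1} ")
	}
	return out
}

func main() {
	for _, p := range []string{`ab\_c`, `a b \_ c`, `a\\_b`, `\_\_`} {
		fmt.Printf("%q: one pass %q, two passes %q\n",
			p, expandSpaces(p, 1), expandSpaces(p, 2))
	}
}
```

With a single pass, \_\_ is left as a space followed by \_; the second pass picks up the remaining \_ because the space inserted by the first pass now satisfies [^\\].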

marcelocantos commented 4 years ago

Fixed in https://github.com/arr-ai/wbnf/commit/7d14fc173c103643419c5214ef0b92177012e104.