DmitrySoshnikov / syntax

Syntactic analysis toolkit, language-agnostic parser generator.
MIT License

Python generator doesn't use raw strings #129

Open gankoji opened 1 year ago

gankoji commented 1 year ago

Per Python's regular expression documentation, if we want Python regexes to follow the same escaping rules as other languages, we need to prefix the pattern strings with 'r' (raw). Otherwise, Python processes the backslash escapes in the string literal itself, before the re module ever sees the pattern.
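As a minimal illustration of the difference (not taken from the generated parser; the pattern and input text are made up for the example), the same characters behave differently with and without the 'r' prefix. `\b` is used here because it is a valid string escape (backspace), so the change in meaning is silent:

```python
import re

# Without the prefix, Python turns '\b' into a single backspace character (0x08).
plain = '\bfoo'
# With the prefix, the backslash survives and re sees \b, the word-boundary assertion.
raw = r'\bfoo'

print(len(plain), len(raw))        # 4 5
print(re.search(plain, 'a foo'))   # None: there is no backspace in the text
print(re.search(raw, 'a foo'))     # a match object covering 'foo'
```

Escapes that Python does not recognise, such as `\d` or `\(`, are currently passed through unchanged but are reported as invalid escape sequences, which is why the raw form is the safer one to generate.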

When I generate a parser with the following lex section:

%lex

%%

\s+             return

\"[^\"]*\"      return 'STRING'

\d+\.\d+        return 'FLOAT'

\d+             return 'INT'

[\w\-+*=<>/]+   return 'SYMBOL'
/lex

I get the following generated python:

_lex_rules = [['^\(', _lex_rule1],
['^\)', _lex_rule2],
['^\s+', _lex_rule3],
['^"[^\"]*"', _lex_rule4],
['^\d+\.\d+', _lex_rule5],
['^\d+', _lex_rule6],                      
['^[\w\-+*=<>/]+', _lex_rule7]] 

These will fail with 'invalid escape sequence' or similar, whereas manually prefixing the regexes with 'r' solves the problem:

_lex_rules = [[r'^\(', _lex_rule1],
[r'^\)', _lex_rule2],
[r'^\s+', _lex_rule3],
[r'^"[^\"]*"', _lex_rule4],
[r'^\d+\.\d+', _lex_rule5],
[r'^\d+', _lex_rule6],                      
[r'^[\w\-+*=<>/]+', _lex_rule7]] 

The python generator (likely) needs a simple update to prepend 'r' to regex strings.
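A rough sketch of what such an update could look like, written in Python for illustration only. The helper name `to_python_regex_literal` is hypothetical and is not part of the syntax tool, whose real generator code may be structured quite differently. It assumes the pattern text is available as a plain string, emits a raw literal when that is safe, and falls back to `repr()` for the cases a raw literal cannot express (both quote characters present, a trailing backslash, or an embedded newline):

```python
import ast
import re


def to_python_regex_literal(pattern: str) -> str:
    """Return Python source text for a string literal equal to `pattern`.

    Hypothetical helper, not the tool's actual API.
    """
    has_newline = "\n" in pattern or "\r" in pattern
    if not has_newline and not pattern.endswith("\\"):
        # Pick a quote character that does not occur in the pattern.
        for quote in ("'", '"'):
            if quote not in pattern:
                return f"r{quote}{pattern}{quote}"
    # A raw literal cannot represent this pattern; repr() escapes it safely.
    return repr(pattern)


# The patterns from the report round-trip through the emitted literals.
patterns = [r'^\(', r'^\)', r'^\s+', r'^"[^\"]*"', r'^\d+\.\d+', r'^\d+', r'^[\w\-+*=<>/]+']
for p in patterns:
    literal = to_python_regex_literal(p)
    assert ast.literal_eval(literal) == p  # the literal evaluates back to the pattern
    re.compile(p)                          # and the pattern is a valid regex
    print(literal)
```

The `repr()` fallback is a conservative choice: it gives up the readable raw form in the rare awkward cases rather than risk emitting a literal that does not parse.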

DmitrySoshnikov commented 1 year ago

@gankoji thanks for the report. Yes, we need to fix this - would appreciate a PR for it.

gankoji commented 1 year ago

@DmitrySoshnikov apologies if there are multiple pings, I don't know if you get notified for the PR. PR #135 is up for this issue.