dabeaz / sly

Sly Lex Yacc

regex match group index problem #79

Open alingse opened 2 years ago

alingse commented 2 years ago

Sorry, another new issue. This one is about regex match groups in sly.

At first I used this regex to match a LITERAL string:

import re
LITERAL = r'("[^"]*")' + '|' + r"('[^']*')"
re.compile(LITERAL).search(r'"hello"')               # works
re.compile(LITERAL).search(r'"hello \" ok \""')      # does not work: stops at the escaped quote
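To make "does not work" concrete, here is a quick check of what the first pattern actually returns. The search still finds a match, but `[^"]*` cannot step over `\"`, so the match ends at the first escaped quote:

```python
import re

# Original pattern: a double-quoted or single-quoted literal with no escapes.
LITERAL = r'("[^"]*")' + '|' + r"('[^']*')"

ok = re.compile(LITERAL).search(r'"hello"')
bad = re.compile(LITERAL).search(r'"hello \" ok \""')

print(ok.group(0))   # "hello"
print(bad.group(0))  # "hello \"   (truncated at the escaped quote)
```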

After some searching, I switched to the regex from this Stack Overflow answer: https://stackoverflow.com/questions/14366401/correctly-parsing-string-literals-with-pythons-re-module

import re
LITERAL = r'''(\"|\')((?<!\\)\\\1|.)*?\1'''
re.compile(LITERAL).search(r'"hello"')               # works
re.compile(LITERAL).search(r'"hello \" ok \""')      # works
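A small check of why this pattern works on its own: group 1 captures the opening quote, and `\1` then means "the same quote character". The body accepts either a backslash followed by that quote or any other single character, matched lazily until the closing quote:

```python
import re

# Group 1 = opening quote; group 2 = one body unit (escaped quote or any char).
LITERAL = r'''(\"|\')((?<!\\)\\\1|.)*?\1'''

m = re.compile(LITERAL).search(r'"hello \" ok \""')
print(m.group(0))  # "hello \" ok \""   (whole literal, escapes included)
print(m.group(1))  # "

# The same back-reference also handles single-quoted literals.
m2 = re.compile(LITERAL).search(r"'it\'s'")
print(m2.group(0))  # 'it\'s'
```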

But when I use it in a sly lexer, it raises error: cannot refer to an open group at position 29
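The error can be reproduced without sly, under the assumption (based on reading lex.py, not on its documentation) that sly wraps each token pattern in a named group such as `(?P<LITERAL>...)` before compiling it. That wrapper becomes group 1, so `\1` now points at a group that is still open:

```python
import re

LITERAL = r'''(\"|\')((?<!\\)\\\1|.)*?\1'''

# Assumption: roughly how sly wraps a token pattern, f'(?P<{tokname}>{pattern})'.
wrapped = f'(?P<LITERAL>{LITERAL})'

message = None
pos = None
try:
    re.compile(wrapped)
except re.error as exc:
    message = str(exc)
    pos = exc.pos  # offset of the offending \1 inside the wrapped pattern
    print(message)  # cannot refer to an open group at position 29
```

Position 29 is exactly where `\1` starts once the 12-character `(?P<LITERAL>` prefix is added, which is consistent with the message reported above.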

After debugging https://github.com/dabeaz/sly/blob/master/sly/lex.py#L307, I found that \1 has to be replaced with \2.
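A sketch of the `\2` version, again assuming the `(?P<LITERAL>...)` wrapper: the wrapper is group 1, the quote moves to group 2, and `Match.lastgroup` reports which named token matched:

```python
import re

# Back-references shifted by one: group 1 is the wrapper, group 2 the quote.
LITERAL = r'''(\"|\')((?<!\\)\\\2|.)*?\2'''
wrapped = re.compile(f'(?P<LITERAL>{LITERAL})')

m = wrapped.match(r'"hello \" ok \""')
print(m.lastgroup)         # LITERAL
print(m.group('LITERAL'))  # "hello \" ok \""
```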

Finally, I found that the correct index also depends on the order in which the tokens are defined (see https://github.com/alingse/thrift-parser/blob/master/simple.py#L25).
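The order dependence can be sketched as follows, assuming the lexer joins all wrapped token patterns with `|` in definition order. `NUMBER` is a hypothetical earlier token with one group of its own; every group it uses (its wrapper plus its inner groups) pushes LITERAL's quote group further right, and `re.Pattern.groups` can count them:

```python
import re

# Hypothetical earlier token that contains one capturing group of its own.
NUMBER = r'(\d+)'

# Groups used by tokens defined before LITERAL: wrapper + inner groups (2 here).
offset = re.compile(f'(?P<NUMBER>{NUMBER})').groups

# LITERAL's own wrapper is group offset + 1, so the quote is group offset + 2.
quote = offset + 2
LITERAL = r'''(\"|\')((?<!\\)\\\%d|.)*?\%d''' % (quote, quote)

# Assumed layout of the combined pattern: wrapped tokens joined with '|'.
master = re.compile('|'.join([
    f'(?P<NUMBER>{NUMBER})',
    f'(?P<LITERAL>{LITERAL})',
]))

m = master.match(r"'it\'s'")
print(quote)                    # 4
print(m.lastgroup, m.group(0))  # LITERAL 'it\'s'
```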

:joy:

In cases like this, is this the correct approach? That is, should I count the tokens defined before this one to work out the match group index?
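One possible way to avoid counting, offered as an untested suggestion rather than anything the sly docs recommend, is a named back-reference: `(?P=quote)` refers to the group by name, so wrapping or reordering tokens does not change it. The catch is that a group name may appear only once in the combined pattern, so the hypothetical name `quote` must not be reused by another token:

```python
import re

# Named back-reference: independent of how many groups come before it.
# Assumption: no other token defines a group called 'quote'.
LITERAL = r'''(?P<quote>["'])((?<!\\)\\(?P=quote)|.)*?(?P=quote)'''

# Assumed combined layout; NUMBER is a hypothetical earlier token.
master = re.compile('|'.join([
    r'(?P<NUMBER>(\d+))',
    f'(?P<LITERAL>{LITERAL})',
]))

results = []
for text in [r'"hello \" ok \""', r"'it\'s'", '42']:
    m = master.match(text)
    results.append((m.lastgroup, m.group(0)))
    print(m.lastgroup, m.group(0))
```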

By the way, sorry for the trouble, but could this be mentioned somewhere in the documentation?