Closed Stormur closed 7 months ago
I think that's about the \s and \S escapes: they should be \\s and \\S unless it's an r-string.
I do not think that Python 3.12 would enforce raw strings for regular expressions. On the other hand, the current line https://github.com/UniversalDependencies/tools/blob/5363b77142778cba2d6cc1a50f74d010331508cf/validate.py#L144 either should use raw strings or should use double backslashes.
There are more lines like this. The problem surfaced on a Mac with Python 3.12, while on my Linux system with Python 3.10 it does not come up.
(As a personal note, given the choice I find raw strings more readable than escaped characters.)
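For illustration, here is a minimal sketch of the two spellings side by side, using the pattern from line 144. The sample comment line and the variable names are made up for the example and do not come from validate.py. Both spellings should produce the same pattern text, so switching to raw strings should not change what the regex matches.

```python
import re

# The same pattern written as a raw string and with doubled backslashes.
raw_pattern = r'^# sent_id\s*=\s*(\S+)$'
escaped_pattern = '^# sent_id\\s*=\\s*(\\S+)$'

# Both literals denote the same sequence of characters.
same_text = (raw_pattern == escaped_pattern)

# Hypothetical sample line, only for demonstration.
sentid_re = re.compile(raw_pattern)
match = sentid_re.match('# sent_id = train-s1')
sent_id = match.group(1) if match else None
print(same_text, sent_id)
```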
There is still one left in the current validate.py:
tools/validate.py:684: SyntaxWarning: invalid escape sequence '\p'
edeprelpart_resrc = '[\p{Ll}\p{Lm}\p{Lo}\p{M}]+(_[\p{Ll}\p{Lm}\p{Lo}\p{M}]+)*';
Thanks! Fixed.
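In case it helps to check for leftovers, below is a rough sketch of how one might list the lines that still trigger the warning without running the script. The helper name find_invalid_escapes and the three-line sample are hypothetical. It only compiles the source text and records the warnings, which are SyntaxWarning on Python 3.12 and later and DeprecationWarning on earlier 3.x versions.

```python
import warnings

def find_invalid_escapes(source, filename='<string>'):
    """Return (line number, message) pairs for warnings raised while compiling."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # also show warnings hidden by default
        compile(source, filename, 'exec')  # parses only, does not execute
    return [(w.lineno, str(w.message)) for w in caught
            if issubclass(w.category, (SyntaxWarning, DeprecationWarning))]

# Hypothetical three-line source text used only for this demonstration.
sample = (
    "a = '[\\p{Ll}]+'\n"    # line 1: normal string containing \p
    "b = r'[\\p{Ll}]+'\n"   # line 2: raw string containing \p
    "c = '\\s*'\n"          # line 3: normal string containing \s
)
hits = find_invalid_escapes(sample)
for lineno, message in hits:
    print(lineno, message)
```

To run it on the real file, one could pass the contents of validate.py as source, for example via open('validate.py', encoding='utf-8').read(), assuming the file is in the current directory.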
It seems that the new version of Python 3.12 requires that regular expressions be written as r'...' instead of simple strings '...'. So, when calling validate.py, SyntaxWarnings are issued (e.g. for line 144: sentid_re=re.compile('^# sent_id\s*=\s*(\S+)$')). If this is correct, probably the code needs this small update? It should be backward compatible, right?
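On the backward-compatibility question, the following is a small sketch, not taken from validate.py, that compiles line 144 in both spellings and records which warning categories the running interpreter reports. The function name warning_categories is made up for the example. As far as I know, Python 3.12 reports invalid escapes as SyntaxWarning while earlier 3.x versions use DeprecationWarning, which is hidden by default, and the raw-string version should compile cleanly on any of them.

```python
import warnings

def warning_categories(source):
    """Compile source text and return the names of the warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        compile(source, '<line 144>', 'exec')  # parse only, nothing is executed
    return [w.category.__name__ for w in caught]

# Line 144 as quoted above, and the same line with a raw string.
plain_source = r"sentid_re = re.compile('^# sent_id\s*=\s*(\S+)$')" + "\n"
raw_source = r"sentid_re = re.compile(r'^# sent_id\s*=\s*(\S+)$')" + "\n"

plain_categories = warning_categories(plain_source)
raw_categories = warning_categories(raw_source)
print('plain string:', plain_categories)
print('raw string:', raw_categories)
```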