I thought this might be a problem in the Tokenizing process (the original implementation was ASCII-only), but looking at the code more closely there don't appear to be any problem spots.
The only relevant code is:
```python
return (
    'a' <= char <= 'z' or
    'A' <= char <= 'Z' or
    '0' <= char <= '9'
)
```
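which accepts Unicode strings as well as ASCII ones: the comparisons are by code point, so a non-ASCII character such as `é` is classified as non-alphanumeric rather than raising an error.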
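A quick way to confirm this is a minimal sketch like the one below. It assumes Python 3 and wraps the quoted check in a hypothetical function `is_ascii_alnum` (the real enclosing function's name isn't shown here), then prints its result next to the built-in `str.isalnum()` for a few ASCII and non-ASCII characters.

```python
def is_ascii_alnum(char: str) -> bool:
    # Hypothetical wrapper: only the body below is quoted from the
    # tokenizer; the function name is a placeholder.
    return (
        'a' <= char <= 'z' or
        'A' <= char <= 'Z' or
        '0' <= char <= '9'
    )


# Python 3 str values are Unicode and compare by code point, so
# non-ASCII input is accepted without error; it just falls outside
# the three ASCII ranges.
samples = ['a', 'Z', '7', 'é', 'ß', '中', '_']
for ch in samples:
    print(repr(ch), is_ascii_alnum(ch), ch.isalnum())
```

For `é`, `ß` and `中` the quoted check returns `False` while `str.isalnum()` returns `True`, so whether this matters depends on whether the tokenizer is meant to treat non-ASCII letters as word characters.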