Currently spceval's ExprError has an index field for purposes of error reporting.
With the way my parser works, the index is actually a byte index rather than a UTF-8 aware char index. Parsing UTF-8 sequences works just fine, but the problem is with reporting errors when multi-byte UTF-8 characters are involved.
When reporting an error we assume a char index. Namely, print_error uses width formatting and just prints a caret to pinpoint the error in the expression based on the index.
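A minimal sketch of that caret placement, to make the assumption concrete. The helper name `caret_line` is hypothetical and this is not spceval's actual print_error code; it only shows how width formatting turns an index into a column.

```rust
/// Builds the caret line that goes under the expression.
/// Hypothetical stand-in for the caret part of print_error.
fn caret_line(index: usize) -> String {
    // `{:>width$}` right-aligns "^" in a field of `index + 1` columns,
    // so the caret ends up in column `index` (0-based).
    format!("{:>width$}", "^", width = index + 1)
}
```

The column is correct only if `index` counts characters. If it counts bytes, every multi-byte character before the error shifts the caret to the right.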
Making spceval return a char index doesn't seem feasible. Both the enumerate() and char_indices() methods return byte indices.
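To illustrate the byte offsets in question, here is a small sketch using `str::char_indices` from the standard library. The helper name `offsets` is made up for the example, and `\u{e4}` is "ä" written as an escape so the 2-byte encoding is unambiguous.

```rust
/// Collects the (offset, char) pairs yielded by `str::char_indices`.
/// The offsets are byte positions into the string, not char counts.
fn offsets(expr: &str) -> Vec<(usize, char)> {
    expr.char_indices().collect()
}
```

For `"\u{e4}vg 5"` the offsets jump from 0 to 2 after the "ä", so the space is reported at offset 4 even though it is the fourth character.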
The alternative is for clients to iterate over the expression and use is_char_boundary to figure out which bytes to skip. So, for example, if a function name contains a multi-byte UTF-8 character, say "ävg" where "ä" occupies 2 bytes:
For an invalid expression ävg 5, spceval will return parenthesis missing at 4 (but the error reporting should point at the space character, i.e. char index 3).
Expected (point to char index 3):
ävg 5
   ^
Error: parenthesis missing at 3 for function 'ävg'
What we'll currently do (point to byte 4):
ävg 5
    ^
Error: parenthesis missing at 4 for function 'ävg'
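A possible client-side workaround, sketched under the assumption that the index in ExprError is a byte offset into the original expression. The helper name `byte_to_char_index` is hypothetical; only `str::is_char_boundary` is a real standard library method.

```rust
/// Converts a byte index into `expr` to a char index.
/// Hypothetical helper, not part of spceval.
fn byte_to_char_index(expr: &str, byte_index: usize) -> usize {
    // Each char boundary in 1..=byte_index marks the end of one char
    // that lies entirely before `byte_index`, so the number of such
    // boundaries is the char index. A byte index that falls inside a
    // multi-byte char maps to that char; an index past the end of the
    // string maps to the total char count.
    (1..=byte_index)
        .filter(|&i| expr.is_char_boundary(i))
        .count()
}
```

With this, the byte index 4 reported for the example maps to char index 3, which is what the caret printing needs. Note that a char index is still not always a display column (wide or combining characters can differ), so this only covers the case described here.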