Teknomancer / sysprocalc

Command-line expression-evaluator and x86 register descriptions
Apache License 2.0

spceval returns byte indices but main executable assumes char indices while reporting error #1

Closed Teknomancer closed 4 years ago

Teknomancer commented 4 years ago

Currently spceval's ExprError has an index field for purposes of error reporting.

With the way my parser works, the index is actually a byte index rather than a UTF-8 aware char index. Parsing works fine with UTF-8 sequences; the problem is with reporting errors when multi-byte UTF-8 characters are involved.

When reporting an error, we assume a char index. Namely, print_error uses width formatting to pad the output by index columns and then prints the caret, pinpointing the error in the expression.
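A minimal sketch of the caret placement described above. `caret_line` is a hypothetical stand-in, not the actual print_error code; it only assumes that the caret is preceded by `index` spaces produced through width formatting.

```rust
/// Hypothetical helper: pads with `index` spaces using width formatting,
/// then appends the caret.
fn caret_line(index: usize) -> String {
    format!("{:width$}^", "", width = index)
}

fn main() {
    let expr = "ävg 5";

    // A byte index of 4 puts the caret one column too far to the right.
    println!("{}", expr);
    println!("{}", caret_line(4));

    // A char index of 3 puts the caret under the space after the name.
    println!("{}", expr);
    println!("{}", caret_line(3));
}
```

The padding counts columns, so any index handed to it has to be a column (char) count rather than a byte offset.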

Making spceval return char index doesn't seem feasible. Both the enumerate() and char_indices() methods return byte indices.
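For reference, a small sketch of what the standard library iterators yield for the example input used below. Which iterator spceval uses internally is not shown in this issue, so the helper names here are illustrative only.

```rust
/// Starting byte offset of each char, as yielded by `char_indices()`.
fn byte_offsets(s: &str) -> Vec<usize> {
    s.char_indices().map(|(i, _)| i).collect()
}

/// Running char count, obtained by enumerating over `chars()`.
fn char_positions(s: &str) -> Vec<usize> {
    s.chars().enumerate().map(|(i, _)| i).collect()
}

fn main() {
    let expr = "ävg 5";
    // "ä" occupies bytes 0 and 1, so the next char starts at byte 2.
    println!("{:?}", byte_offsets(expr));
    println!("{:?}", char_positions(expr));
}
```

Enumerating over `bytes()` instead of `chars()` would count bytes, so the result depends on which iterator `enumerate()` is applied to.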

The alternative is for clients to iterate over the expression and use is_char_boundary to figure out which bytes to skip. So, for example, if a function exists whose name contains a multi-byte UTF-8 character, say "ävg" where "ä" occupies 2 bytes:

For the invalid expression ävg 5, spceval will return "parenthesis missing" at byte index 4, but the error reporting should point at the space character, i.e. char index 3.

Expected (point to char 3):

ävg 5
   ^
 Error: parenthesis missing at 3 for function 'ävg'

What we currently do (point to byte 4):

ävg 5
    ^
 Error: parenthesis missing at 4 for function 'ävg'
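A rough sketch of the client-side conversion using is_char_boundary. `byte_to_char_index` is a hypothetical helper name, not part of spceval; it assumes the client has the original expression string and the byte index from ExprError.

```rust
/// Hypothetical helper: converts a byte offset into `s` to a char index
/// by counting the char boundaries that precede it. Offsets past the end
/// of the string are clamped to its length.
fn byte_to_char_index(s: &str, byte_index: usize) -> usize {
    (0..byte_index.min(s.len()))
        .filter(|&i| s.is_char_boundary(i))
        .count()
}

fn main() {
    let expr = "ävg 5";
    // spceval reports byte index 4; the space is the char at index 3.
    let column = byte_to_char_index(expr, 4);
    println!("{}", expr);
    println!("{:width$}^", "", width = column);
}
```

This counts chars, not terminal columns, so the caret could still be off for wide or combining characters; handling those would need more than is_char_boundary.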