Closed andylizi closed 1 year ago
Heh, certainly a big overhaul. I need to look at the details more closely when I have more time.
I've copied your test cases to #8 and fixed the resulting panics with a much smaller diff. I hope you understand why I prefer that approach over a big rewrite of the algorithm itself.
I'm not opposed to changes to the algorithm if there is a clear need for it, but in this case I don't think big changes are needed to fix UTF-8 bugs.
I'd still like to add support for multibyte UTF-8 sequences in variable names, but I think that can be a separate change.
I made sure to preserve you as author for the copied test cases and the #[inline]
attributes.
Do you think I missed something important?
Totally understandable! I left the comments on the other PR.
This PR is partially based on #8. When I was testing the UTF-8 fix introduced in #8, I accidently found two entirely new ways for the code to slice inside char boundaries and panic. Feeling the current approach was too brittle, I wanted to try my hand at addressing the issue in a more fundamental way, and this is the result. I know it's a big change and you might prefer something less, er, overhaul-ish :sweat_smile:.
Unfortunately I forgot how to reproduce the aforementioned bugs, since it's been a few days……But I have added unicode tests for all situations I can think of, so it should be alright…
The only breaking change introduced is making
write_source_highlighting()
accept&str
, since logically we can't really format non-UTF8 error messages without significantly more effort anyway.This is currently marked draft, because I think I can find a way to get rid of the
bstr
dependency. I'm also a bit unhappy with how error positions are tracked with this approach, and would love to hear any suggestions.