KWARC / llamapun

common language and mathematics processing algorithms, in Rust
https://kwarc.info/systems/llamapun/
GNU General Public License v3.0

fix for panic in document tokenizer #6

Closed: urabenstein closed this 6 years ago

urabenstein commented 7 years ago

Workaround to prevent panics that occur when byte indices do not fall on char boundaries.
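For readers unfamiliar with this class of panic, here is a minimal sketch of the failure mode and one way to avoid it. It uses only the standard library's `str` API. `safe_prefix` is a hypothetical helper for illustration, not the workaround actually submitted in this PR. Rust strings are UTF-8, so slicing with a byte index that lands inside a multi-byte character panics. `str::get` returns `None` instead, and `str::is_char_boundary` lets you snap an offset back to a valid position.

```rust
/// Hypothetical helper (not part of llamapun): return the longest prefix of
/// `text` that ends at or before byte offset `byte_idx` on a char boundary.
fn safe_prefix(text: &str, byte_idx: usize) -> &str {
    // Clamp to the string length first so an overlong index cannot panic.
    let mut end = byte_idx.min(text.len());
    // Walk backwards until `end` lands on a UTF-8 char boundary.
    // Offset 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

fn main() {
    let text = "α+β"; // 'α' and 'β' are 2 bytes each in UTF-8; '+' is 1 byte
    assert_eq!(text.len(), 5);

    // `&text[..1]` would panic: byte 1 falls inside 'α'.
    // `str::get` reports the bad range as None instead of panicking.
    assert_eq!(text.get(..1), None);

    // The helper snaps back to the nearest preceding boundary.
    assert_eq!(safe_prefix(text, 1), "");
    assert_eq!(safe_prefix(text, 3), "α+");
    assert_eq!(safe_prefix(text, 4), "α+");
    println!("{}", safe_prefix(text, 4));
}
```

Snapping backwards is just one policy. Depending on what the tokenizer needs, rounding forward or rejecting the offset outright may be more appropriate.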

dginev commented 6 years ago

Hey @urabenstein, hope you're still around and seeing notifications! Apologies for not getting to this pull request earlier; I believe I had postponed the pattern matcher work until Frederik was back.

Thanks for the pull request! I just ran into some malformed Unicode panics myself (#10) and will try to incorporate a version of your upgrades.

dginev commented 6 years ago

Would have loved to have a test, btw, since without one I'm a little uncertain whether I'm patching the right issues.

dginev commented 6 years ago

Fixed via #eb16446; I just doubled down on using .len_utf8 when doing char arithmetic.
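To illustrate what "using .len_utf8 when doing char arithmetic" means in practice, here is a minimal sketch. It is not the code from #eb16446. `word_spans` is a hypothetical whitespace tokenizer that tracks byte offsets while iterating over chars. The key line advances the offset by `c.len_utf8()` (1 to 4 bytes) rather than by 1. That keeps every recorded range on char boundaries, so slicing with those ranges cannot panic.

```rust
/// Hypothetical sketch (not llamapun's tokenizer): split `text` on whitespace
/// and return (start, end) byte ranges, one per word.
fn word_spans(text: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut offset = 0; // byte offset of the current char
    let mut start: Option<usize> = None; // byte offset where the current word began
    for c in text.chars() {
        if c.is_whitespace() {
            // Close the current word, if any.
            if let Some(s) = start.take() {
                spans.push((s, offset));
            }
        } else if start.is_none() {
            start = Some(offset);
        }
        // Advance by the char's UTF-8 width, not by 1; `offset += 1`
        // would drift off char boundaries after the first non-ASCII char.
        offset += c.len_utf8();
    }
    // Close a word that runs to the end of the string.
    if let Some(s) = start {
        spans.push((s, offset));
    }
    spans
}

fn main() {
    let text = "let ε > 0";
    let spans = word_spans(text);
    // Every span is a valid slice, even around the 2-byte 'ε'.
    let words: Vec<&str> = spans.iter().map(|&(s, e)| &text[s..e]).collect();
    assert_eq!(words, vec!["let", "ε", ">", "0"]);
    assert_eq!(spans[1], (4, 6));
    println!("{:?}", words);
}
```

`str::char_indices` would supply the byte offsets directly. The explicit `len_utf8` bookkeeping is shown here because it mirrors the kind of manual offset arithmetic the fix describes.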

dginev commented 6 years ago

Thanks for the PR, @urabenstein; it helped a lot!