spraynasal closed this 8 years ago
Thank you!
Actually, this interpretation is not correct; I'll revert the code. Please see our unit test EnglishTokenizerOffsetTest.
You are absolutely right, I missed the unit test! I've come across some edge cases I was trying to solve. While this change fixes some of them, it also breaks some that worked before, so I'll review everything and then submit another pull request.
An invalid bIndex generated invalid start and end offsets in subsequent addSymbols tokenization.
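To illustrate why one bad index corrupts every later offset, here is a minimal sketch in Python. It is hypothetical: the project's actual tokenizer, `addSymbols`, and `bIndex` code are not shown in this thread, so the function names `assign_offsets` and `offsets_are_valid` are invented for illustration. The running `cursor` stands in for an index like `bIndex`. If it does not point just past the previous token, every subsequent start and end offset is wrong. The second function checks the kind of invariant an offset test could assert for each token.

```python
from typing import List, Tuple

# (surface form, start offset, end offset); a hypothetical token representation
Token = Tuple[str, int, int]


def assign_offsets(text: str, tokens: List[str]) -> List[Token]:
    """Locate each token in `text`, scanning forward from a running cursor.

    The cursor plays the role of an index such as `bIndex` (an assumption,
    not the project's code): it must always point just past the previous
    token, or all later offsets will be computed from the wrong position.
    """
    cursor = 0
    result: List[Token] = []
    for tok in tokens:
        start = text.find(tok, cursor)
        if start < 0:
            raise ValueError(f"token {tok!r} not found at or after {cursor}")
        end = start + len(tok)
        result.append((tok, start, end))
        cursor = end  # advance past this token before searching for the next
    return result


def offsets_are_valid(text: str, tokens: List[Token]) -> bool:
    """Return True if every token's offsets are in order, in bounds,
    and actually slice out that token's surface form from `text`."""
    prev_end = 0
    for tok, start, end in tokens:
        if not (prev_end <= start <= end <= len(text)):
            return False
        if text[start:end] != tok:
            return False
        prev_end = end
    return True


if __name__ == "__main__":
    text = "Hello, world!"
    good = assign_offsets(text, ["Hello", ",", "world", "!"])
    print(good, offsets_are_valid(text, good))
    # Offsets shifted by one, as a stale index might produce:
    bad = [("Hello", 0, 5), ("world", 8, 13)]
    print(offsets_are_valid(text, bad))
```

Checking that `text[start:end]` reproduces the token is a cheap way to catch a drifting index. The first shifted offset already fails the check, instead of the error surfacing several tokens later.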