Closed david-waterworth closed 3 years ago
I've been digging into the SentencePiece tokenizer because it's not behaving as I expected. In the process, I noticed that the following code panics:
    fn main() {
        let string = "banana$band$$";
        let suffix = esaxx_rs::suffix_rs(&string).unwrap();
        for (chars, freq) in suffix.iter() {
            println!("(chars, freq): {:?}, {:?}", chars, freq);
        }
    }
It seems to be caused by the repeated $ at the end. I don't think there's anything special about $ itself (i.e., you can replace it with \0 and it still panics).
The original C++ code doesn't throw an exception for the same input.
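For anyone triaging this, a brute-force baseline might help check what a correct run should roughly contain for this input. The sketch below is std-only and is not how esaxx works: it counts every substring that occurs at least twice (overlaps included), whereas the suffix iterator may report only a subset of these. The helper name `repeated_substrings` is made up for illustration.

```rust
use std::collections::HashMap;

/// Naive baseline (hypothetical helper, not part of esaxx_rs):
/// count every substring of `text` and keep those seen at least twice.
/// Roughly cubic in the input length, so only suitable for tiny inputs.
fn repeated_substrings(text: &str) -> HashMap<String, usize> {
    // Work on chars rather than bytes so non-ASCII input is sliced safely.
    let chars: Vec<char> = text.chars().collect();
    let mut counts: HashMap<String, usize> = HashMap::new();
    for start in 0..chars.len() {
        for end in (start + 1)..=chars.len() {
            let sub: String = chars[start..end].iter().collect();
            *counts.entry(sub).or_insert(0) += 1;
        }
    }
    // Drop substrings that occur only once.
    counts.retain(|_, freq| *freq >= 2);
    counts
}

fn main() {
    let counts = repeated_substrings("banana$band$$");
    let mut entries: Vec<_> = counts.into_iter().collect();
    entries.sort();
    for (sub, freq) in entries {
        println!("{:?}: {}", sub, freq);
    }
}
```

Note that "$$" occurs only once in this input, so the baseline does not report it, while the single "$" is counted three times.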