Narsil / esaxx-rs

Bindings to a copy of SentencePiece's esaxx library (fast suffix array and frequent substrings).
Apache License 2.0

index out of bounds panic #2

Closed: david-waterworth closed this issue 3 years ago

david-waterworth commented 3 years ago

I've been digging into the SentencePiece tokenizer because it isn't behaving as I expected. In the process, I noticed that the following program panics with an index-out-of-bounds error:

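fn main() {
    // Input ending in a repeated '$'; this is the case reported to panic.
    let string = "banana$band$$";
    let suffix = esaxx_rs::suffix_rs(string).unwrap();

    // Print every substring the iterator reports, together with its frequency.
    for (chars, freq) in suffix.iter() {
        println!("(chars, freq): {:?}, {:?}", chars, freq);
    }
}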

It seems to be caused by the repeated $ at the end. I don't think there's anything special about $ itself (e.g. you can replace it with \0 and still get the panic).
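To see why a repeated trailing character is an edge case, it can help to look at the suffix array of the input directly. The sketch below is a naive O(n^2 log n) suffix array over chars, written only for illustration: naive_suffix_array is a hypothetical helper, not part of esaxx-rs, and it does not use the linear-time construction that esaxx implements. It compares suffixes as plain char slices, so no character (neither $ nor \0) is treated as a sentinel.

```rust
/// Naive suffix array over Unicode scalar values.
/// Hypothetical helper for illustration only; not part of esaxx-rs.
fn naive_suffix_array(s: &str) -> Vec<usize> {
    let chars: Vec<char> = s.chars().collect();
    let mut sa: Vec<usize> = (0..chars.len()).collect();
    // Lexicographic slice comparison: a suffix that is a prefix of another
    // sorts first, so "$" < "$$" without any special end-of-text marker.
    sa.sort_by(|&a, &b| chars[a..].cmp(&chars[b..]));
    sa
}

fn main() {
    let text = "banana$band$$";
    let chars: Vec<char> = text.chars().collect();
    // Print each suffix start position next to the suffix itself.
    for &i in &naive_suffix_array(text) {
        let suffix: String = chars[i..].iter().collect();
        println!("{:2} {}", i, suffix);
    }
}
```

For the reported input, the three suffixes starting with $ sort as "$", "$$", "$band$$" (positions 12, 11, 6), so the two shortest suffixes share a one-character prefix right at the end of the text.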

The original C++ code doesn't throw an exception for the same input.
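One way to check what the iterator should return, independently of both the Rust and C++ code, is a brute-force enumeration. The sketch below assumes that esaxx reports the internal nodes of the suffix tree, i.e. substrings that occur at least twice and are right-maximal (their occurrences are followed by at least two different symbols, with running off the end of the text counting as its own symbol). That assumption, and whether esaxx also reports the empty root or single-occurrence leaves, should be confirmed against the C++ output. repeated_substrings is a hypothetical helper, not part of esaxx-rs, and it is quadratic or worse rather than linear-time.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Brute-force right-maximal repeated substrings with occurrence counts.
/// Hypothetical cross-check helper; not the esaxx algorithm.
fn repeated_substrings(s: &str) -> BTreeMap<String, usize> {
    let chars: Vec<char> = s.chars().collect();
    let n = chars.len();
    let mut result = BTreeMap::new();
    // A substring of length n occurs only once, so lengths 1..n suffice.
    for len in 1..n {
        // Group all (possibly overlapping) start positions by substring.
        let mut starts: BTreeMap<&[char], Vec<usize>> = BTreeMap::new();
        for i in 0..=n - len {
            starts.entry(&chars[i..i + len]).or_default().push(i);
        }
        for (sub, pos) in starts {
            if pos.len() < 2 {
                continue;
            }
            // Next symbol after each occurrence; None means end of text.
            let next: BTreeSet<Option<char>> =
                pos.iter().map(|&i| chars.get(i + len).copied()).collect();
            if next.len() >= 2 {
                result.insert(sub.iter().collect::<String>(), pos.len());
            }
        }
    }
    result
}

fn main() {
    for (sub, freq) in repeated_substrings("banana$band$$") {
        println!("(chars, freq): {:?}, {}", sub, freq);
    }
}
```

For the reported input this lists seven substrings, for example "$" with frequency 3 and "a" with frequency 4, which gives a concrete expected result to compare with what the C++ code prints.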