KhalilMrini / LAL-Parser

Neural Adobe-UCSD Parser, the current State of the Art in Constituency and Dependency Parsing.

A question about tokenization #27

Open · luryZhu opened this issue 1 year ago

luryZhu commented 1 year ago

Dear Development Team,

I ran the code with the best model and found that the sentence tokenization seems to differ from other tokenizers, especially for words containing punctuation.

For instance, for the sentence "Not only was the food outstanding, but the little 'perks' were great.", the tokens are:

["Not","only","was","the","food","outstanding",",","but","the","little","'perks","›","were","great","."]

That is, the word 'perks' is split into the two tokens 'perks and ': the trailing apostrophe becomes a separate token while the leading one stays attached.

So I was wondering: is it possible to swap this parser's tokenizer out for another method, such as Stanza's?
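For example, something like the following is what I have in mind: pre-tokenize each sentence with Stanza and feed the space-joined tokens to the parser instead of raw text. This is only a sketch, assuming the parser accepts one whitespace-tokenized sentence per line in its input file (I have not verified that for this repo), and `pretokenized.txt` is just an illustrative file name:

```python
import stanza

# First run only: fetch the English models.
stanza.download('en')

# Tokenization-only pipeline; no tagging or parsing needed here.
nlp = stanza.Pipeline(lang='en', processors='tokenize')

text = "Not only was the food outstanding, but the little 'perks' were great."
doc = nlp(text)

# Write one whitespace-joined, pre-tokenized sentence per line,
# to be passed to the parser in place of the raw text.
with open('pretokenized.txt', 'w') as f:
    for sentence in doc.sentences:
        tokens = [token.text for token in sentence.tokens]
        f.write(' '.join(tokens) + '\n')
```

If the tokenizer is instead hard-coded deeper in the model's preprocessing, a pointer to where it lives in the code would be appreciated.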

Thank you