KhalilMrini / LAL-Parser

Neural Adobe-UCSD Parser, the current State of the Art in Constituency and Dependency Parsing.

A question about tokenization #27

Open · luryZhu opened this issue 1 year ago

luryZhu commented 1 year ago

Dear Development Team,

I ran the code with the best model and found that the sentence tokenization seems to differ from other tokenizers, especially for words containing punctuation.

For instance, for the sentence "Not only was the food outstanding, but the little 'perks' were great.", the tokens are:

["Not","only","was","the","food","outstanding",",","but","the","little","'perks","›","were","great","."]

That is, the word 'perks' is split into the two tokens 'perks and ': the trailing apostrophe becomes a separate token while the leading one stays attached.

So I was wondering: is it possible to swap this parser's tokenizer out for another method, such as Stanza's?
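For example, something like the following is what I have in mind: pre-tokenize each sentence with Stanza and feed the space-joined tokens to the parser instead of raw text. This is only a sketch, assuming the parser accepts one whitespace-tokenized sentence per line in its input file (I have not verified that for this repo), and `pretokenized.txt` is just an illustrative file name:

```python
import stanza

# First run only: fetch the English models.
stanza.download('en')

# Tokenization-only pipeline; no tagging or parsing needed here.
nlp = stanza.Pipeline(lang='en', processors='tokenize')

text = "Not only was the food outstanding, but the little 'perks' were great."
doc = nlp(text)

# Write one whitespace-joined, pre-tokenized sentence per line,
# to be passed to the parser in place of the raw text.
with open('pretokenized.txt', 'w') as f:
    for sentence in doc.sentences:
        tokens = [token.text for token in sentence.tokens]
        f.write(' '.join(tokens) + '\n')
```

If the tokenizer is instead hard-coded deeper in the model's preprocessing, a pointer to where it lives in the code would be appreciated.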

Thank you