Open chenhaot opened 11 years ago
Hm. Do you have other examples? Is it always with the double underscore?
Please send us a pull request with your fix if you can. To test a tokenizer fix, we run the old and new versions on 100,000 tweets and then look at the differences, if any.
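A minimal sketch of that kind of old-versus-new comparison, in case it helps anyone preparing a fix. This is not the project's actual test script: `diff_tokenizations` is a hypothetical helper, and the two stand-in tokenizers below are placeholders for the real old and new versions.

```python
import re
from typing import Callable, Iterable, List, Tuple

Tokenizer = Callable[[str], List[str]]


def diff_tokenizations(
    old_tokenize: Tokenizer,
    new_tokenize: Tokenizer,
    texts: Iterable[str],
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (text, old_tokens, new_tokens) for every text where the outputs differ."""
    diffs = []
    for text in texts:
        old_tokens = old_tokenize(text)
        new_tokens = new_tokenize(text)
        if old_tokens != new_tokens:
            diffs.append((text, old_tokens, new_tokens))
    return diffs


# Placeholder tokenizers (assumptions for illustration only).
def old_tokenize(text: str) -> List[str]:
    return text.split()


def new_tokenize(text: str) -> List[str]:
    # Splits punctuation off from words, unlike the whitespace-only version.
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    sample = ["pls RT", "hello, world"]
    for text, old, new in diff_tokenizations(old_tokenize, new_tokenize, sample):
        print(repr(text))
        print("  old:", old)
        print("  new:", new)
```

In practice the sample list would be replaced by the tweets read from a file, one per line, and every reported difference reviewed by hand to confirm it is an intended change.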
For instance, "pls RTTell" will be parsed as "pls R TT ell".
I have an ad-hoc fix for now. It seems OK to me.