Open GoogleCodeExporter opened 9 years ago
The tokenizer sometimes produces tokens with a trailing apostrophe, such as "berlusconi'", which is wrong. We also need to check for other bad tokenizations.
Original issue reported on code.google.com by lar...@gmail.com on 22 Feb 2011 at 8:47
lar...@gmail.com
Another inconsistency: "Ukraine's" becomes "ukraine", but "Ukraine" becomes "ukrain", so the two forms do not produce the same token.
Original comment by lar...@gmail.com on 25 Feb 2011 at 1:24
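One possible direction, sketched here only as an illustration: normalize apostrophes and possessive suffixes before any stemming step, so that "Berlusconi'", "Ukraine's" and "Ukraine" are all reduced the same way. The names `normalize_token` and `tokenize` are hypothetical and do not come from the project's code; the sketch also leaves stemming out entirely, so it does not address why "Ukraine" ends up as "ukrain".

```python
import re

# Hypothetical helpers for illustration; not the project's actual tokenizer.
_POSSESSIVE = re.compile(r"['\u2019]s$")   # trailing 's (straight or curly apostrophe)
_EDGE_QUOTES = "'\u2019\u2018\""           # quote characters to trim from token edges


def normalize_token(raw: str) -> str:
    """Lowercase a raw token, trim edge quotes, and drop a possessive 's."""
    token = raw.lower().strip(_EDGE_QUOTES)
    return _POSSESSIVE.sub("", token)


def tokenize(text: str) -> list[str]:
    """Split text into word-like chunks and normalize each one."""
    chunks = re.findall(r"[\w'\u2019]+", text)
    tokens = (normalize_token(chunk) for chunk in chunks)
    return [t for t in tokens if t]  # discard chunks that were only quotes


if __name__ == "__main__":
    print(tokenize("Berlusconi' visited Ukraine's capital"))
```

With this kind of normalization in front of the stemmer, the possessive and plain forms would at least reach the stemmer as the same string, which should make the remaining mismatch easier to isolate.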