PolMine / bignlp

Tools to process large corpora line-by-line and in parallel mode

Unknown untokenizable character #40

Closed by ablaette 1 year ago

ablaette commented 1 year ago

[main] WARN edu.stanford.nlp.process.PTBLexer - Untokenizable: ‬ (U+202C, decimal: 8236)
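
The character in the warning, U+202C (POP DIRECTIONAL FORMATTING), is an invisible bidirectional control character. One possible workaround, not taken from this issue, is to strip such characters from the text before it reaches the tokenizer. The following is a minimal sketch in Python, assuming the input can be preprocessed as plain strings; the function name `strip_bidi_controls` and the exact set of characters removed are illustrative assumptions.

```python
# Hypothetical preprocessing helper (not part of bignlp or CoreNLP).
# Bidirectional control characters: LRM, RLM, the embedding/override
# controls U+202A..U+202E, and the isolate controls U+2066..U+2069.
BIDI_CONTROLS = (
    "\u200e\u200f"
    "\u202a\u202b\u202c\u202d\u202e"
    "\u2066\u2067\u2068\u2069"
)

# str.translate deletes every code point that maps to None.
_DELETE_TABLE = dict.fromkeys(map(ord, BIDI_CONTROLS))


def strip_bidi_controls(text: str) -> str:
    """Return text with bidirectional control characters removed."""
    return text.translate(_DELETE_TABLE)


sample = "left\u202c right"  # contains U+202C (decimal 8236)
print(repr(strip_bidi_controls(sample)))  # prints 'left right'
```

The sketch removes only bidirectional controls rather than every Unicode format character, since other format characters (for example the zero-width joiner) can carry meaning in some scripts.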