Status: Open. GoogleCodeExporter opened this issue 8 years ago.
We need to check whether any of the following phases has a problem with surrogate pairs:
1. DictionaryBuilder and Preprocessor
2. Viterbi
3. TrieBuilder and TrieSearcher
4. StreamFilter (e.g. CompositeTokenFilter...)
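Any phase that walks the input one Java `char` at a time can stop between the two halves of a surrogate pair. As a minimal sketch, assuming a phase enumerates candidate start offsets one position at a time (as a lattice builder or trie lookup might), stepping by code point instead of by `char` keeps a supplementary character such as U+20BB7 (𠮷) together. `CodePointPositions` and `startOffsets` are hypothetical names for illustration, not part of Sen's API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Sen): lists the char offsets at which a
 * search may start, stepping one code point at a time so that no offset
 * falls between a high and a low surrogate.
 */
public final class CodePointPositions {
    private CodePointPositions() {}

    public static List<Integer> startOffsets(CharSequence text) {
        List<Integer> offsets = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            offsets.add(i);
            // Advance by 2 chars for a supplementary character, 1 otherwise.
            i += Character.charCount(Character.codePointAt(text, i));
        }
        return offsets;
    }
}
```

For the string `"a\uD842\uDFB7b"` this returns `[0, 1, 3]`; a plain `char` loop would also visit offset 2, which lies inside the pair.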
Original comment by johtani
on 8 Jul 2011 at 2:39
This sounds bad. Can we come up with some tests? With tests in place, it should
be easy to fix the issue.
We should never split a high/low surrogate pair.
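One way to state that rule as code: a char offset (for example a token start or end) is unsafe exactly when the char before it is a high surrogate and the char at it is a low surrogate. The sketch below assumes offsets are `char` indices into a Java `String`; `SurrogateBoundaries`, `isSafeBoundary`, and `safeEnd` are hypothetical names, not Sen's API.

```java
/**
 * Hypothetical helpers (not part of Sen) for checking that a char offset,
 * such as a token start or end, never falls between the high and low
 * surrogate of one supplementary character.
 */
public final class SurrogateBoundaries {
    private SurrogateBoundaries() {}

    /** True if cutting the text at this offset keeps every surrogate pair intact. */
    public static boolean isSafeBoundary(CharSequence text, int index) {
        if (index <= 0 || index >= text.length()) {
            return true; // the ends of the text are always safe
        }
        // Unsafe only when a high surrogate is immediately followed by a low surrogate.
        return !(Character.isHighSurrogate(text.charAt(index - 1))
                && Character.isLowSurrogate(text.charAt(index)));
    }

    /** Moves an end offset back by one char if it would split a surrogate pair. */
    public static int safeEnd(CharSequence text, int end) {
        return isSafeBoundary(text, end) ? end : end - 1;
    }
}
```

Backing off to the previous offset is only one possible policy; a tokenizer could equally extend the token forward to include the low surrogate.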
Original comment by rcm...@gmail.com
on 25 Oct 2011 at 1:57
For now, I have written one test (src/test/net/java/sen/SurrogatesPairTest.java).
The expected token costs in the test may differ from the values actually output.
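Since the expected costs may not be reliable yet, a test could also assert only on token surfaces, which do not depend on costs. The sketch below is not the contents of SurrogatesPairTest.java; it is a hypothetical helper that takes the input text and a plain list of token surface strings, and it assumes the surfaces should concatenate back to the input exactly. If Sen drops characters such as whitespace from its output, that final check would need adjusting.

```java
import java.util.List;

/**
 * Hypothetical test helper (not Sen's API): checks that no token surface
 * starts with a low surrogate or ends with a high surrogate, and that the
 * surfaces concatenate back to the input text.
 */
public final class SurrogateTokenCheck {
    private SurrogateTokenCheck() {}

    public static boolean keepsPairsIntact(String text, List<String> surfaces) {
        StringBuilder joined = new StringBuilder();
        for (String s : surfaces) {
            if (s.isEmpty()) {
                continue; // an empty surface cannot split anything
            }
            if (Character.isLowSurrogate(s.charAt(0))
                    || Character.isHighSurrogate(s.charAt(s.length() - 1))) {
                return false; // a pair was cut at this token's edge
            }
            joined.append(s);
        }
        // Assumed: tokens reproduce the input, so nothing was dropped or duplicated.
        return joined.toString().equals(text);
    }
}
```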
Original comment by johtani
on 31 Oct 2011 at 10:52
Original issue reported on code.google.com by
johtani
on 5 Jul 2011 at 9:47