Word segmentation problem - Githubissues

What steps will reproduce the problem?
1. IKSegmentation ikSeg = new IKSegmentation(new StringReader(testString), true);
2. Segment the sentence "一一列举,一一对应".

What is the expected output? What do you see instead?
I believe the expected output is:
0-4 : 一一列举 :    CJK_NORMAL
0-2 : 一一 :  NUMEBER
0-1 : 一 :     UNKNOWN
1-3 : 一列 :  CJK_NORMAL
2-4 : 列举 :  CJK_NORMAL
2-3 : 列 :     COUNT
5-9 : 一一对应 :    CJK_NORMAL
5-7 : 一一 :  NUMEBER
5-6 : 一 :     UNKNOWN
6-8 : 一对 :  CJK_NORMAL
7-9 : 对应 :  CJK_NORMAL
But the actual output is (the 0-4 entry for 一一列举 is missing):
0-2 : 一一 :  NUMEBER
0-1 : 一 :     UNKNOWN
1-3 : 一列 :  CJK_NORMAL
2-4 : 列举 :  CJK_NORMAL
2-3 : 列 :     COUNT
5-9 : 一一对应 :    CJK_NORMAL
5-7 : 一一 :  NUMEBER
5-6 : 一 :     UNKNOWN
6-8 : 一对 :  CJK_NORMAL
7-9 : 对应 :  CJK_NORMAL

What version of the product are you using? On what operating system?
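I am using Windows 7 Ultimate, English edition (64-bit), jdk1.6_u21 (64-bit), and Eclipse 3.4.1 (default encoding GBK). The same problem occurs in both debug mode and run mode (inside the Eclipse project).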

Please provide any additional information below.

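I think the problem is that the leading bytes at the start of the UTF-8 dictionary file (presumably the byte order mark, BOM) corrupt the first word of the main dictionary file (一一列举 is the first word in the main dictionary).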
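
To illustrate the suspected cause, here is a minimal sketch that uses only the JDK, not IK Analyzer's actual dictionary loader. It assumes the dictionary is read line by line through a plain UTF-8 `InputStreamReader`, which does not skip a leading BOM. The class `BomDemo` and its methods are hypothetical names made up for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class BomDemo {
    // First dictionary entry from the report, 一一列举, written with
    // Unicode escapes so the source compiles under any file encoding.
    static final String FIRST_WORD = "\u4e00\u4e00\u5217\u4e3e";

    /** Builds the bytes of a hypothetical UTF-8 dictionary that starts with a BOM. */
    static byte[] dictionaryWithBom() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF}); // UTF-8 BOM
        out.write((FIRST_WORD + "\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Reads the first line with a plain UTF-8 reader; the BOM is decoded as U+FEFF and kept. */
    static String readFirstLine(byte[] data) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
        try {
            return reader.readLine();
        } finally {
            reader.close();
        }
    }

    /** Removes a single leading U+FEFF, if present. */
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        String raw = readFirstLine(dictionaryWithBom());
        System.out.println(raw.equals(FIRST_WORD));           // false: entry starts with U+FEFF
        System.out.println(stripBom(raw).equals(FIRST_WORD)); // true: entry restored
    }
}
```

If the loader behaves like `readFirstLine`, the first dictionary entry would be stored as U+FEFF followed by 一一列举, so the real word would never match. That would be consistent with the missing 0-4 lexeme above. Stripping the BOM on load, or saving the dictionary without one, would be two possible ways to avoid it.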

Original issue reported on code.google.com by dengwe...@qq.com on 7 Apr 2011 at 10:59

dannyxu2015 / ik-analyzer

Word segmentation problem #26