deepcs233 / jieba_fast

Use the C API and SWIG to speed up jieba — an efficient Chinese word-segmentation library
MIT License
632 stars 75 forks

Segmentation results are inconsistent with jieba #14

Closed purpleskyfall closed 5 years ago

purpleskyfall commented 6 years ago

When segmenting the residential-complex name “和家欣苑”, jieba produces:

['和', '家', '欣苑']

while jieba_fast produces:

['和家欣苑']

>>> import jieba
>>> jieba.lcut('和家欣苑')
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\asus\AppData\Local\Temp\jieba.cache
Loading model cost 0.999 seconds.
Prefix dict has been built succesfully.
['和', '家', '欣苑']
>>> import jieba_fast
>>> jieba_fast.lcut('和家欣苑')
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\asus\AppData\Local\Temp\jieba.cache
Loading model cost 1.000 seconds.
Prefix dict has been built succesfully.
['和家欣苑']
yzho0907 commented 6 years ago

@purpleskyfall Doesn't this show that jieba_fast actually performs better?

purpleskyfall commented 6 years ago

It's not a question of which result is better: if jieba_fast's segmentation output is inconsistent with jieba's, then jieba_fast cannot be used as a drop-in replacement for jieba with confidence. @yzho0907

yzho0907 commented 6 years ago

@purpleskyfall Indeed. I'd suggest testing both on a large-scale corpus.
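A corpus-level consistency check like the one suggested above could be sketched as follows. The `diff_tokenizers` helper and the toy stand-in tokenizers are hypothetical illustrations, not part of either library; in practice you would pass `jieba.lcut` and `jieba_fast.lcut` as the two callables and iterate over sentences from a real corpus file.

```python
# Sketch: find sentences on which two tokenizers disagree.
# In real use, tok_a / tok_b would be jieba.lcut and jieba_fast.lcut.

def diff_tokenizers(tok_a, tok_b, corpus):
    """Return (sentence, tokens_a, tokens_b) for every disagreement."""
    mismatches = []
    for sentence in corpus:
        a, b = tok_a(sentence), tok_b(sentence)
        if a != b:
            mismatches.append((sentence, a, b))
    return mismatches

# Toy stand-ins so this sketch runs without jieba installed:
# one "tokenizer" splits on whitespace, the other on every character.
split_ws = lambda s: s.split()
split_ch = lambda s: [c for c in s if not c.isspace()]

corpus = ["ab cd", "x"]
for sent, a, b in diff_tokenizers(split_ws, split_ch, corpus):
    print(sent, a, b)
```

Running this over a large corpus and inspecting the mismatch list would quantify how often the two libraries diverge, rather than relying on a single example.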

deepcs233 commented 6 years ago

You can take a look at this issue for a segmentation test on Dream of the Red Chamber.

deepcs233 commented 5 years ago

@purpleskyfall In my tests on macOS with Python 3.6, jieba_fast's results are identical to jieba's. What environment are you running? I can't make any guarantees for Windows. If the problem persists, please open a new issue with complete environment details.
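Since the maintainer's reply hinges on the reporter's environment, a small stdlib-only sketch like the following could collect the details worth pasting into a new issue. The `environment_report` function is a hypothetical helper; it only assumes the real module names `jieba` and `jieba_fast` and degrades gracefully when either is missing.

```python
# Sketch: gather OS, Python, and library versions for a bug report.
import platform
import sys

def environment_report():
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    # Report each library's version if installed, else note its absence.
    for name in ("jieba", "jieba_fast"):
        try:
            mod = __import__(name)
            lines.append(f"{name}: {getattr(mod, '__version__', 'unknown')}")
        except ImportError:
            lines.append(f"{name}: not installed")
    return "\n".join(lines)

print(environment_report())
```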