deepcs233 / jieba_fast

Use the C API and SWIG to speed up jieba — an efficient Chinese word-segmentation library
MIT License
632 stars 75 forks

Segmentation results are inconsistent with jieba #14

Closed purpleskyfall closed 5 years ago

purpleskyfall commented 6 years ago

When segmenting the residential-complex name “和家欣苑”, jieba produces:

['和', '家', '欣苑']

while jieba_fast produces:

['和家欣苑']

>>> import jieba
>>> jieba.lcut('和家欣苑')
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\asus\AppData\Local\Temp\jieba.cache
Loading model cost 0.999 seconds.
Prefix dict has been built succesfully.
['和', '家', '欣苑']
>>> import jieba_fast
>>> jieba_fast.lcut('和家欣苑')
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\asus\AppData\Local\Temp\jieba.cache
Loading model cost 1.000 seconds.
Prefix dict has been built succesfully.
['和家欣苑']
yzho0907 commented 6 years ago

@purpleskyfall Doesn't this show that jieba_fast actually performs better?

purpleskyfall commented 6 years ago

It's not a question of which result is better: if jieba_fast's segmentation output is inconsistent with jieba's, then jieba_fast cannot be used as a drop-in replacement for jieba with confidence. @yzho0907

yzho0907 commented 6 years ago

@purpleskyfall Indeed. I'd suggest testing both on a large-scale corpus.
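A corpus-level consistency check like the one suggested above could be sketched as follows. The `diff_tokenizers` helper and the toy stand-in tokenizers are hypothetical illustrations, not part of either library; in practice you would pass `jieba.lcut` and `jieba_fast.lcut` as the two callables and iterate over sentences from a real corpus file.

```python
# Sketch: find sentences on which two tokenizers disagree.
# In real use, tok_a / tok_b would be jieba.lcut and jieba_fast.lcut.

def diff_tokenizers(tok_a, tok_b, corpus):
    """Return (sentence, tokens_a, tokens_b) for every disagreement."""
    mismatches = []
    for sentence in corpus:
        a, b = tok_a(sentence), tok_b(sentence)
        if a != b:
            mismatches.append((sentence, a, b))
    return mismatches

# Toy stand-ins so this sketch runs without jieba installed:
# one "tokenizer" splits on whitespace, the other on every character.
split_ws = lambda s: s.split()
split_ch = lambda s: [c for c in s if not c.isspace()]

corpus = ["ab cd", "x"]
for sent, a, b in diff_tokenizers(split_ws, split_ch, corpus):
    print(sent, a, b)
```

Running this over a large corpus and inspecting the mismatch list would quantify how often the two libraries diverge, rather than relying on a single example.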

deepcs233 commented 6 years ago

You can take a look at this issue for a segmentation test on Dream of the Red Chamber.

deepcs233 commented 5 years ago

@purpleskyfall In my tests on macOS with Python 3.6, jieba_fast's results are identical to jieba's. What environment are you running? I can't make any guarantees for Windows. If the problem persists, please open a new issue with complete environment details.
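Since the maintainer's reply hinges on the reporter's environment, a small stdlib-only sketch like the following could collect the details worth pasting into a new issue. The `environment_report` function is a hypothetical helper; it only assumes the real module names `jieba` and `jieba_fast` and degrades gracefully when either is missing.

```python
# Sketch: gather OS, Python, and library versions for a bug report.
import platform
import sys

def environment_report():
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    # Report each library's version if installed, else note its absence.
    for name in ("jieba", "jieba_fast"):
        try:
            mod = __import__(name)
            lines.append(f"{name}: {getattr(mod, '__version__', 'unknown')}")
        except ImportError:
            lines.append(f"{name}: not installed")
    return "\n".join(lines)

print(environment_report())
```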