deepcs233 / jieba_fast

Use C API and SWIG to speed up jieba: an efficient Chinese word segmentation library
MIT License
631 stars 75 forks

Runtime error #5

Closed songshenma closed 6 years ago

songshenma commented 6 years ago

Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/8t/z__z7fgj5rnfxbvmysdv7_rw0000gn/T/jieba.cache
Loading model cost 0.919 seconds.
Prefix dict has been built succesfully.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/macos/PycharmProjects/tensorflow/NLPTools/jiebaParallel.py", line 18, in
    words = "/ ".join(jieba.lcut(content))  # default accurate mode
  File "/Users/macos/anaconda3/envs/tensorflow/lib/python3.6/site-packages/jieba_fast/__init__.py", line 340, in lcut
    return list(self.cut(*args, **kwargs))
  File "/Users/macos/anaconda3/envs/tensorflow/lib/python3.6/site-packages/jieba_fast/__init__.py", line 308, in cut
    for word in cut_block(blk):
  File "/Users/macos/anaconda3/envs/tensorflow/lib/python3.6/site-packages/jieba_fast/__init__.py", line 273, in cut_DAG
    for t in recognized:
  File "/Users/macos/anaconda3/envs/tensorflow/lib/python3.6/site-packages/jieba_fast/finalseg/__init__.py", line 97, in cut
    for word in cut(blk):
  File "/Users/macos/anaconda3/envs/tensorflow/lib/python3.6/site-packages/jieba_fast/finalseg/__init__.py", line 69, in __cut
    prob, pos_list = _jieba_fast_functions._viterbi(sentence, 'BMES', start_P, trans_P, emit_P)
SystemError: returned a result with an error set

The file I am segmenting has over 25 million words and more than 220,000 lines. Is the error because it is too large?

deepcs233 commented 6 years ago

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

The above exception was the direct cause of the following exception:

It is not because the file is too large. Try converting the file to a different encoding, such as UTF-8 or Unicode, and run it again. @songshenma
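One way to act on this suggestion is to check whether the file really decodes as UTF-8 before re-encoding it. The sketch below is only an illustration: the helper names `find_invalid_utf8` and `convert_to_utf8` are made up for this example, and the default source encoding `gb18030` is an assumption (0xa3 is a valid lead byte in GBK/GB18030, but nothing in this thread confirms that the file uses it).

```python
from pathlib import Path


def find_invalid_utf8(path):
    """Return (offset, byte) for the first byte that is not valid UTF-8, or None.

    Reads the whole file into memory, which should be acceptable for a
    corpus of a few hundred megabytes.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte.
        return exc.start, data[exc.start]
    return None


def convert_to_utf8(src, dst, source_encoding="gb18030"):
    """Re-encode src as UTF-8 into dst.

    source_encoding is an assumption; replace it with the file's real encoding.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

If `find_invalid_utf8` returns `None`, the file is already valid UTF-8 and the cause probably lies elsewhere.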

songshenma commented 6 years ago

My file is already UTF-8 encoded.

songshenma commented 6 years ago

[Image: 2018-03-01 15 54 10]

deepcs233 commented 6 years ago

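Hmm, I can't reproduce your problem on my side. If it's convenient, could you paste an excerpt of the text that triggers the error, or send it to my email, shaohao97@gmail.com?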
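For a file with more than 220,000 lines, the offending excerpt can be located by segmenting the file line by line and recording the lines that raise. This is a minimal sketch, not part of jieba_fast: `find_failing_lines` is a hypothetical helper, `corpus.txt` is a placeholder file name, and the segmentation function is passed in as a parameter (for example `jieba_fast.lcut`, the call that appears in the traceback).

```python
def find_failing_lines(lines, segment, limit=5):
    """Return up to `limit` tuples (line_number, line, error) for lines
    on which `segment` raises an exception."""
    failures = []
    for lineno, line in enumerate(lines, start=1):
        try:
            segment(line)
        except Exception as exc:  # SystemError is a subclass of Exception
            failures.append((lineno, line, repr(exc)))
            if len(failures) >= limit:
                break
    return failures


# Possible usage (file name is a placeholder):
#
# import jieba_fast as jieba
# with open("corpus.txt", encoding="utf-8") as f:
#     for lineno, line, err in find_failing_lines(f, jieba.lcut):
#         print(lineno, repr(line), err)
```

Printing the line with `repr` keeps unusual or invisible characters visible, which makes the excerpt easier to paste into an issue.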

deepcs233 commented 6 years ago

@songshenma The problem in this issue should be the same as yours, and it has been resolved. You can try again: https://github.com/deepcs233/jieba_fast/issues/6