songshenma closed this issue 6 years ago
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte
The above exception was the direct cause of the following exception:
It's not because the file is too large. Try converting the file to a different encoding, such as UTF-8 or Unicode, and then try again. @songshenma
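For reference, the re-encoding suggested above could be done roughly as follows. This is only a sketch: the function name `reencode_to_utf8`, the file names, and the guess that the source encoding is `gb18030` are all assumptions for illustration, not something confirmed in this thread. If the guessed encoding is wrong, Python raises `UnicodeDecodeError` instead of silently writing garbage.

```python
def reencode_to_utf8(src, dst, source_encoding):
    """Copy `src` to `dst`, decoding with `source_encoding` and writing UTF-8.

    Streams line by line so a large corpus is never held in memory at once.
    Returns the number of lines written.
    """
    count = 0
    # newline="" keeps the original line endings untouched on both sides.
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
            count += 1
    return count

# Hypothetical usage (file names and the gb18030 guess are assumptions):
# reencode_to_utf8("corpus.txt", "corpus_utf8.txt", "gb18030")
```

Writing to a separate output file keeps the original intact in case the encoding guess turns out to be wrong.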
My file is already UTF-8 encoded.
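Hmm, I can't reproduce your problem on my side. If it's convenient, could you paste an excerpt of the text that makes the program fail, or send it to my email at shaohao97@gmail.com?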
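One possible way to pull out such an excerpt is to scan the file in binary mode and report the lines that do not decode as strict UTF-8. The helper below is a hypothetical sketch (the name `find_undecodable_lines` and the `limit` parameter are made up here), and it only checks whether the file is valid UTF-8. It does not show what happens inside jieba_fast.

```python
def find_undecodable_lines(path, limit=10):
    """Return up to `limit` tuples (line_number, raw_bytes, bad_byte_offset)
    for lines in `path` that are not valid UTF-8."""
    bad = []
    with open(path, "rb") as f:
        # Splitting on b"\n" is safe for UTF-8: no multi-byte sequence contains 0x0A.
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first offending byte in this line.
                bad.append((lineno, raw, exc.start))
                if len(bad) >= limit:
                    break
    return bad

# Hypothetical usage (the file name is an assumption):
# for lineno, raw, offset in find_undecodable_lines("corpus.txt"):
#     print(lineno, offset, raw[:40])
```

If this returns an empty list, the file really is valid UTF-8 and the failing input has to be narrowed down some other way.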
@songshenma The problem in this issue looks the same as yours and has already been resolved, so you can try again: https://github.com/deepcs233/jieba_fast/issues/6
Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/8t/z__z7fgj5rnfxbvmysdv7_rw0000gn/T/jieba.cache
Loading model cost 0.919 seconds.
Prefix dict has been built succesfully.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/macos/PycharmProjects/tensorflow/NLPTools/jiebaParallel.py", line 18, in
    words = "/ ".join(jieba.lcut(content))  # default: accurate mode
  File "/Users/macos/anaconda3/envs/tensorflow/lib/python3.6/site-packages/jieba_fast/__init__.py", line 340, in lcut
    return list(self.cut(*args, **kwargs))
  File "/Users/macos/anaconda3/envs/tensorflow/lib/python3.6/site-packages/jieba_fast/__init__.py", line 308, in cut
    for word in cut_block(blk):
  File "/Users/macos/anaconda3/envs/tensorflow/lib/python3.6/site-packages/jieba_fast/__init__.py", line 273, in __cut_DAG
    for t in recognized:
  File "/Users/macos/anaconda3/envs/tensorflow/lib/python3.6/site-packages/jieba_fast/finalseg/__init__.py", line 97, in cut
    for word in __cut(blk):
  File "/Users/macos/anaconda3/envs/tensorflow/lib/python3.6/site-packages/jieba_fast/finalseg/__init__.py", line 69, in __cut
    prob, pos_list = _jieba_fast_functions._viterbi(sentence, 'BMES', start_P, trans_P, emit_P)
SystemError: returned a result with an error set
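The file I'm segmenting has more than 25 million words across more than 220,000 lines. Is the error because the file is too large?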
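One way to test the file-size theory is to segment the corpus line by line and record which lines raise, rather than passing the whole content in one call. If only a few specific lines fail, size is unlikely to be the cause. The sketch below is an illustration under assumptions: `find_failing_lines` is a hypothetical helper, and the segmenter is passed in as a plain callable so that nothing about the jieba_fast API is assumed beyond the `lcut` call that appears in the traceback.

```python
def find_failing_lines(lines, segment, limit=5):
    """Call `segment(line)` on each line and collect up to `limit` failures
    as (line_number, line, repr_of_exception)."""
    failures = []
    for lineno, line in enumerate(lines, start=1):
        try:
            segment(line)
        except Exception as exc:  # SystemError is a subclass of Exception
            failures.append((lineno, line, repr(exc)))
            if len(failures) >= limit:
                break
    return failures

# Hypothetical usage (the file name is an assumption; lcut is the call from the traceback):
# import jieba_fast as jieba
# with open("corpus.txt", encoding="utf-8") as f:
#     for lineno, line, err in find_failing_lines(f, jieba.lcut):
#         print(lineno, err, line[:40])
```

The failing lines reported this way are also a convenient excerpt to paste into the issue.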