chatopera / Synonyms

:herb: Chinese synonyms toolkit for chatbots and intelligent question answering
https://bot.chatopera.com/

Error when loading vocab.txt #46

Closed xiaoniuzilo closed 6 years ago

xiaoniuzilo commented 6 years ago

```
>>> import synonyms
Synonyms load wordseg dict [D:\python34\lib\site-packages\synonyms\data\vocab.txt] ...
Traceback (most recent call last):
  File "D:\python34\lib\site-packages\jieba\posseg\__init__.py", line 105, in load_word_tag
    word, _, tag = line.split(" ")
ValueError: too many values to unpack (expected 3)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\python34\lib\site-packages\synonyms\__init__.py", line 85, in <module>
    _tokenizer.initialize(tokenizer_dict)
  File "D:\python34\lib\site-packages\jieba\posseg\__init__.py", line 95, in initialize
    self.load_word_tag(self.tokenizer.get_dict_file())
  File "D:\python34\lib\site-packages\jieba\posseg\__init__.py", line 109, in load_word_tag
    'invalid POS dictionary entry in %s at Line %s: %s' % (f_name, lineno, line))
ValueError: invalid POS dictionary entry in D:\python34\lib\site-packages\synonyms\data\vocab.txt at Line 333405: 福荫 1 v 2 n
```

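It looks like jieba does not support multiple POS tags for one word. After I edited the file by hand so that each word keeps only one POS tag, the bug went away.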
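For context: a jieba dictionary has one entry per line with the word, its frequency and its POS tag separated by spaces, and the traceback shows `load_word_tag` unpacking each line into exactly three fields (`word, _, tag = line.split(" ")`). The offending entry `福荫 1 v 2 n` has five fields, so the unpack fails. The sketch below automates the manual workaround described above by keeping only the first frequency/tag pair on each line. It assumes vocab.txt lines follow a `word freq tag [freq tag ...]` layout, which is inferred from that single error message; `normalize_line` and `normalize_vocab` are hypothetical helpers, not part of Synonyms or jieba.

```python
def normalize_line(line):
    """Keep only the word and its first (frequency, tag) pair.

    Assumes the hypothetical layout "word freq tag [freq tag ...]",
    inferred from the failing entry "福荫 1 v 2 n". Lines that already
    have three fields or fewer are returned unchanged (minus the newline).
    """
    fields = line.rstrip("\r\n").split(" ")
    return " ".join(fields[:3])


def normalize_vocab(src_path, dst_path):
    """Write a copy of src_path to dst_path with one POS tag per word."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(normalize_line(line) + "\n")


if __name__ == "__main__":
    bad = "福荫 1 v 2 n"
    try:
        # Same unpack as jieba's load_word_tag: five fields, three targets.
        word, _, tag = bad.split(" ")
    except ValueError as exc:
        print("original entry fails:", exc)
    print(normalize_line(bad))  # 福荫 1 v
```

Keeping the first pair mirrors "keep only one POS tag per word"; which tag is the right one to keep for a given word is a judgment call the script does not make for you.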

hailiang-wang commented 6 years ago

Could you submit the modified vocab.txt by opening a Pull Request? Thanks.

hailiang-wang commented 6 years ago

There is another way to fix this quickly without changing the dictionary. I'll commit it shortly.

hailiang-wang commented 6 years ago

This is fixed in synonyms-3.3.5, please try again. https://pypi.python.org/pypi/synonyms/3.3.5

xiaoniuzilo commented 6 years ago

initialize dictionary: None| initialized: False
Building prefix dict from the default dictionary ...
Traceback (most recent call last):
  File "word2vec.py", line 59, in <module>
    r = synonyms.compare(sen1, sen2, seg=True)
  File "D:\python34\lib\site-packages\synonyms-3.3.5-py3.4.egg\synonyms\__init__.py", line 276, in compare
    s1 = [x for x in jieba.cut(s1)]
  File "D:\python34\lib\site-packages\synonyms-3.3.5-py3.4.egg\synonyms\__init__.py", line 276, in <listcomp>
    s1 = [x for x in jieba.cut(s1)]
  File "D:\python34\lib\site-packages\synonyms-3.3.5-py3.4.egg\synonyms\jieba\__init__.py", line 304, in cut
    for word in cut_block(blk):
  File "D:\python34\lib\site-packages\synonyms-3.3.5-py3.4.egg\synonyms\jieba\__init__.py", line 236, in __cut_DAG
    DAG = self.get_DAG(sentence)
  File "D:\python34\lib\site-packages\synonyms-3.3.5-py3.4.egg\synonyms\jieba\__init__.py", line 182, in get_DAG
    self.check_initialized()
  File "D:\python34\lib\site-packages\synonyms-3.3.5-py3.4.egg\synonyms\jieba\__init__.py", line 171, in check_initialized
    self.initialize()
  File "D:\python34\lib\site-packages\synonyms-3.3.5-py3.4.egg\synonyms\jieba\__init__.py", line 145, in initialize
    self.FREQ, self.total = self.gen_pfdict(self.get_dict_file())
  File "D:\python34\lib\site-packages\synonyms-3.3.5-py3.4.egg\synonyms\jieba\__init__.py", line 355, in get_dict_file
    return get_module_res(DEFAULT_DICT_NAME)
  File "D:\python34\lib\site-packages\synonyms-3.3.5-py3.4.egg\synonyms\jieba\_compat.py", line 8, in <lambda>
    os.path.join(*res))
  File "D:\python34\lib\site-packages\pkg_resources.py", line 886, in resource_stream
    self, resource_name
  File "D:\python34\lib\site-packages\pkg_resources.py", line 1411, in get_resource_stream
    return open(self._fn(self.module_path, resource_name), 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'D:\python34\lib\site-packages\synonyms-3.3.5-py3.4.egg\synonyms\jieba\dict.txt'

The bundled jieba is missing its dictionary...

hailiang-wang commented 6 years ago

That dictionary was removed on purpose to reduce the package size. It works fine in my local tests. How are you running it?

xiaoniuzilo commented 6 years ago

Looking up synonyms works fine, but calling `r = synonyms.compare(sen1, sen2, seg=True)` raises the error.

hailiang-wang commented 6 years ago

This is my console output (screenshot not preserved). Could you test it again?

@xiaoniuzilo

xiaoniuzilo commented 6 years ago

My console output is in the two screenshots above (not preserved). I tried both 3.3.5 and 3.3.6, and it still fails.

hailiang-wang commented 6 years ago

I tested on Mac OS X with Python 3.6. Could you upgrade Python? You are on 3.4. Synonyms currently supports py2.7 and py3.6, and it will not support py3.4.
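If you want a script to fail fast on an interpreter outside the versions named here, a check on `sys.version_info` is enough. This sketch encodes only what is stated in this comment (2.7 and 3.6 supported, 3.4 not); treating every other version as unsupported is an assumption, and `is_supported_python` is a hypothetical helper, not something Synonyms provides.

```python
import sys

# Versions named in this thread. Anything else is treated as unsupported
# here; that is an assumption, not an official compatibility list.
SUPPORTED = {(2, 7), (3, 6)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the versions above."""
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    if not is_supported_python():
        print("Python %d.%d is not one of the versions named in this thread"
              % tuple(sys.version_info[:2]))
```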

xiaoniuzilo commented 6 years ago

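My existing project depends on too many installed packages, so upgrading Python there is hard. Instead I installed Python 3.6.0 on a 64-bit Windows machine that had never had Python. After installing synonyms, `import synonyms` failed; it turned out jieba was missing (jieba is not listed in the requirements). I installed jieba and ran it again, and got the same error as before.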

Before jieba was installed, `import synonyms` failed immediately (screenshot not preserved).

After installing jieba, `compare` fails (screenshots not preserved).

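Could you try installing synonyms 3.3.6 in a fresh environment and see whether it fails? Your existing environment may already have all the configuration and dependencies in place, which would explain why it works for you.

The earlier version did work with Python 3.4, so I'll go back to it for now.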
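Since jieba was not declared as a requirement at that point, one quick way to confirm a missing dependency in a fresh environment, before importing Synonyms, is `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. This is a generic sketch; `missing_modules` is a hypothetical helper, and the module names in the example are illustrative.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # jieba is the dependency that turned out to be missing in this thread.
    absent = missing_modules(["jieba", "numpy"])
    if absent:
        print("missing:", ", ".join(absent))
        print("try: pip install " + " ".join(absent))
```

`find_spec` only locates the module without executing it, so this check does not trigger the import-time dictionary loading that produced the errors above.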

xiaoniuzilo commented 6 years ago

So do I need to put a dict.txt into that directory? Where can I get this dict.txt?

xiaoniuzilo commented 6 years ago

Images keep failing to upload, so I'm pasting the output directly:

Synonyms on loading stopwords [C:\PYTHON36\lib\site-packages\synonyms-3.3.7-py3.6.egg\synonyms\data\stopwords.txt] ...
Synonyms on loading vectors [C:\PYTHON36\lib\site-packages\synonyms-3.3.7-py3.6.egg\synonyms\data\words.vector] ...

>>> synonyms.compare('这个字典是故意删除的','以减少体积',True)
0.089
>>> synonyms.compare('这个字典是故意删除的','以减少体积',True)
0.089
>>> synonyms.compare('这个字典是故意删除的','可以用就好了',True)
C:\PYTHON36\lib\site-packages\synonyms-3.3.7-py3.6.egg\synonyms\utils.py:246: RuntimeWarning: invalid value encountered in true_divide
  cosine = lambda a, b: dot(a, b)/(norm(a)*norm(b))
C:\PYTHON36\lib\site-packages\synonyms-3.3.7-py3.6.egg\synonyms\synonyms.py:245: RuntimeWarning: invalid value encountered in less
  if r < 0: r = abs(r)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\PYTHON36\lib\site-packages\synonyms-3.3.7-py3.6.egg\synonyms\synonyms.py", line 288, in compare
    return _similarity_distance(s1, s2)
  File "C:\PYTHON36\lib\site-packages\synonyms-3.3.7-py3.6.egg\synonyms\synonyms.py", line 245, in _similarity_distance
    if r < 0: r = abs(r)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

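After updating to 3.3.7, I get a new error. `synonyms.compare('这个字典是故意删除的','以减少体积',True)` returns a correct result, but `synonyms.compare('这个字典是故意删除的','可以用就好了',True)` raises an error. The problem is in the line `g = cosine(_flat_sum_array(_get_wv(s1)), _flat_sum_array(_get_wv(s2)))`. I debugged it: the segmentation of s2 is correct, but `_get_wv(s2)` returns `[]`, so the `cosine` function fails. I don't know what causes it.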
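Judging from the warnings above, when `_get_wv(s2)` finds no word vectors, the input to `cosine` is degenerate: the norm is zero, the division yields NaN (`invalid value encountered in true_divide`), and the result is not a single number, so `if r < 0` raises. The sketch below shows one way to guard against that: sum the word vectors into a fixed-length vector (all zeros when there are none) and return 0.0 similarity when either side has zero norm. It uses plain Python lists instead of NumPy so it stays self-contained; `sum_vectors` and `safe_cosine` are hypothetical stand-ins for `_flat_sum_array` and `cosine`, and this is not necessarily how 3.3.10 fixed the bug.

```python
import math


def sum_vectors(vectors, dim):
    """Element-wise sum of word vectors; a zero vector of length dim if none."""
    total = [0.0] * dim
    for vec in vectors:
        for i, x in enumerate(vec):
            total[i] += x
    return total


def safe_cosine(a, b):
    """Cosine similarity that returns 0.0 instead of NaN for zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # No usable word vectors on one side: report "no similarity".
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    dim = 3
    s1_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    s2_vectors = []  # what _get_wv(s2) returned in the report
    v1 = sum_vectors(s1_vectors, dim)
    v2 = sum_vectors(s2_vectors, dim)
    print(safe_cosine(v1, v2))  # 0.0 instead of NaN / ValueError
```

Returning 0.0 is one reasonable convention for "nothing to compare"; raising an explicit, descriptive error would be another.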

hailiang-wang commented 6 years ago

Fixed in 3.3.10.

hailiang-wang commented 6 years ago

@xiaoniuzilo Thanks for using Synonyms and reporting this issue.