dongrixinyu / JioNLP

A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use.
http://www.jionlp.com/
Apache License 2.0

Data augmentation: homophone substitution bug #17

Closed: zjhiphop closed this issue 3 years ago

zjhiphop commented 3 years ago

Please describe the bug or the function you expect

Please input the text and code

jio.homophone_substitution("北京市")

Please input the bug info and traceback

Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/ks/vz0z2zk13hx0t6h_pgy1bpfh0000gn/T/jieba.cache
Loading model cost 0.602 seconds.
Prefix dict has been built succesfully.
Traceback (most recent call last):
  File "/Users/mrx/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-703a6a7940fb>", line 1, in <module>
    runfile('/Users/mrx/Documents/work/lance/gov_nlp/repo/legal_instrument/corpus/augement.py', wdir='/Users/mrx/Documents/work/lance/gov_nlp/repo/legal_instrument/corpus')
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/mrx/Documents/work/lance/gov_nlp/repo/legal_instrument/corpus/augement.py", line 217, in <module>
    jio.homophone_substitution('北京市')
  File "/Users/mrx/anaconda3/lib/python3.7/site-packages/jionlp/textaug/homophone_substitution.py", line 108, in __call__
    self._prepare(homo_ratio=homo_ratio, seed=seed)
  File "/Users/mrx/anaconda3/lib/python3.7/site-packages/jionlp/textaug/homophone_substitution.py", line 68, in _prepare
    self._construct_word_pinyin_dict()
  File "/Users/mrx/anaconda3/lib/python3.7/site-packages/jionlp/textaug/homophone_substitution.py", line 80, in _construct_word_pinyin_dict
    word_pinyin = self.pinyin(word, formater='detail')
  File "/Users/mrx/anaconda3/lib/python3.7/site-packages/jionlp/gadget/pinyin.py", line 164, in __call__
    self._prepare()
  File "/Users/mrx/anaconda3/lib/python3.7/site-packages/jionlp/gadget/pinyin.py", line 79, in _prepare
    self.pinyin_char = pinyin_char_loader()
  File "/Users/mrx/anaconda3/lib/python3.7/site-packages/jionlp/dictionary/dictionary_loader.py", line 424, in pinyin_char_loader
    char_dict = chinese_char_dictionary_loader()
  File "/Users/mrx/anaconda3/lib/python3.7/site-packages/jionlp/dictionary/dictionary_loader.py", line 245, in chinese_char_dictionary_loader
    assert len(segs) == 8
AssertionError

Version info: jionlp==1.3.15

Other issue: word_distribution.zip does not ship with its extracted text file, so it has to be unzipped manually before it can be used.
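
As a workaround for the archive problem, the manual unzip step could be scripted. The following is only a minimal sketch using the standard-library `zipfile` module; the helper name `ensure_extracted` and the inner file name `word_distribution.txt` are assumptions for illustration, not part of JioNLP, and the demo runs against a throwaway archive rather than the real dictionary.

```python
import tempfile
import zipfile
from pathlib import Path


def ensure_extracted(zip_path, target_dir=None):
    """Extract archive members that are not yet on disk.

    Hypothetical helper: returns the list of member names it extracted.
    """
    zip_path = Path(zip_path)
    target_dir = Path(target_dir) if target_dir else zip_path.parent
    with zipfile.ZipFile(zip_path) as zf:
        # Only extract members that do not already exist next to the archive.
        missing = [name for name in zf.namelist()
                   if not (target_dir / name).exists()]
        if missing:
            zf.extractall(target_dir, members=missing)
    return missing


# Demo on a temporary archive; the inner file name is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "word_distribution.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("word_distribution.txt", "placeholder\n")
    first = ensure_extracted(archive)   # extracts the text file
    second = ensure_extracted(archive)  # nothing left to extract
```

Calling the helper a second time is a no-op, so it would be safe to run before every load.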

dongrixinyu commented 3 years ago

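I updated the Chinese character dictionary. You were probably running an older version of the code and then installed the new version on top of it, so the old and new versions interfered with each other and raised the error. Reinstall the package and it should work.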
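
To check whether a stale dictionary file is what trips `assert len(segs) == 8`, one option is to scan the file for lines whose field count differs from 8. This is a rough sketch, not JioNLP's own loader: the function name `find_malformed_lines` is hypothetical, and the tab separator is an assumption, since the traceback does not show how `segs` is produced.

```python
def find_malformed_lines(lines, expected_fields=8, sep="\t"):
    """Return (line_number, field_count) for lines with an unexpected field count.

    The separator is an assumption; adjust it to match the real dictionary file.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        count = len(line.split(sep))
        if count != expected_fields:
            bad.append((lineno, count))
    return bad


# Synthetic sample: the first line has 8 fields, the second only 7.
sample = [
    "a\tb\tc\td\te\tf\tg\th\n",
    "a\tb\tc\td\te\tf\tg\n",
]
malformed = find_malformed_lines(sample)
print(malformed)
```

If a scan like this reports mismatched lines in the installed dictionary, that would be consistent with an old data file left over from a previous install, and a clean reinstall is the likely fix.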