Embedding / Chinese-Word-Vectors

100+ Chinese Word Vectors: over a hundred pre-trained Chinese word vectors
Apache License 2.0

How to load the model #162

Open YiingWei opened 1 year ago

YiingWei commented 1 year ago

Hello, I tried to load your Chinese word vector model with the code below:

    # Load the Chinese and English word vector models
    ch_model = KeyedVectors.load_word2vec_format('./ch_model/merge_sgns_bigram_char300.txt', binary=True)

It fails with the following error. How can I fix it?

    Traceback (most recent call last):
      File "c:/Users/11323/Desktop/score_comment/socore_comments.py", line 127, in <module>
        ch_model = KeyedVectors.load_word2vec_format('./ch_model/merge_sgns_bigram_char300.txt', binary=True)
      File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\gensim\models\keyedvectors.py", line 1719, in load_word2vec_format
        return _load_word2vec_format(
      File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\gensim\models\keyedvectors.py", line 2065, in _load_word2vec_format
        _word2vec_read_binary(
      File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\gensim\models\keyedvectors.py", line 1960, in _word2vec_read_binary
        processed_words, chunk = _add_bytes_to_kv(
      File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\gensim\models\keyedvectors.py", line 1939, in _add_bytes_to_kv
        word = chunk[start:i_space].decode(encoding, errors=unicode_errors)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaf in position 0: invalid start byte

HunterHeidy commented 1 year ago

Hello, thank you for your message. Wishing you a happy life and good health.

XXXXiGua commented 1 year ago

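It should be binary=False, because the model is a plain-text .txt file with the values written as decimal numbers. Use binary=True only when the file is a .bin file.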
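The suggestion above can be sketched as follows. This is a minimal illustration, not the repository's own loader: `looks_like_text_word2vec` and `load_vectors` are hypothetical helper names introduced here, and the check assumes the standard word2vec layout (a `vocab_size dim` header line followed by one `word v1 ... vdim` line per entry). Only `KeyedVectors.load_word2vec_format(path, binary=...)` is the actual gensim call from the original post.

```python
def looks_like_text_word2vec(path, max_line_bytes=1 << 20):
    """Heuristic (hypothetical helper): True if `path` looks like a
    text-format word2vec file, False if it looks binary or malformed."""
    with open(path, "rb") as f:
        # Header line is "vocab_size dim" in both text and binary formats.
        header = f.readline(max_line_bytes).decode("utf-8", errors="replace").split()
        if len(header) != 2:
            return False
        try:
            dim = int(header[1])
        except ValueError:
            return False
        if dim <= 0:
            return False
        # Bounded read, so a binary file without newlines is not slurped whole.
        first_row = f.readline(max_line_bytes)
    try:
        parts = first_row.decode("utf-8").split()
        if len(parts) < dim + 1:
            return False
        # In a text file the last `dim` tokens are decimal numbers.
        for token in parts[-dim:]:
            float(token)
    except (UnicodeDecodeError, ValueError):
        return False
    return True


def load_vectors(path):
    """Load vectors with gensim, choosing `binary` from the file contents."""
    # Imported lazily so the format check above works without gensim installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(
        path, binary=not looks_like_text_word2vec(path)
    )
```

For the file in this issue, calling `load_vectors('./ch_model/merge_sgns_bigram_char300.txt')` should be equivalent to passing `binary=False` directly, assuming the download is an uncompressed text file as described in the reply above.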

ahutxwq1 commented 9 months ago

How was the file merge_sgns_bigram_char300.txt generated? Can it be downloaded directly?

GGbond2004 commented 1 month ago

Hello, has this been resolved? How should the code be changed?

HunterHeidy commented 1 month ago

Hello, thank you for your message. Wishing you a happy life and good health.