fredericky123 commented 2 years ago

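After downloading the file, I point to the trained model with the statement below, but the Python script does nothing at all when I run it: model = gensim.models.KeyedVectors.load_word2vec_format('/text/sgns.financial.bigram-char') When I switch to another model, the mixed-corpus one, it runs fine: model = gensim.models.KeyedVectors.load_word2vec_format('/text/merge_sgns_bigram_char300.txt') Why is that? Is the format of the first file wrong, or do I need a different statement to load the model? Thanks!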
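One way to narrow this down before calling gensim is to look at the first bytes of the file. The sketch below is only a diagnostic and rests on assumptions: the helper name `sniff_vector_file` is hypothetical, and the demo files are tiny stand-ins written to a temporary directory, not the real downloads. It reports whether a file looks bz2- or gzip-compressed, or whether it starts with the "count dim" header line that the word2vec text format uses.

```python
import bz2
import os
import tempfile


def sniff_vector_file(path):
    """Describe what the file at `path` looks like (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(3)
    if head[:3] == b"BZh":          # bz2 magic bytes
        return "bz2-compressed"
    if head[:2] == b"\x1f\x8b":     # gzip magic bytes
        return "gzip-compressed"
    # Otherwise treat it as text and inspect the first line.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        first = f.readline().split()
    if len(first) == 2 and all(tok.isdigit() for tok in first):
        return "word2vec text, {} words x {} dims".format(first[0], first[1])
    return "unknown (no 'count dim' header line)"


# Demo on two tiny stand-in files (not the real vector files).
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.txt")
    with open(plain, "w", encoding="utf-8") as f:
        f.write("2 3\n金融 0.1 0.2 0.3\n市场 0.4 0.5 0.6\n")
    packed = os.path.join(tmp, "packed")
    with bz2.open(packed, "wt", encoding="utf-8") as f:
        f.write("2 3\n金融 0.1 0.2 0.3\n")
    plain_result = sniff_vector_file(plain)
    packed_result = sniff_vector_file(packed)

print(plain_result)
print(packed_result)
```

If the file turns out to be compressed but has no matching extension, it would likely need to be decompressed first. If it is plain text with a valid header, the script may simply still be loading, since reading a large text-format vector file can take a while without printing anything.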

stay-leave commented 2 years ago

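This is what I use:

import numpy as np

def weight(self, vocab_to_index):
    """Map each vocabulary word to its pretrained word vector."""
    size_vocab = len(vocab_to_index)  # vocabulary size
    embeddings = np.zeros((size_vocab, 300))  # zero-initialized array, 300 dimensions
    found = 0  # number of words with a matching pretrained vector
    with open(r'..\datasets\sgns.weibo.char', 'r', encoding='utf-8') as f:  # pretrained vector file
        for line_idx, line in enumerate(f):  # each line is: word, then its vector
            line = line.strip().split()
            if len(line) != 300 + 1:  # skip the header and any line that is not 300-dimensional
                continue
            word = line[0]  # the word
            embedding = np.asarray(line[1:], dtype=np.float64)  # the vector, converted from strings
            if word in vocab_to_index:
                found = found + 1
                word_idx = vocab_to_index[word]  # row index for this word
                embeddings[word_idx] = embedding  # store the vector in that row
        print('Vectors found: ' + str(found) + ', vocabulary size: ' + str(size_vocab) + ', match rate: {:.2f}%'.format(found / size_vocab * 100))
        # Save the extracted embedding array
        np.savez_compressed(r'..\datasets\vec.npz', embeddings=embeddings)
        # return embeddings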
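To see how the length check and the match rate behave, here is a small standard-library sketch of the same line-by-line approach run on an in-memory sample. The function name `match_vectors`, the 2-dimensional sample data, and the toy vocabulary are all made up for illustration. It uses plain lists instead of a NumPy array.

```python
import io


def match_vectors(lines, vocab_to_index, dim):
    """Collect vectors for words in vocab_to_index; skip lines of the wrong length."""
    vectors = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) != dim + 1:  # the "count dim" header line is skipped here too
            continue
        word = parts[0]
        if word in vocab_to_index:
            vectors[vocab_to_index[word]] = [float(x) for x in parts[1:]]
    return vectors


# Toy stand-in for a vector file: a header line, then three 2-dimensional vectors.
sample = io.StringIO("3 2\n股票 0.1 0.2\n基金 0.3 0.4\n债券 0.5 0.6\n")
vocab = {"股票": 0, "债券": 1, "期货": 2}  # "期货" has no vector in the sample

matched = match_vectors(sample, vocab, dim=2)
match_rate = len(matched) / len(vocab) * 100
print(matched)
print("match rate: {:.2f}%".format(match_rate))
```

Words that are missing from the file, like the third vocabulary entry here, are simply left out of the result. In the array-based version above, their rows stay as zero vectors.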

HunterHeidy commented 2 years ago

Hello, thank you for your message. Wishing you a happy life and good health.

Embedding / Chinese-Word-Vectors

How to load sgns.financial.bigram-char #149
