Question about the pretrained word vectors

GaoQ1 / rasa_chatbot_cn

Building a Chinese dialogue system based on the newest version of Rasa


Question about the pretrained word vectors #46

Closed: gcong66 closed this issue 5 years ago

gcong66 commented 5 years ago

Earlier I saw that the config.yml file is configured like this:

language: "zh"
pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor.dat"
- name: "tokenizer_jieba"

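You also shared a download of total_word_feature_extractor.dat on Baidu Netdisk. After downloading it, I found the file is only 40-odd MB. Is the total_word_feature_extractor.dat you provided an English model? I ask because the Chinese total_word_feature_extractor.dat mentioned in your Jianshu blog post is over 300 MB.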

GaoQ1 commented 5 years ago

No, it is a Chinese model.

gcong66 commented 5 years ago

Then what corpus was this 40-odd MB model trained on?

GaoQ1 commented 5 years ago

It is the MITIE model provided by the author of this article: http://www.crownpku.com/2017/07/27/%E7%94%A8Rasa_NLU%E6%9E%84%E5%BB%BA%E8%87%AA%E5%B7%B1%E7%9A%84%E4%B8%AD%E6%96%87NLU%E7%B3%BB%E7%BB%9F.html
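
For anyone who wants to check for themselves whether a downloaded total_word_feature_extractor.dat is a Chinese or an English model, one rough approach is to load it and look at how much of its vocabulary contains Chinese characters. The sketch below is not from the repository. It assumes the `mitie` Python bindings are installed and that they expose `total_word_feature_extractor` with `num_dimensions`, `num_words_in_dictionary` and `get_words_in_dictionary()`. The helper names (`has_cjk`, `cjk_ratio`, `inspect_extractor`) are hypothetical, and the CJK check only covers the basic Unified Ideographs range.

```python
import os


def has_cjk(word):
    """Return True if the word contains at least one CJK Unified Ideograph."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in word)


def cjk_ratio(words):
    """Return the fraction of words that contain a Chinese character."""
    words = list(words)
    if not words:
        return 0.0
    return sum(has_cjk(w) for w in words) / len(words)


def inspect_extractor(path):
    """Summarize a MITIE feature extractor file (hypothetical helper).

    Assumption: the `mitie` package is installed and `path` points at a
    file such as data/total_word_feature_extractor.dat.
    """
    # Imported lazily so the helpers above work without mitie installed.
    from mitie import total_word_feature_extractor

    twfe = total_word_feature_extractor(path)
    # Depending on the binding version, words may come back as bytes or str.
    words = [
        w.decode("utf-8", "ignore") if isinstance(w, bytes) else w
        for w in twfe.get_words_in_dictionary()
    ]
    return {
        "size_mb": os.path.getsize(path) / (1024 * 1024),
        "num_dimensions": twfe.num_dimensions,
        "num_words": twfe.num_words_in_dictionary,
        "cjk_ratio": cjk_ratio(words),
    }
```

Calling `inspect_extractor("data/total_word_feature_extractor.dat")` should then give a CJK ratio close to 1 for a model trained on segmented Chinese text and close to 0 for an English one. File size by itself says little about the language, since it mostly reflects vocabulary size and feature dimensions.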