BrikerMan / Kashgari

Kashgari is a production-level NLP transfer-learning framework built on top of tf.keras for text labeling and text classification. It includes Word2Vec, BERT, and GPT2 language embeddings.
http://kashgari.readthedocs.io/
Apache License 2.0
2.4k stars, 441 forks

[Question] Hello, can Kashgari pretrain on my own corpus and then use the resulting embedding? #134

Closed: Ted8000 closed this issue 5 years ago.

Ted8000 commented 5 years ago

Following up on one of your earlier answers: how do I train on my own corpus and then use my own embedding layer? Could you provide an example? [screenshot]

BrikerMan commented 5 years ago

You can first train your own word vectors with a framework such as gensim or fastText, then load them with the WordEmbedding module and use them as the language model.

Ted8000 commented 5 years ago

How do I load word vectors trained with gensim? This .txt file is one I trained with gensim. How do I import it into WordEmbedding? [screenshot]

BrikerMan commented 5 years ago

You're using the tf.keras version, right? For the tf.keras version, the code is:

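import kashgari
# The package is `kashgari.embeddings` (plural) in the released tf.keras versions
from kashgari.embeddings import WordEmbedding

# 'xxx.txt' is the path to your word2vec-format text file
# Sequence labeling
em = WordEmbedding('xxx.txt', task=kashgari.LABELING)
# Classification
em = WordEmbedding('xxx.txt', task=kashgari.CLASSIFICATION)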

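In the future, please don't post screenshots; paste the code and the error message as text instead. That makes them easier for others to search.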
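Before passing a file to WordEmbedding, it can help to confirm that it really is in word2vec text format. For example, a gensim model saved with `model.save()` is a pickle, not this format. Our understanding is that the tf.keras WordEmbedding reads the path through gensim's `KeyedVectors.load_word2vec_format`, which defaults to `binary=False`, but treat that as an assumption. `check_word2vec_txt` below is a hypothetical, stdlib-only helper and is not part of Kashgari or gensim. It checks for the `<vocab_size> <dim>` header and verifies that every row contains a token followed by `dim` numbers.

```python
import os
import tempfile
from typing import Tuple


def check_word2vec_txt(path: str) -> Tuple[int, int]:
    """Hypothetical helper (not part of Kashgari or gensim).

    Validates a word2vec *text* file and returns (vocab_size, dim).
    Raises ValueError if the header or any row is malformed.
    """
    with open(path, encoding="utf-8") as f:
        header = f.readline().split()
        if len(header) != 2 or not all(h.isdigit() for h in header):
            raise ValueError(f"expected '<vocab_size> <dim>' header, got {header!r}")
        vocab_size, dim = int(header[0]), int(header[1])

        rows = 0
        for lineno, line in enumerate(f, start=2):
            line = line.rstrip()
            if not line:
                continue  # tolerate blank lines
            # Split off the last `dim` fields; whatever remains is the token.
            parts = line.rsplit(" ", dim)
            if len(parts) != dim + 1:
                raise ValueError(f"line {lineno}: expected a token and {dim} values")
            for value in parts[1:]:
                try:
                    float(value)
                except ValueError:
                    raise ValueError(f"line {lineno}: non-numeric value {value!r}") from None
            rows += 1

    if rows != vocab_size:
        raise ValueError(f"header says {vocab_size} words, found {rows} rows")
    return vocab_size, dim


if __name__ == "__main__":
    # Write a tiny hand-made vector file and validate it.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_vectors.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("2 3\n我 0.1 0.2 0.3\n喜欢 -0.5 0.0 1.25\n")
        print(check_word2vec_txt(path))  # (2, 3)
```

If the check passes, the file should at least have the layout that `load_word2vec_format(..., binary=False)` expects.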