How is vocab.txt generated? Why does vocab_size change?

ZhuiyiTechnology / simbert

a bert for retrieval and generation

Apache License 2.0

Open wyqnumber opened 4 years ago

wyqnumber commented 4 years ago

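After training a SimBERT model from chinese_L-12_H-768_A-12, the vocab.txt in the resulting SimBERT model has changed: both the tokens and their count differ from the original. How is the vocab.txt in the new SimBERT model generated?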

wyqnumber commented 4 years ago

keep_tokens=keep_tokens,  # keep only the tokens listed in keep_tokens, trimming the original vocabulary
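To make the line above concrete, here is a minimal pure-Python sketch of how a trimmed vocabulary and its matching keep_tokens list could be derived. It is not the library's code: `simplify_vocab`, the toy `original` vocabulary, and the filter are all hypothetical. In the SimBERT training script these two values most likely come from bert4keras's `load_vocab(..., simplified=True)`, but that is an assumption worth checking against the script itself.

```python
def simplify_vocab(token_dict, keep, startswith=("[PAD]", "[UNK]", "[CLS]", "[SEP]")):
    """Hypothetical helper: return (new_token_dict, keep_tokens).

    new_token_dict maps each kept token to its new, contiguous id.
    keep_tokens lists the ORIGINAL ids of the kept tokens, in new-id order,
    so it can be used to pick rows out of the original embedding matrix.
    """
    new_token_dict, keep_tokens = {}, []
    # Special tokens go first so they receive the lowest new ids.
    for token in startswith:
        new_token_dict[token] = len(new_token_dict)
        keep_tokens.append(token_dict[token])
    # Remaining tokens are visited in original-id order.
    for token, old_id in sorted(token_dict.items(), key=lambda kv: kv[1]):
        if token not in new_token_dict and keep(token):
            new_token_dict[token] = len(new_token_dict)
            keep_tokens.append(old_id)
    return new_token_dict, keep_tokens


# Toy vocabulary (assumption: the real one is read from the original vocab.txt).
original = {"[PAD]": 0, "[unused1]": 1, "[UNK]": 2, "[CLS]": 3,
            "[SEP]": 4, "我": 5, "##a": 6, "你": 7}

# Toy filter: drop placeholder tokens and word-piece continuations.
new_vocab, keep_tokens = simplify_vocab(
    original,
    keep=lambda t: not t.startswith("[unused") and not t.startswith("##"),
)
```

Under this sketch the vocabulary shrinks from 8 to 6 entries and every kept token gets a new id, which is why both the contents and the size of the vocab.txt differ from the original.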

sssdjj commented 4 years ago

How do I save the trimmed vocabulary?

sssdjj commented 4 years ago

with open("test/vocab.txt", "w", encoding="utf-8") as f:
    for token in sorted(token_dict, key=token_dict.get):  # write tokens in id order
        f.write(token + "\n")
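One way to check a saved vocabulary is to read it back and confirm that each token's line number equals its id. The sketch below assumes `token_dict` maps tokens to contiguous ids starting at 0. The helper names `save_vocab` and `read_vocab`, the toy dictionary, and the temporary path are hypothetical.

```python
import os
import tempfile


def save_vocab(token_dict, path):
    """Write one token per line, ordered by id, as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        for token in sorted(token_dict, key=token_dict.get):
            f.write(token + "\n")


def read_vocab(path):
    """Read a vocab file back into a token -> line-number mapping."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): i for i, line in enumerate(f)}


# Toy trimmed vocabulary (assumption: ids are contiguous and start at 0).
token_dict = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "我": 4, "你": 5}

vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
save_vocab(token_dict, vocab_path)
reloaded = read_vocab(vocab_path)
```

If `reloaded` differs from `token_dict`, the file was written out of id order or with a different encoding, and a model loaded against it would look up the wrong embeddings.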

lonngxiang commented 3 years ago

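I seem to have run into a similar problem: after pretraining, loading the model for prediction raises an error, and I don't know what causes it.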