ZhuiyiTechnology / simbert

a bert for retrieval and generation
Apache License 2.0
840 stars 152 forks

How is vocab.txt generated? Why did vocab_size change? #6

Open wyqnumber opened 4 years ago

wyqnumber commented 4 years ago

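After training a simbert model from the chinese_L-12_H-768_A-12 model, the vocab.txt in the resulting simbert model has changed: both the tokens it contains and their number are different. How is the vocab.txt in the new simbert model generated?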

wyqnumber commented 4 years ago

keep_tokens=keep_tokens, # keep only the tokens listed in keep_tokens, trimming the original vocabulary
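To illustrate why trimming changes both the contents and the size of the vocabulary, here is a minimal pure-Python sketch of the idea. It is not the library's actual code: `simplify_vocab`, the toy vocabulary, and the filter rule are all hypothetical and chosen only for illustration. The point is that trimming yields two things: a new token-to-id mapping (which is what a saved vocab.txt would reflect) and a list of the original ids that were kept (the role `keep_tokens` plays above).

```python
# Illustrative only: simplify_vocab is a hypothetical helper, not library code.
def simplify_vocab(tokens, keep, specials=("[PAD]", "[UNK]", "[CLS]", "[SEP]")):
    """Return (token_dict, keep_tokens) for a trimmed vocabulary.

    tokens: original vocabulary, where list index == original token id
    keep:   predicate deciding whether a non-special token is retained
    """
    original_id = {t: i for i, t in enumerate(tokens)}
    token_dict, keep_tokens = {}, []
    # Special tokens go first so they receive the lowest new ids.
    for t in specials:
        token_dict[t] = len(token_dict)
        keep_tokens.append(original_id[t])
    # Remaining tokens keep their relative order but get new, contiguous ids.
    for t in tokens:
        if t not in token_dict and keep(t):
            token_dict[t] = len(token_dict)
            keep_tokens.append(original_id[t])
    return token_dict, keep_tokens


# Toy vocabulary; the filter rule (drop "[unused...]" and "##" pieces) is an assumption.
original = ["[PAD]", "[unused1]", "[UNK]", "[CLS]", "[SEP]", "我", "##我", "你"]
token_dict, keep_tokens = simplify_vocab(
    original, keep=lambda t: not t.startswith(("[unused", "##"))
)
print(len(original), "->", len(token_dict))  # 8 -> 6
print(keep_tokens)                           # [0, 2, 3, 4, 5, 7]
```

In this sketch the embedding rows at the original ids in `keep_tokens` would be the only ones retained, so the model's vocab_size shrinks to `len(token_dict)` and the token ids no longer match the original vocab.txt.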

sssdjj commented 4 years ago

How do I save the trimmed vocabulary?

sssdjj commented 4 years ago

with open("test/vocab.txt", "w", encoding="utf-8") as f:
    # write one token per line, in token_dict order
    for token in token_dict.keys():
        f.write(token + "\n")
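A quick way to check that a saved vocabulary is usable is to read it back and compare it with the in-memory mapping. The following is a sketch under stated assumptions: `save_vocab` and `read_vocab` are hypothetical helper names, the toy `token_dict` stands in for the real one, and ids are assumed to be contiguous starting at 0 so that the line number in the file equals the token id.

```python
import os
import tempfile


def save_vocab(token_dict, path):
    """Write tokens one per line, ordered by id (hypothetical helper)."""
    tokens = sorted(token_dict, key=token_dict.get)
    with open(path, "w", encoding="utf-8") as f:
        for token in tokens:
            f.write(token + "\n")


def read_vocab(path):
    """Rebuild token -> id, using the zero-based line number as the id."""
    token_dict = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            token_dict[line.rstrip("\n")] = len(token_dict)
    return token_dict


# Toy mapping standing in for the trimmed token_dict.
token_dict = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "我": 4, "你": 5}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    save_vocab(token_dict, path)
    restored = read_vocab(path)

print(restored == token_dict)  # True
```

Sorting by id before writing makes the file independent of dictionary insertion order, and the explicit UTF-8 encoding avoids platform-dependent defaults when the vocabulary contains Chinese characters.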

lonngxiang commented 3 years ago

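I seem to be running into a similar problem: after pretraining, loading the model for prediction raises an error, and I don't know what is causing it. [image]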