google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

use custom vocab.txt #1092

Open 2696120622 opened 4 years ago

2696120622 commented 4 years ago

My corpus consists of pure numbers like 1, 2, ..., 1000000, ..., 1002342, .... It is different from the words of any natural language. Can I replace vocab.txt with my own vocab.txt created from my corpus and then fine-tune BERT? Or should I train BERT on my corpus from scratch?

Thanks.
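Not an answer from the maintainers, just a minimal sketch of why a straight swap of vocab.txt is risky (assuming `tokenization.py` from this repository and placeholder vocab paths): the token ids index rows of the checkpoint's embedding table, so a vocabulary with different tokens or a different ordering no longer lines up with the pre-trained weights.

```python
# Minimal sketch, not maintainer-endorsed: compare how the stock vocab and a
# custom numeric vocab tokenize the same string. The vocab paths below are
# placeholders for illustration only.
import tokenization  # tokenization.py shipped in this repository

stock = tokenization.FullTokenizer(
    vocab_file="uncased_L-12_H-768_A-12/vocab.txt", do_lower_case=True)
custom = tokenization.FullTokenizer(
    vocab_file="my_numeric_vocab.txt", do_lower_case=True)

text = "1002342"
stock_tokens = stock.tokenize(text)    # stock WordPiece splits the number into sub-pieces
custom_tokens = custom.tokenize(text)  # stays whole only if the exact string is in the custom vocab

# These ids index rows of the pre-trained embedding matrix; with a swapped
# vocab.txt the same id points at an embedding trained for a different token,
# which is why a completely new vocabulary usually implies pre-training again.
print(stock_tokens, stock.convert_tokens_to_ids(stock_tokens))
print(custom_tokens, custom.convert_tokens_to_ids(custom_tokens))
```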

Crescentz commented 3 years ago

Same question. If I want to use BERT in a specialized field but the [unused] tokens are not enough, how do I solve it? Use a custom vocab.txt?
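One common workaround (not confirmed by the maintainers in this thread) is to overwrite the `[unused0]`..`[unusedN]` placeholder lines in vocab.txt with domain terms, which keeps the vocabulary size and every existing token's id unchanged. The sketch below assumes plain local file paths and a hypothetical list of new tokens.

```python
# Sketch of the [unused] replacement trick: overwrite the unused placeholder
# rows with new domain tokens so existing WordPiece ids stay intact. The paths
# and the token list are hypothetical.
import re

new_tokens = ["1000000", "1002342"]  # hypothetical domain-specific tokens

with open("vocab.txt", "r", encoding="utf-8") as f:
    vocab = f.read().splitlines()

replacements = iter(new_tokens)
for i, token in enumerate(vocab):
    if re.fullmatch(r"\[unused\d+\]", token):
        try:
            vocab[i] = next(replacements)
        except StopIteration:
            break  # ran out of new tokens; remaining [unused] slots stay as-is

with open("vocab_custom.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(vocab) + "\n")
```

The embeddings for the overwritten slots are effectively untrained, so they only become useful after fine-tuning on enough in-domain data; when the number of new tokens far exceeds the available [unused] slots (as with a purely numeric corpus), pre-training from scratch with the new vocabulary is the usual alternative.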