google-research / albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Apache License 2.0

use custom vocab.txt #214

Open 2696120622 opened 4 years ago

2696120622 commented 4 years ago

My corpus consists of pure numbers such as 1, 2, ..., 1000000, ..., 1002342, .... It is different from words in any language. Can I replace vocab.txt with my own vocab.txt, created from my corpus, for fine-tuning ALBERT? Or should I train ALBERT on my corpus from scratch?
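
To make the question concrete, here is a minimal sketch of how such a vocab.txt could be built from a numeric corpus. It assumes whitespace-separated tokens, one token per output line, and BERT-style special tokens at the top of the file. The names `build_vocab`, `write_vocab`, and `SPECIAL_TOKENS` are hypothetical, and the special tokens in particular should be checked against the tokenizer actually used.

```python
from collections import Counter
from typing import Iterable, List, Sequence

# Assumption: BERT-style special tokens. Verify these against the
# tokenizer you actually use before relying on them.
SPECIAL_TOKENS = ("[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]")


def build_vocab(
    lines: Iterable[str],
    special_tokens: Sequence[str] = SPECIAL_TOKENS,
    min_count: int = 1,
) -> List[str]:
    """Collect whitespace-separated tokens into an ordered vocabulary."""
    counts = Counter(tok for line in lines for tok in line.split())
    # Most frequent first; ties broken by token string so output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = list(special_tokens)
    seen = set(vocab)
    for tok, count in ordered:
        if count >= min_count and tok not in seen:
            vocab.append(tok)
            seen.add(tok)
    return vocab


def write_vocab(vocab: Sequence[str], path: str) -> None:
    """Write one token per line."""
    with open(path, "w", encoding="utf-8") as f:
        for tok in vocab:
            f.write(tok + "\n")


if __name__ == "__main__":
    corpus = ["1 2 1000000", "2 1002342 2"]  # toy stand-in for the real corpus
    write_vocab(build_vocab(corpus), "vocab.txt")
```

With every distinct number treated as its own token, the vocabulary grows with the range of numbers in the corpus, so `min_count` is there as one possible way to cap its size.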

Thanks.

517030910405 commented 4 years ago

Where can I get the vocab.txt? I am looking for the official vocab.txt. Thanks.