google-research / albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Apache License 2.0

Adding new custom tokens to the vocab when fine-tuning a pre-trained model #122

Closed igeti closed 4 years ago

igeti commented 4 years ago

I want to add new tokens to the vocabulary when fine-tuning a pre-trained ALBERT model. In BERT, for example, custom tokens can be substituted for the reserved [unused] placeholder tokens in the vocab file: [PAD] [unused0] [unused1] [unused2] [unused3] [unused4] ... But I found no such option in ALBERT, since its vocabulary file (30k-clean.vocab) is completely filled with real tokens.
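For context, the BERT mechanism referred to above can be sketched as follows. This is a hypothetical illustration, not code from either repository: the function name, the sample vocab, and the custom tokens are all made up. It rewrites `[unusedN]` lines in a BERT-style `vocab.txt` (one token per line) in place, so existing token IDs stay stable. Note this trick does not carry over to ALBERT, whose SentencePiece vocab has no reserved slots, which is exactly the problem raised in this issue.

```python
def replace_unused_tokens(vocab_lines, custom_tokens):
    """Replace [unusedN] entries in a BERT-style vocab (one token per line)
    with custom tokens, preserving line positions so token IDs stay stable.

    Hypothetical helper for illustration; not part of BERT or ALBERT."""
    custom = iter(custom_tokens)
    out = []
    for line in vocab_lines:
        token = line.strip()
        if token.startswith("[unused"):
            try:
                # Consume the next custom token, if any remain.
                out.append(next(custom))
                continue
            except StopIteration:
                pass  # No custom tokens left; keep the placeholder.
        out.append(token)
    return out

# Toy vocab with two reserved slots; real BERT vocabs have ~1000.
vocab = ["[PAD]", "[unused0]", "[unused1]", "[CLS]", "[SEP]", "the"]
print(replace_unused_tokens(vocab, ["covid", "mrna"]))
# → ['[PAD]', 'covid', 'mrna', '[CLS]', '[SEP]', 'the']
```

Because the file length and line order are unchanged, the pre-trained embedding matrix still lines up with the vocab; the new tokens simply inherit the (untrained) embeddings of the slots they replaced.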

0x0539 commented 4 years ago

See https://github.com/google-research/ALBERT/issues/127#issuecomment-581869983 for how to do this.