I have trained a new model following the guidelines in README.md. The model was trained on my own dataset of scientific articles. Now, in order to use the trained model, I need a tokenizer. Which one should I use? And do I need to load the vocabulary from disk, in case the vocabulary used during training differs from the pretrained ones?
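To make the second question concrete: what I have in mind is something like the sketch below, which rebuilds the token-to-id mapping from a vocabulary file written during training, instead of taking the vocabulary from a pretrained tokenizer. This is only an assumption about the setup (the `vocab.txt` file name, its one-token-per-line format, and the `[UNK]` token are all hypothetical):

```python
# Hypothetical sketch: assumes the training run wrote its vocabulary to a
# plain text file, one token per line, where the line number is the token id.
from pathlib import Path

def load_vocab(path):
    """Read one token per line; the line index becomes the token id."""
    tokens = Path(path).read_text(encoding="utf-8").splitlines()
    return {tok: i for i, tok in enumerate(tokens)}

# Usage (file name and [UNK] token are assumptions about the training setup):
# vocab = load_vocab("vocab.txt")
# ids = [vocab.get(t, vocab.get("[UNK]")) for t in "some text".split()]
```

Is something along these lines the right approach, or is there an existing tokenizer I should point at my model directory instead?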