allenai / specter

SPECTER: Document-level Representation Learning using Citation-informed Transformers
Apache License 2.0

Using trained model: which tokenizer? #38

Open mle-els opened 2 years ago

mle-els commented 2 years ago

I have trained a new model following the guidelines in README.md, using my own dataset of scientific articles. To use the trained model, I need a tokenizer. Which one should I use? If the vocabulary used during training differs from the pretrained ones, do I need to load it from disk?
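
A minimal sketch of one way to check the second question, i.e. whether the vocabulary saved during training actually differs from a pretrained one. It assumes both vocabularies are plain-text files with one token per line, where the line index is the token ID (as in BERT-style `vocab.txt` files). The file names and the `load_vocab` / `compare_vocabs` helpers are hypothetical and not part of SPECTER; the demo writes two tiny stand-in files so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def load_vocab(path):
    """Read a one-token-per-line vocabulary file, preserving line order."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def compare_vocabs(trained_path, pretrained_path):
    """Report whether two vocabulary files match and which tokens differ.

    Order matters for "identical": under the one-token-per-line assumption,
    the line index is the token ID, so a reordered file maps tokens to
    different IDs even if the token set is the same.
    """
    trained = load_vocab(trained_path)
    pretrained = load_vocab(pretrained_path)
    return {
        "identical": trained == pretrained,
        "only_in_trained": sorted(set(trained) - set(pretrained)),
        "only_in_pretrained": sorted(set(pretrained) - set(trained)),
    }


# Demo with stand-in files; replace the paths with the real vocabulary files.
with tempfile.TemporaryDirectory() as tmp:
    trained_file = Path(tmp) / "trained_vocab.txt"        # hypothetical name
    pretrained_file = Path(tmp) / "pretrained_vocab.txt"  # hypothetical name
    trained_file.write_text("[PAD]\n[UNK]\ncitation\ngraphene\n", encoding="utf-8")
    pretrained_file.write_text("[PAD]\n[UNK]\ncitation\n", encoding="utf-8")
    report = compare_vocabs(trained_file, pretrained_file)

print(report)
```

If `identical` is true, the pretrained tokenizer's vocabulary maps tokens to the same IDs the model saw during training. If not, the tokenizer would presumably need to be built from the vocabulary file saved on disk.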