I have trained a new model following the guidelines in README.md. The model was trained on my own dataset of scientific articles. Now, in order to use the trained model, I need a tokenizer. Which one should I use? And do I need to load the vocabulary from disk, in case the vocabulary used during training differs from the pretrained ones?
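To make the second question concrete: what I have in mind is something like the sketch below, which rebuilds the token-to-id mapping from a vocabulary file written during training, instead of taking the vocabulary from a pretrained tokenizer. This is only an assumption about the setup (the `vocab.txt` file name, its one-token-per-line format, and the `[UNK]` token are all hypothetical):

```python
# Hypothetical sketch: assumes the training run wrote its vocabulary to a
# plain text file, one token per line, where the line number is the token id.
from pathlib import Path

def load_vocab(path):
    """Read one token per line; the line index becomes the token id."""
    tokens = Path(path).read_text(encoding="utf-8").splitlines()
    return {tok: i for i, tok in enumerate(tokens)}

# Usage (file name and [UNK] token are assumptions about the training setup):
# vocab = load_vocab("vocab.txt")
# ids = [vocab.get(t, vocab.get("[UNK]")) for t in "some text".split()]
```

Is something along these lines the right approach, or is there an existing tokenizer I should point at my model directory instead?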