Tokenizer model

facebookresearch / bio-lm

We evaluate many models used for biomedical and clinical NLP tasks, and train new models that perform much better.

Closed by nooralahzadeh 3 years ago

nooralahzadeh commented 3 years ago

Hi, thanks for sharing the pre-trained models. Would it be possible to share the tokenizer models as well? Thanks

patrick-s-h-lewis commented 3 years ago

The tokenizers are distributed along with the models. You should be able to load a tokenizer by downloading a model that uses it and pointing from_pretrained at the downloaded model directory:

e.g.

from transformers import RobertaTokenizerFast

# model_path: local directory of a downloaded model
tokenizer = RobertaTokenizerFast.from_pretrained(
    model_path,
    add_prefix_space=False,
)
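If loading fails, it may help to confirm that the tokenizer files are actually present in the downloaded model directory. The sketch below is not part of this repository. It assumes the usual Hugging Face file names for RoBERTa-style tokenizers (vocab.json and merges.txt, or a single tokenizer.json for the fast tokenizer), and the helper name missing_tokenizer_files is hypothetical.

```python
from pathlib import Path

# Assumption: standard Hugging Face file names for a RoBERTa-style tokenizer.
# The slow tokenizer reads vocab.json + merges.txt; the fast tokenizer can
# also load from a single serialized tokenizer.json.
SLOW_FILES = ("vocab.json", "merges.txt")
FAST_FILE = "tokenizer.json"


def missing_tokenizer_files(model_path):
    """Return the tokenizer files that appear to be missing from model_path.

    Hypothetical helper for illustration; it only inspects file names and
    does not validate the contents of the files.
    """
    root = Path(model_path)
    # A serialized fast tokenizer is enough on its own.
    if (root / FAST_FILE).is_file():
        return []
    # Otherwise both the vocabulary and the merges file are expected.
    return [name for name in SLOW_FILES if not (root / name).is_file()]
```

An empty list suggests the directory holds what from_pretrained needs; a non-empty list names the files to look for in the model download.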