airsplay / vokenization

PyTorch code for the EMNLP 2020 paper "Vokenization: Improving Language Understanding with Visual Supervision"
MIT License

Question: Tokenizer used for RoBERTa model? #4

Closed: FelixLabelle closed this issue 3 years ago

FelixLabelle commented 3 years ago

I'm trying to integrate Vokenization with BERTScore and I'd like to get clarification on which tokenizer is being used for the pretrained RoBERTa + VLM model. Is it roberta-base or bert-base-uncased?

airsplay commented 3 years ago

Thanks. I am using bert-base-uncased for the BERT + VLM (on Wiki) model here, and roberta-base for the RoBERTa + VLM (on Wiki) model.
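
For reference, a minimal sketch (not from the repo itself) of how the two tokenizers could be loaded with Hugging Face `transformers`, following the pairing described above; the checkpoint names are the standard Hub identifiers, and the example sentence is just for illustration:

```python
from transformers import AutoTokenizer

# Tokenizer paired with the BERT + VLM (on Wiki) checkpoint, per the reply above.
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenizer paired with the RoBERTa + VLM (on Wiki) checkpoint.
roberta_tokenizer = AutoTokenizer.from_pretrained("roberta-base")

sentence = "Vokenization improves language understanding."
print(bert_tokenizer.tokenize(sentence))     # WordPiece tokens (lowercased)
print(roberta_tokenizer.tokenize(sentence))  # byte-level BPE tokens
```

Note that the two tokenizers produce different token sequences for the same text, so the choice matters when aligning tokens in a downstream tool such as BERTScore.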

FelixLabelle commented 3 years ago

Perfect, thanks for the quick reply!