dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Apache License 2.0

fine-tuning ViLT for MLM task with a new dataset #79

Open Ellyuca opened 1 year ago

Ellyuca commented 1 year ago

Hi. Thanks for providing the code for such great work. I am new to language models, so I apologize if these questions are trivial.

Is it possible to fine-tune the model for masked language modeling (MLM) on a new dataset? I want a model that can predict the [MASK] token for text-image pairs from my own dataset (custom text and images). Could you please share how to do this?
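
To make the goal concrete, the sketch below shows the kind of masked input and label pair meant here. It is only an illustration in plain Python: `mask_tokens`, the special-token set, and the example caption are hypothetical and are not taken from the ViLT codebase, which may mask tokens differently (this is simple per-token masking).

```python
import random

MASK_TOKEN = "[MASK]"
# Assumed special tokens that should never be masked (BERT-style names).
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels).

    labels holds the original token at masked positions and None elsewhere,
    so a loss would only be computed where labels is not None.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if tok not in SPECIAL_TOKENS and rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Hypothetical caption from a custom dataset, already split into tokens.
caption = ["[CLS]", "a", "red", "valve", "on", "the", "pipe", "[SEP]"]
masked, labels = mask_tokens(caption, mask_prob=0.3, seed=0)
print(masked)
print(labels)
```

The model would receive `masked` together with the paired image and be trained to recover the tokens stored in `labels`.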

Thanks in advance for your time and help. Best regards.