dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Apache License 2.0

How to use the ViLT model for Spanish text? #62

Open karndeepsingh opened 2 years ago

karndeepsingh commented 2 years ago

Hi, I have images and descriptions of products, with the descriptions in Spanish, and I want to train a classifier using ViLT. Which pretrained model should I use to train on my Spanish text and images? I assume the shared model was trained on English text, so it won't help me train on Spanish text. Correct me if I am wrong!

Please also suggest the best way to train the model. Should I train it from scratch? If not, which model weights should I use to train on my dataset?

Thanks

altaykacan commented 1 year ago

Hi,

I'm no expert on this and was just browsing the code while studying the paper, but I think that to use the model for a different language you just need to change the word embedding. Also, the authors mention in the paper that they initialize their transformer with the weights of a pretrained ViT (no language is involved; it's only for visual features).

So training the model from scratch, with the transformer initialized from the same ViT weights and the word embedding changed to fit the Spanish vocabulary, should work.
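To make the "change the word embedding" idea concrete, here is a minimal sketch of how one might separate the vocabulary-specific weights from the rest of a checkpoint before loading it. It uses a toy dictionary in place of a real checkpoint. The function name `split_checkpoint` is hypothetical, and the parameter names (such as the `text_embeddings.word_embeddings.` prefix) are assumptions for illustration that should be checked against the actual keys in your checkpoint.

```python
from typing import Any, Dict, Tuple


def split_checkpoint(
    state_dict: Dict[str, Any],
    text_prefix: str = "text_embeddings.word_embeddings.",  # assumed key prefix
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a checkpoint into weights to keep and vocabulary-specific weights to drop."""
    kept: Dict[str, Any] = {}
    dropped: Dict[str, Any] = {}
    for name, tensor in state_dict.items():
        if name.startswith(text_prefix):
            # Tied to the original tokenizer's vocabulary, so it is re-learned.
            dropped[name] = tensor
        else:
            # Language-independent weights (e.g. ViT-initialized blocks) are reused.
            kept[name] = tensor
    return kept, dropped


# Toy stand-in for a real checkpoint; the values would normally be tensors.
toy_checkpoint = {
    "text_embeddings.word_embeddings.weight": "vocab table for the old tokenizer",
    "text_embeddings.position_embeddings.weight": "text position table",
    "transformer.blocks.0.attn.qkv.weight": "ViT-initialized attention weights",
    "transformer.patch_embed.proj.weight": "image patch projection",
}

kept, dropped = split_checkpoint(toy_checkpoint)
print("reused:", sorted(kept))
print("re-initialized:", sorted(dropped))
```

With PyTorch, the `kept` dictionary could then be passed to `model.load_state_dict(kept, strict=False)`, which leaves the missing word embedding at its fresh initialization. The model would need to be built with a vocabulary size that matches a Spanish or multilingual tokenizer.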

I know it has been some time since you asked your question but I thought it wouldn't hurt to try and answer. Hopefully it was helpful!