How to use the ViLT model for Spanish text? #62

dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

Apache License 2.0

1.41k stars, 208 forks

Hi,

I'm no expert on this and was just browsing the code while studying the paper, but I think that to use the model for a different language you just need to change the word embedding. The authors also mention in the paper that they initialize their transformer with the weights of a pretrained ViT (no language is involved; the weights only carry visual features).

So training the model from scratch, starting from the same ViT initial weights and swapping in a word embedding for the new language, should work.
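To make the idea concrete, here is a minimal PyTorch sketch of "keep the transformer weights, replace the word embedding". It does not use the ViLT code base: `TinyVLModel`, `transfer_matching_weights`, and the Spanish vocabulary size of 31000 are hypothetical stand-ins chosen for illustration, and the real model has more vocabulary-dependent parts than this toy does. The point is only that weights whose shapes do not depend on the vocabulary can be copied over, while the word embedding stays freshly initialized and has to be trained.

```python
import torch
from torch import nn


class TinyVLModel(nn.Module):
    """Toy stand-in (not ViLT itself): a word embedding feeding one transformer layer."""

    def __init__(self, vocab_size: int, hidden: int = 32):
        super().__init__()
        # Vocabulary-dependent part: has to be rebuilt for a new tokenizer.
        self.word_embeddings = nn.Embedding(vocab_size, hidden)
        # Language-independent part: plays the role of the ViT-initialized transformer.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=4, dim_feedforward=64, batch_first=True
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.encoder(self.word_embeddings(token_ids))


def transfer_matching_weights(src: nn.Module, dst: nn.Module) -> list:
    """Copy every tensor whose name and shape match; return the names left untouched."""
    dst_state = dst.state_dict()
    compatible = {
        name: tensor
        for name, tensor in src.state_dict().items()
        if name in dst_state and dst_state[name].shape == tensor.shape
    }
    # strict=False lets the skipped (shape-mismatched) tensors keep their fresh init.
    result = dst.load_state_dict(compatible, strict=False)
    return list(result.missing_keys)


# 30522 is the bert-base-uncased vocabulary size; 31000 is a made-up Spanish vocab size.
pretrained = TinyVLModel(vocab_size=30522)
spanish_model = TinyVLModel(vocab_size=31000)

skipped = transfer_matching_weights(pretrained, spanish_model)
print(skipped)  # ['word_embeddings.weight']
```

Everything except the word embedding is copied, so `spanish_model` starts from the same transformer weights as `pretrained` and only the embedding needs to be learned from Spanish text. In a real setup, any output head that predicts over the vocabulary would be skipped by the same shape check.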

I know it has been some time since you asked your question but I thought it wouldn't hurt to try and answer. Hopefully it was helpful!
