OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Apache License 2.0

Effect of VQGAN code randomness #301

Open varadgunjal opened 2 years ago

varadgunjal commented 2 years ago

I understand from #258 that there is randomness in the generated VQGAN code sequences because of Gumbel Softmax, but the different sequences nevertheless reconstruct to similar-looking images. However, since training is done by predicting the sequence tokens rather than by comparing the reconstructed images themselves, I am wondering if and how different token sequences for the same image affect pretraining and downstream performance. Was this investigated to check that performance is consistent across different variations of the generated code sequences?
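
For concreteness, here is a minimal toy sketch of the stochasticity I mean, written in plain PyTorch (this is my own stand-in quantizer, not OFA's or the VQGAN's actual encoder): re-encoding the same image draws fresh Gumbel noise, so the discrete code sequence can change from run to run.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

codebook_size = 16
# Stand-in for the VQGAN encoder: maps pixels to per-position code logits.
encoder = torch.nn.Conv2d(3, codebook_size, kernel_size=4, stride=4)

def encode(image, tau=1.0):
    logits = encoder(image)                                       # (B, K, H', W') code logits
    onehot = F.gumbel_softmax(logits, tau=tau, hard=True, dim=1)  # stochastic quantization
    return onehot.argmax(dim=1).flatten(1)                        # (B, H'*W') code sequence

image = torch.randn(1, 3, 16, 16)    # one fixed image
seq_a = encode(image)
seq_b = encode(image)                # same image, fresh Gumbel noise
print("tokens that differ:", (seq_a != seq_b).sum().item())
```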

jxst539246 commented 2 years ago

Good question. In our preliminary experiments, we found that using different sequences can slightly improve model performance; the randomness in the VQGAN encoding process seems to act as a form of data augmentation or label smoothing. But we didn't conduct a more in-depth quantitative study.

varadgunjal commented 2 years ago

I see. So what you're saying is that there is some value in using multiple (slightly different) sequences to represent the same image, and this could be interpreted as data augmentation on the target sequences for the Image Infilling task. Interesting take. I would like to explore this further.
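
As a first step, the toy comparison below is roughly what I have in mind (all names here are my own stand-ins, not OFA's training code): instead of caching one fixed code sequence per image, resample the target codes, so the infilling labels carry the Gumbel noise as augmentation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, seq_len = 16, 16
decoder = torch.nn.Linear(seq_len, seq_len * vocab)  # stand-in for the seq2seq model

def infilling_loss(masked_tokens, target_codes):
    logits = decoder(masked_tokens.float()).view(-1, seq_len, vocab)
    return F.cross_entropy(logits.transpose(1, 2), target_codes)

# Two stochastic encodings of the *same* image (stand-in: shared code logits).
code_logits = torch.randn(1, vocab, seq_len)
targets_a = F.gumbel_softmax(code_logits, tau=1.0, hard=True, dim=1).argmax(1)
targets_b = F.gumbel_softmax(code_logits, tau=1.0, hard=True, dim=1).argmax(1)

masked = targets_a.clone()
masked[:, seq_len // 2:] = 0   # crudely mask half the sequence

# Caching targets_a gives one fixed label per image; resampling (targets_b
# on a later epoch) perturbs the labels, which is the augmentation effect.
print(infilling_loss(masked, targets_a).item())
print(infilling_loss(masked, targets_b).item())
```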