OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Effect of VQGAN code randomness #301

Open varadgunjal opened 1 year ago

varadgunjal commented 1 year ago

I understand from #258 that there is randomness in the generated VQGAN code sequences because of Gumbel-Softmax, but the different sequences nevertheless reconstruct to similar-looking images. However, since training predicts the sequence tokens directly rather than comparing the reconstructed images, I am wondering whether and how having different token sequences for the same image affects pretraining and downstream performance. Was this investigated to check that performance is consistent across different variations of the generated code sequences?
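
For concreteness, here is a minimal sketch of the kind of randomness I mean (the encoder interface and shapes below are illustrative, not the actual OFA/VQGAN code): with hard Gumbel-Softmax sampling, encoding the same image twice can produce two different target token sequences.

```python
import torch
import torch.nn.functional as F

def sample_code_sequence(code_logits, tau=1.0):
    """Sample one discrete code index per spatial position via Gumbel-Softmax."""
    # code_logits: (batch, grid_positions, codebook_size)
    one_hot = F.gumbel_softmax(code_logits, tau=tau, hard=True)  # stochastic sampling
    return one_hot.argmax(dim=-1)                                # (batch, grid_positions)

# Toy logits for a 16x16 code grid with an 8192-entry codebook (illustrative sizes).
code_logits = torch.randn(1, 16 * 16, 8192)
seq_a = sample_code_sequence(code_logits)
seq_b = sample_code_sequence(code_logits)
# Fraction of positions where two "targets" for the same image disagree.
print((seq_a != seq_b).float().mean().item())
```

So the cross-entropy targets for the infilling task can differ from encoding to encoding of the same image, which is what motivates my question about consistency.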

jxst539246 commented 1 year ago

Good question. In our preliminary experiments, we found that using different sequences can slightly improve model performance; the randomness in the VQGAN encoding process seems to act as a form of data augmentation or label smoothing. But we didn't conduct a more in-depth quantitative study.
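
To make the comparison concrete, here is a rough sketch of the two target strategies (the function names are illustrative, not from this repo): a target code sequence cached once per image versus one re-sampled from the stochastic encoder on every pass over the data.

```python
# Illustrative only: not the OFA pretraining code.
_target_cache = {}

def fixed_target(image_id, image, encode_to_codes):
    # Encode once and reuse the same token sequence in every epoch.
    if image_id not in _target_cache:
        _target_cache[image_id] = encode_to_codes(image)
    return _target_cache[image_id]

def resampled_target(image, encode_to_codes):
    # Re-encode on every access; Gumbel-Softmax randomness yields a slightly
    # different sequence each time, which behaves like data augmentation /
    # label smoothing on the infilling targets.
    return encode_to_codes(image)
```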

varadgunjal commented 1 year ago

I see. So what you're saying is that there is some value in using multiple (slightly different) sequences to represent the same image, and this could be interpreted as data augmentation on the target sequences used for the image infilling task. Interesting take. I would like to explore this further.