zheedong closed this issue 11 months ago
Hi,

I would like to know more details about the 'pilot experiments' in SEED v1:

"We conduct two experiments to respectively align discrete representations of VQ-VAE and Beit v2 with OPT2.7B [19] model on CC3M [20] dataset."

What does 'align' mean here? Do you fine-tune OPT 2.7B on VQ-VAE tokens, or is there an adaptor between VQ-VAE and OPT? It would be great if you could share more details.

Thank you.

We freeze OPT 2.7B and train a projection layer, which takes the discrete representations of VQ-VAE and BEiT v2 as inputs, using a captioning loss. Specifically, the discrete representations are passed through the projection layer, and its outputs serve as the input embeddings of OPT 2.7B.
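For what it's worth, here is a minimal sketch (not the authors' code) of the setup described in that reply, assuming PyTorch and Hugging Face `transformers`. The class name `VisualProjection`, the codebook size of 8192, and the embedding-plus-linear design of the projection are my assumptions for illustration; the paper only states that OPT is frozen and a projection over the discrete codes is trained with a caption loss.

```python
# Hypothetical sketch of the SEED v1 pilot experiment: frozen OPT 2.7B,
# trainable projection over discrete visual codes, captioning loss on CC3M.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, OPTForCausalLM

opt = OPTForCausalLM.from_pretrained("facebook/opt-2.7b")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-2.7b")

# Freeze OPT 2.7B entirely; only the projection layer below is trained.
for p in opt.parameters():
    p.requires_grad = False

embed_dim = opt.get_input_embeddings().embedding_dim  # 2560 for OPT-2.7B

class VisualProjection(nn.Module):
    """Maps discrete visual codes (VQ-VAE / BEiT v2 ids) into OPT's input embedding space."""

    def __init__(self, codebook_size: int, embed_dim: int):
        super().__init__()
        self.embed = nn.Embedding(codebook_size, embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, visual_ids: torch.LongTensor) -> torch.Tensor:
        # visual_ids: (batch, num_visual_tokens) -> (batch, num_visual_tokens, embed_dim)
        return self.proj(self.embed(visual_ids))

projection = VisualProjection(codebook_size=8192, embed_dim=embed_dim)

def caption_loss(visual_ids: torch.LongTensor, caption: str) -> torch.Tensor:
    """Next-token captioning loss, conditioning OPT on the projected visual codes."""
    cap_ids = tokenizer(caption, return_tensors="pt").input_ids
    cap_embeds = opt.get_input_embeddings()(cap_ids)
    vis_embeds = projection(visual_ids)
    inputs_embeds = torch.cat([vis_embeds, cap_embeds], dim=1)
    # Supervise only the caption tokens; -100 masks the visual prefix from the loss.
    prefix_labels = torch.full(visual_ids.shape, -100, dtype=torch.long)
    labels = torch.cat([prefix_labels, cap_ids], dim=1)
    return opt(inputs_embeds=inputs_embeds, labels=labels).loss

# Example training step on one CC3M-style (image codes, caption) pair:
visual_ids = torch.randint(0, 8192, (1, 32))  # e.g. 32 discrete codes from the visual tokenizer
optimizer = torch.optim.AdamW(projection.parameters(), lr=1e-4)
loss = caption_loss(visual_ids, "a dog playing in the park")
loss.backward()  # gradients reach only `projection`; OPT stays frozen
optimizer.step()
```

Since every OPT parameter has `requires_grad=False`, the optimizer only ever updates the projection layer, which matches the frozen-LLM description in the reply; it is not fine-tuning of OPT itself.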