Which parameters are trainable? Are the encoder and decoder in VQGAN fixed? Is the llama fixed?

FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

https://arxiv.org/abs/2406.06525

MIT License


Which parameters are trainable? Are the encoder and decoder in VQGAN fixed? Is the llama fixed? #26

Open tanshuai0219 opened 4 months ago

daiyixiang666 commented 4 months ago

If you only train the VQGAN, then obviously the VQGAN is trainable. If you train the GPT for image generation, then you only need to train the GPT model, as long as your image dataset's domain is within the range of ImageNet.
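To make the second case concrete, here is a minimal PyTorch sketch of training only the GPT while the VQGAN stays frozen. `TinyTokenizer` and `TinyGPT` are hypothetical stand-ins invented for this example, not the LlamaGen classes, and the shapes and hyperparameters are arbitrary assumptions. The only point is the pattern: freeze the tokenizer, encode images to token ids without gradients, and pass only the GPT parameters to the optimizer.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a VQGAN tokenizer (NOT the LlamaGen class).
class TinyTokenizer(nn.Module):
    def __init__(self, codebook_size=16, dim=8, patch_dim=12):
        super().__init__()
        self.encoder = nn.Linear(patch_dim, dim)
        self.codebook = nn.Embedding(codebook_size, dim)
        self.decoder = nn.Linear(dim, patch_dim)

    @torch.no_grad()
    def encode_to_ids(self, patches):
        # patches: (B, T, patch_dim) -> token ids: (B, T)
        z = self.encoder(patches)
        # Squared distance from each latent to every codebook entry: (B, T, K)
        dist = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        return dist.argmin(dim=-1)

# Hypothetical stand-in for the autoregressive model (NOT a real Llama).
class TinyGPT(nn.Module):
    def __init__(self, vocab_size=16, dim=8):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        return self.head(self.tok_emb(ids))

torch.manual_seed(0)
tokenizer = TinyTokenizer()
gpt = TinyGPT()

# Freeze the tokenizer: eval mode, and no gradients for any of its parameters.
tokenizer.eval()
for p in tokenizer.parameters():
    p.requires_grad_(False)

# Only the GPT parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(gpt.parameters(), lr=1e-3)

# One training step on random "image patches" (assumed shape: B=2, T=5).
patches = torch.randn(2, 5, 12)
ids = tokenizer.encode_to_ids(patches)

# Next-token prediction: predict ids[:, 1:] from ids[:, :-1].
logits = gpt(ids[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), ids[:, 1:].reshape(-1)
)
optimizer.zero_grad()
loss.backward()
optimizer.step()

trainable_tokenizer = sum(p.numel() for p in tokenizer.parameters() if p.requires_grad)
trainable_gpt = sum(p.numel() for p in gpt.parameters() if p.requires_grad)
print("trainable tokenizer params:", trainable_tokenizer)
print("trainable GPT params:", trainable_gpt)
```

If your images are far from the ImageNet domain, the same sketch would run in reverse for the first case: train the tokenizer on its own first, then freeze it as above before training the GPT.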