Open LoveU3tHousand2 opened 1 year ago
I'm guessing it's because in a regular VQ-VAE, quantisation happens on the encoder side and the decoder simply takes the quantised embedding and decodes it. In latent diffusion, the diffusion runs in the continuous z space and its output is later fed to the decoder, so that output has to be quantised first before being decoded.
So would it also work if I quantised z before LDM training and decoded directly after sampling?
I've noticed that the `decode_to_img` function in taming-transformers and VQ-VAE uses `decode_code` or `get_codebook_entry`, but in LDM, `decode_first_stage` does quantize -> decode unless `predict_cids=True`. Why is this? What is the difference between quantize -> decode and get_codebook_entry -> decode?
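The two paths can be sketched with a toy codebook. This is a minimal illustration, not the actual taming/LDM code: the codebook, shapes, and helper names here are hypothetical stand-ins. `get_codebook_entry` takes discrete indices (what a transformer or a `predict_cids=True` model outputs) and is a pure table lookup, while `quantize` takes a continuous latent (what the diffusion sampler outputs) and first has to snap it to its nearest codebook vector:

```python
import numpy as np

# Hypothetical toy setup: a codebook of K embedding vectors of dimension D.
rng = np.random.default_rng(0)
K, D = 8, 4
codebook = rng.normal(size=(K, D))

def get_codebook_entry(indices):
    # taming / VQ-VAE path: discrete indices -> embeddings (pure lookup).
    return codebook[indices]

def quantize(z):
    # LDM path: continuous z of shape (N, D) -> nearest codebook vector
    # by L2 distance, plus the chosen indices.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    indices = d.argmin(axis=1)
    return codebook[indices], indices

# A continuous latent, e.g. the output of the diffusion sampler.
z = rng.normal(size=(3, D))
z_q, idx = quantize(z)

# Once the indices are fixed, both paths produce the same decoder input:
assert np.allclose(z_q, get_codebook_entry(idx))
```

So quantize -> decode and get_codebook_entry -> decode differ only in their input: the former resolves a continuous latent to indices first, the latter already has the indices.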