CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

About decode_first_stage in sampling #218

Open LoveU3tHousand2 opened 1 year ago

LoveU3tHousand2 commented 1 year ago

I've noticed that the 'decode_to_img' function in taming-transformers and VQ-VAE uses decode_code or get_codebook_entry. In LDM, however, decode_first_stage does quantize -> decode unless predict_cids=True is set. Why is this? What is the difference between quantize -> decode and get_codebook_entry -> decode?
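One way to see the difference is to compare what each path takes as input. quantize receives continuous latents and snaps each one to its nearest codebook vector. get_codebook_entry receives integer code indices that were already chosen elsewhere (for example, predicted by a transformer) and simply looks them up. When the indices are the nearest-neighbour indices of the latents, both paths yield the same vectors. The toy below is a minimal pure-Python sketch under that assumption. The codebook is hypothetical, and quantize / get_codebook_entry are simplified stand-ins named after the taming-transformers methods, not their real signatures.

```python
from typing import List

Vector = List[float]

# Hypothetical 2-D codebook with four entries (illustration only).
CODEBOOK: List[Vector] = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code_index(z: Vector) -> int:
    """Index of the codebook vector closest to z (its 'code id')."""
    return min(range(len(CODEBOOK)), key=lambda k: squared_distance(z, CODEBOOK[k]))


def quantize(latents: List[Vector]) -> List[Vector]:
    """Simplified stand-in: continuous latents -> nearest codebook vectors."""
    return [CODEBOOK[nearest_code_index(z)] for z in latents]


def get_codebook_entry(indices: List[int]) -> List[Vector]:
    """Simplified stand-in: discrete indices -> codebook vectors (table lookup)."""
    return [CODEBOOK[i] for i in indices]


latents = [[0.9, 0.2], [0.1, 0.8], [0.6, 0.7]]
indices = [nearest_code_index(z) for z in latents]

via_quantize = quantize(latents)
via_lookup = get_codebook_entry(indices)
print(indices)                     # [1, 2, 3]
print(via_quantize == via_lookup)  # True
```

In other words, the two routes differ in whether the argmin over the codebook happens inside the decode step (continuous input) or has already happened upstream (index input).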

Yoonho-Na commented 1 year ago

In the LDM paper, the authors mention that:

This model can be interpreted as a VQGAN [23] but with the quantization layer absorbed by the decoder.

I'm not really sure about this, but I guess the quantization step operates a little differently than in VQGAN.

ryx19th commented 1 month ago

I think it's because of the VQModelInterface wrapper: its decode function performs the codebook lookup (quantization) before the final decoding.
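The wrapper behaviour can be sketched as a toy, which also illustrates what "quantization layer absorbed by the decoder" means in practice: the diffusion model outputs continuous latents, and decoding snaps them to the codebook first. Assumption: the flag names predict_cids and force_not_quantize follow the ldm source as far as I recall it, but everything else here (the codebook, the hypothetical toy_decoder standing in for post_quant_conv + decoder, and the omitted scale_factor rescaling and logits-argmax step) is a simplification, not the repository's code.

```python
from typing import List

Vector = List[float]

# Hypothetical 2-D codebook (illustration only).
CODEBOOK: List[Vector] = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def quantize(latents: List[Vector]) -> List[Vector]:
    """Snap each continuous latent to its nearest codebook vector."""
    def nearest(z: Vector) -> Vector:
        return min(CODEBOOK, key=lambda e: sum((a - b) ** 2 for a, b in zip(z, e)))
    return [nearest(z) for z in latents]


def toy_decoder(quant: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for post_quant_conv + decoder (any fixed map works)."""
    return [[2.0 * x + 1.0 for x in v] for v in quant]


def interface_decode(h: List[Vector], force_not_quantize: bool = False) -> List[Vector]:
    """Simplified control flow of VQModelInterface.decode."""
    # By default the input goes through the quantizer first,
    # so the quantizer effectively lives inside the decoder.
    quant = h if force_not_quantize else quantize(h)
    return toy_decoder(quant)


def decode_first_stage(z, predict_cids: bool = False) -> List[Vector]:
    """Simplified version of the two branches in decode_first_stage."""
    if predict_cids:
        # z holds code indices: look them up, then skip re-quantization.
        vectors = [CODEBOOK[i] for i in z]
        return interface_decode(vectors, force_not_quantize=True)
    # z holds continuous latents (e.g. a diffusion sample): quantize inside decode.
    return interface_decode(z)


sample = [[0.93, 0.08], [0.12, 1.05]]  # slightly off-codebook, like a diffusion output
codes = [1, 2]                         # the matching code indices
print(decode_first_stage(sample))                     # [[3.0, 1.0], [1.0, 3.0]]
print(decode_first_stage(codes, predict_cids=True))   # [[3.0, 1.0], [1.0, 3.0]]
```

Under these assumptions both branches end in the same decoder input; the only question is whether the nearest-code search runs inside decode (continuous latents) or the indices are supplied directly (predict_cids=True, with quantization skipped to avoid doing it twice).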