CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License
11.16k stars 1.46k forks

Vector quantization #374

Closed: hanchenwang closed 2 weeks ago

hanchenwang commented 2 weeks ago

I noticed that the VQ layer is stacked at the end of the encoder in the first_stage AE. However, your paper says the VQ layer should be absorbed by the decoder instead. Am I missing something? If not, is this a discrepancy between the code and the paper, or does the position of the VQ layer not matter? Thank you!

NITHISHM2410 commented 2 weeks ago

The latent diffusion models use the VQModelInterface class. Its 'encode' method doesn't quantize; the quantization happens in its 'decode' method, which is exactly what the paper describes.
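
To make the split concrete, here is a minimal toy sketch of that pattern. It is not the repository's code: the class name ToyVQInterface, the 'quantize' flag, the codebook values, and the stand-in encoder/decoder arithmetic are all assumptions made up for illustration. It only shows that 'encode' can return a continuous latent while the nearest-codebook lookup runs as the first step of 'decode'.

```python
# Toy illustration only, not the repository's implementation.
# The "encoder" and "decoder" are placeholder arithmetic and the
# codebook is invented for this example.

def nearest_code(vec, codebook):
    """Return the codebook entry closest to vec (squared Euclidean distance)."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, c)))


class ToyVQInterface:
    def __init__(self, codebook):
        self.codebook = codebook

    def encode(self, x):
        # Stand-in encoder: returns a continuous latent, with NO quantization.
        return [0.5 * v for v in x]

    def decode(self, h, quantize=True):
        # Quantization is applied here, as the first step of decoding,
        # so the VQ layer is effectively part of the decoder.
        z = list(nearest_code(h, self.codebook)) if quantize else h
        # Stand-in decoder.
        return [2.0 * v for v in z]


model = ToyVQInterface(codebook=[(0.0, 0.0), (1.0, 1.0)])
latent = model.encode([1.6, 2.2])               # continuous latent, not a codebook entry
recon = model.decode(latent)                    # snapped to a codebook entry, then decoded
raw = model.decode(latent, quantize=False)      # skips the codebook lookup
print(latent, recon, raw)
```

In this sketch a diffusion model would operate on the continuous output of 'encode', and the latent is snapped to the codebook only when it is decoded back to the input space.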

hanchenwang commented 2 weeks ago

Thank you for the clarification. I understand.