Closed by hanchenwang 2 weeks ago
The latent diffusion models use the VQModelInterface module. The 'encode' method of this class doesn't quantize; the 'decode' method performs the quantization instead, which is exactly what the paper describes.
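To make the split concrete, here is a minimal toy sketch of the idea, not the repository's actual code. The class name `ToyVQInterface`, the 1-D `CODEBOOK`, and the stand-in encoder/decoder arithmetic are all assumptions made for illustration. The only point is that `encode` returns continuous latents while `decode` quantizes first.

```python
# Toy illustration only: hypothetical names, not the repository's code.
CODEBOOK = [-1.0, 0.0, 1.0]  # assumed 1-D codebook entries


def quantize(latents):
    """Snap each latent value to its nearest codebook entry."""
    return [min(CODEBOOK, key=lambda c: abs(c - z)) for z in latents]


class ToyVQInterface:
    def encode(self, x):
        # Stand-in encoder: returns continuous latents, no quantization.
        return [0.5 * v for v in x]

    def decode(self, h):
        # Quantization happens here, as the first step of decoding.
        quant = quantize(h)
        # Stand-in decoder: inverts the toy encoder's scaling.
        return [2.0 * q for q in quant]


model = ToyVQInterface()
latents = model.encode([0.4, 1.8, -2.2])  # continuous values
recon = model.decode(latents)             # quantized inside decode
```

In this arrangement anything that operates between `encode` and `decode` (in this thread's context, the diffusion model) sees unquantized latents, and the VQ layer is effectively part of the decoder.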
Thank you for the clarification. I understand.
I find that the VQ layer is stacked at the end of the encoder of the first_stage AE. However, your paper says the VQ layer should be absorbed by the decoder instead. Am I missing something? If not, is this a conflict, or does the position of the VQ layer not matter? Thank you!