buxiangzhiren / Asymmetric_VQGAN


KL-reg and VQ-reg autoencoder #9

Closed betterze closed 9 months ago

betterze commented 9 months ago

Dear Buxiangzhiren,

Thank you for sharing this great repo, I really enjoy your work.

If I understand correctly, according to Section 3.1 of the LDM paper, KL-reg and VQ-reg are two different ways to regularize the autoencoder, and according to Section 4 the authors use a VQ-reg autoencoder.

Your method improves the VQ-reg autoencoder, but in the README you mention that you use the KL-regularized autoencoder. The KL and VQ autoencoders are not the same thing, right? In the diffusers code base for SD, they also use a KL autoencoder. Could you clarify this?

Thank you for your help.

Best Wishes,

Zongze

buxiangzhiren commented 9 months ago

The original latent diffusion model used VQ (vector quantization) regularization, whereas the updated Stable Diffusion uses KL (Kullback-Leibler) regularization. The two autoencoders differ mainly in how the latent is regularized (quantization vs. a KL penalty), yet they share the same decoder architecture. For the purposes of our method, they are therefore functionally equivalent.
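To make the distinction concrete, here is a minimal PyTorch sketch (not the code in this repo; class names and sizes are hypothetical) contrasting the two bottlenecks. Both produce a latent of the same shape, so the same decoder can follow either one:

```python
import torch
import torch.nn as nn

class VQBottleneck(nn.Module):
    """VQ-reg: snap each latent vector to its nearest codebook entry."""
    def __init__(self, num_codes=8192, dim=4):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                   # z: (B, C, H, W)
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)          # (B*H*W, C)
        dist = torch.cdist(flat, self.codebook.weight)       # (B*H*W, K)
        idx = dist.argmin(dim=1)
        z_q = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        # straight-through estimator so gradients still reach the encoder
        return z + (z_q - z).detach()

class KLBottleneck(nn.Module):
    """KL-reg: treat the latent as a diagonal Gaussian and sample from it."""
    def forward(self, moments):                              # moments: (B, 2C, H, W)
        mean, logvar = moments.chunk(2, dim=1)
        return mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)

# Either bottleneck yields a (B, C, H, W) latent, so the decoder that
# consumes it does not need to change between the VQ and KL variants.
```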

We first used the original latent diffusion model with the VQ-reg autoencoder as our baseline. However, since Stable Diffusion performs better, we ultimately adopted SD with the KL-reg autoencoder as our baseline.
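For reference, this is roughly how the KL-reg autoencoder appears in the diffusers code base for SD; a hedged sketch only, and the checkpoint name "stabilityai/sd-vae-ft-mse" is just a commonly used VAE weight, not something specific to this thread:

```python
import torch
from diffusers import AutoencoderKL

# KL-regularized VAE used by Stable Diffusion (8x spatial downsampling, 4 latent channels)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

image = torch.randn(1, 3, 512, 512)                       # dummy RGB input in [-1, 1]
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()      # (1, 4, 64, 64)
    recon = vae.decode(latents).sample                     # (1, 3, 512, 512)
```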

betterze commented 9 months ago

Thx a lot for your explanation. I really appreciate it.