LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
MIT License

Train Code for VAE Used in Paper #19

Open Ferry1231 opened 3 months ago

Ferry1231 commented 3 months ago

Dear researcher, I have been reading your team's paper, and I found it incredibly insightful and was inspired to attempt a reproduction of the work. Given the limited resources available in my lab, and from a learning perspective, I plan to start by training the model on smaller datasets like CIFAR-10. However, I've encountered some difficulties with the VAE encoder and couldn't find a VAE model that fits well with it.

Do you have the training code for the VAE used in the paper? Another question: what does the "vae_stride" parameter mean?

Thank you, and thanks for your work.

LTH14 commented 3 months ago

Thanks for your interest! We follow the VAE training from VQGAN and LDM. Please use this codebase and follow this config.

LTH14 commented 3 months ago

You need to copy the AutoencoderKL class to this file in the VQGAN codebase.
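
For reference, here is a minimal instantiation sketch of that class once copied over. The ddconfig values below are illustrative placeholders rather than the paper's exact settings, and the ldm.* imports inside the copied class will need to be repointed to their taming equivalents:

from taming.models.vqgan import AutoencoderKL  # hypothetical location after copying

vae = AutoencoderKL(
    ddconfig=dict(
        double_z=True,            # encoder outputs both mean and log-variance
        z_channels=16,
        resolution=256,
        in_channels=3,
        out_ch=3,
        ch=128,
        ch_mult=[1, 1, 2, 2, 4],  # four downsampling stages -> stride-16 latents
        num_res_blocks=2,
        attn_resolutions=[16],
        dropout=0.0,
    ),
    lossconfig=dict(target="torch.nn.Identity"),  # swap in the real KL/GAN loss for training
    embed_dim=16,
)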

Ferry1231 commented 3 months ago

Thank you! I got it.

gzhuinjune commented 3 months ago

Hello, could the VQGAN I trained back when running RCG be used directly here? Is vae_ckpt just the VQGAN checkpoint? Were any other details changed? I still want to switch to a floor-plan dataset. Thank you for your patient answers. I hope you can tell me every detail that was changed, because I'm afraid of a mismatch. How should the RGB range, data augmentation, and so on be set? Are there any other changes compared with the official code? Thanks!!!

gzhuinjune commented 3 months ago

Unfortunately, I never got ideal results with RCG. This framework looks like it has far fewer components. Thank you for your help.

LTH14 commented 3 months ago

@gzhuinjune A major difference here is that the VAE in this paper does not rely on the "quantization" step in VQGAN. Of course, this framework can also use a VQ-based tokenizer, but a non-VQ tokenizer should work better. You can start with a commonly used non-VQ tokenizer like the one below:

from diffusers.models import AutoencoderKL
# continuous (non-VQ) KL VAE used by Stable Diffusion
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")
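
A quick usage sketch with the VAE loaded above (shapes shown for a 256×256 input; 0.18215 is the latent scaling factor Stable Diffusion pairs with this VAE):

import torch

img = torch.randn(1, 3, 256, 256)  # stand-in for a batch of images in [-1, 1]
with torch.no_grad():
    latents = vae.encode(img).latent_dist.sample() * 0.18215
    recon = vae.decode(latents / 0.18215).sample
print(latents.shape)  # torch.Size([1, 4, 32, 32]): stride 8, 4 latent channels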

LTH14 commented 3 months ago

@gzhuinjune You can't use that one, because it was trained on ImageNet. You can use the Stable Diffusion VAE I mentioned above; it was trained on OpenImages, so it generalizes much better. First try its reconstruction quality on your dataset. Of course, if the performance is not good, you will still need to train a VAE on your own dataset first.
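
A minimal sketch of such a reconstruction check with the diffusers VAE loaded above (the image path is hypothetical):

import torch
from PIL import Image
from torchvision import transforms

tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),                       # [0, 1]
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # map to [-1, 1], as the VAE expects
])
img = tf(Image.open("your_image.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    recon = vae.decode(vae.encode(img).latent_dist.sample()).sample
mse = torch.mean((recon - img) ** 2)
psnr = 10 * torch.log10(4.0 / mse)  # peak-to-peak signal is 2 on [-1, 1]
print(f"reconstruction PSNR: {psnr.item():.2f} dB")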

gzhuinjune commented 3 months ago

You referenced vqgan.py above. Where does that come in? AutoencoderKL shouldn't go in there, should it? Sorry, I still haven't understood which code in VQGAN is used for the training. Is the SD VAE you mentioned this one?

gzhuinjune commented 3 months ago

Which file does vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema") go in? Is it main.py? Then I copy a class into vqgan and train directly with the custom-dataset example from the official VQGAN repo, right?

gzhuinjune commented 3 months ago

Is this the OpenImages pretrained checkpoint?

gzhuinjune commented 3 months ago

You said to copy the AutoencoderKL class into vqgan, so why does the code here say from diffusers.models import AutoencoderKL? Thanks.

LTH14 commented 3 months ago

from diffusers.models import AutoencoderKL lets you directly use the VAE already trained for Stable Diffusion. But if you need to train your own, then you have to copy AutoencoderKL (https://github.com/CompVis/latent-diffusion/blob/main/ldm/models/autoencoder.py#L285) into the VQGAN codebase and train it there.

gzhuinjune commented 3 months ago

Hello, when I copy the AutoencoderKL class into vqgan, which class should it replace? Just copying it in won't get it called, right? Should I keep using the custom-dataset training script example with main.py, and how do I invoke the class from main.py? Thank you for patiently answering a beginner.

LTH14 commented 3 months ago

Copy it into taming/models/vqgan.py, then use this config. You need to change the ldm paths inside that config to the corresponding paths in the vqgan codebase.
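
A sketch of that path change done programmatically (in practice you can simply edit the YAML by hand; the file names here are hypothetical):

from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/autoencoder_kl.yaml")       # the linked LDM config (name hypothetical)
cfg.model.target = "taming.models.vqgan.AutoencoderKL"    # was ldm.models.autoencoder.AutoencoderKL
OmegaConf.save(cfg, "configs/autoencoder_kl_taming.yaml")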

gzhuinjune commented 3 months ago

Thank you, handsome. Wishing you all the best!

gzhuinjune commented 3 months ago

I understand now.

fengyang0317 commented 1 month ago

Is there any reason for using taming-transformers instead of the latent-diffusion or stable-diffusion codebase?

LTH14 commented 1 month ago

@fengyang0317 not really -- I just chose one.

fengyang0317 commented 1 month ago

I see. Thank you so much.