LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
MIT License

Train Code for VAE Used in Paper #19

Open Ferry1231 opened 3 months ago

Ferry1231 commented 3 months ago

Dear researcher, I have been reading your team's paper, and I found it incredibly insightful and was inspired to attempt a reproduction of the work. Given the limited resources available in my lab, and from a learning perspective, I plan to start by training the model on smaller datasets like CIFAR-10. However, I've encountered some difficulties with the VAE encoder and couldn't find a VAE model that fits well with it.

Do you have the training code for the VAE used in the paper? Another question: what does the "vae_stride" parameter mean?

Thank you, and thanks for your work.

LTH14 commented 3 months ago

Thanks for your interest! We follow the VAE training from VQGAN and LDM. Please use this codebase and follow this config.

LTH14 commented 3 months ago

You need to copy the AutoencoderKL class to this file in the VQGAN codebase.
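
For reference, here is a minimal instantiation sketch of that class once copied over. The ddconfig values below are illustrative placeholders rather than the paper's exact settings, and the ldm.* imports inside the copied class will need to be repointed to their taming equivalents:

from taming.models.vqgan import AutoencoderKL  # hypothetical location after copying

vae = AutoencoderKL(
    ddconfig=dict(
        double_z=True,            # encoder outputs both mean and log-variance
        z_channels=16,
        resolution=256,
        in_channels=3,
        out_ch=3,
        ch=128,
        ch_mult=[1, 1, 2, 2, 4],  # four downsampling stages -> stride-16 latents
        num_res_blocks=2,
        attn_resolutions=[16],
        dropout=0.0,
    ),
    lossconfig=dict(target="torch.nn.Identity"),  # swap in the real KL/GAN loss for training
    embed_dim=16,
)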

Ferry1231 commented 3 months ago

Thank you! I got it.

gzhuinjune commented 3 months ago

Hello, could the VQGAN I trained back when running RCG be used directly here? Is vae_ckpt just the VQGAN checkpoint? Were any other details changed? I still want to switch to a floor-plan dataset. Thank you for your patient answers. I hope you can tell me every detail that was changed, because I'm afraid of a mismatch. How should the RGB range, data augmentation, and so on be set? Are there any other changes compared with the official code? Thanks!!!

gzhuinjune commented 3 months ago

Unfortunately, I never got ideal results with RCG. This framework looks like it has far fewer components. Thank you for your help.

LTH14 commented 3 months ago

@gzhuinjune A major difference here is that the VAE in this paper does not rely on the "quantization" step in VQGAN. Of course, this framework can also use a VQ-based tokenizer, but a non-VQ tokenizer should work better. You can start with a commonly used non-VQ tokenizer like the one below:

from diffusers.models import AutoencoderKL
# continuous (non-VQ) KL VAE used by Stable Diffusion
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")
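
A quick usage sketch with the VAE loaded above (shapes shown for a 256×256 input; 0.18215 is the latent scaling factor Stable Diffusion pairs with this VAE):

import torch

img = torch.randn(1, 3, 256, 256)  # stand-in for a batch of images in [-1, 1]
with torch.no_grad():
    latents = vae.encode(img).latent_dist.sample() * 0.18215
    recon = vae.decode(latents / 0.18215).sample
print(latents.shape)  # torch.Size([1, 4, 32, 32]): stride 8, 4 latent channels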

LTH14 commented 3 months ago

@gzhuinjune You can't use that one, because it was trained on ImageNet. You can use the Stable Diffusion VAE I mentioned above; it was trained on OpenImages, so it generalizes much better. First try its reconstruction quality on your dataset. Of course, if the performance is not good, you will still need to train a VAE on your own dataset first.
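
A minimal sketch of such a reconstruction check with the diffusers VAE loaded above (the image path is hypothetical):

import torch
from PIL import Image
from torchvision import transforms

tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),                       # [0, 1]
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # map to [-1, 1], as the VAE expects
])
img = tf(Image.open("your_image.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    recon = vae.decode(vae.encode(img).latent_dist.sample()).sample
mse = torch.mean((recon - img) ** 2)
psnr = 10 * torch.log10(4.0 / mse)  # peak-to-peak signal is 2 on [-1, 1]
print(f"reconstruction PSNR: {psnr.item():.2f} dB")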

gzhuinjune commented 3 months ago

You referenced vqgan.py above. Where does that come in? AutoencoderKL shouldn't go in there, should it? Sorry, I still haven't understood which code in VQGAN is used for the training. Is the SD VAE you mentioned this one?

gzhuinjune commented 3 months ago

Which file does vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema") go in? Is it main.py? Then I copy a class into vqgan and train directly with the custom-dataset example from the official VQGAN repo, right?

gzhuinjune commented 3 months ago

Is this the OpenImages pretrained checkpoint?

gzhuinjune commented 3 months ago

You said to copy the AutoencoderKL class into vqgan, so why does the code here say from diffusers.models import AutoencoderKL? Thanks.

LTH14 commented 3 months ago

from diffusers.models import AutoencoderKL lets you directly use the VAE already trained for Stable Diffusion. But if you need to train your own, then you have to copy AutoencoderKL (https://github.com/CompVis/latent-diffusion/blob/main/ldm/models/autoencoder.py#L285) into the VQGAN codebase and train it there.

gzhuinjune commented 3 months ago

Hello, when I copy the AutoencoderKL class into vqgan, which class should it replace? Just copying it in won't get it called, right? Should I keep using the custom-dataset training script example with main.py, and how do I invoke the class from main.py? Thank you for patiently answering a beginner.

LTH14 commented 3 months ago

Copy it into taming/models/vqgan.py, then use this config. You need to change the ldm paths inside that config to the corresponding paths in the vqgan codebase.
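
A sketch of that path change done programmatically (in practice you can simply edit the YAML by hand; the file names here are hypothetical):

from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/autoencoder_kl.yaml")       # the linked LDM config (name hypothetical)
cfg.model.target = "taming.models.vqgan.AutoencoderKL"    # was ldm.models.autoencoder.AutoencoderKL
OmegaConf.save(cfg, "configs/autoencoder_kl_taming.yaml")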

gzhuinjune commented 3 months ago

Thank you, handsome. Wishing you all the best!

gzhuinjune commented 3 months ago

I understand now.

fengyang0317 commented 1 month ago

Is there any reason for using taming-transformers instead of the latent-diffusion or stable-diffusion codebase?

LTH14 commented 1 month ago

@fengyang0317 not really -- I just chose one.

fengyang0317 commented 1 month ago

I see. Thank you so much.