Open Ferry1231 opened 3 months ago
Thanks for your interest! We follow the VAE training from VQGAN and LDM. Please use this codebase and follow this config.
You need to copy the AutoencoderKL class to this file in the VQGAN codebase.
Thank you! I got it.
@gzhuinjune A major difference here is that the VAE in this paper does not rely on the "quantization" step in VQGAN. This framework can also use a VQ-based tokenizer, but a non-VQ tokenizer should work better. You can start with a commonly used non-VQ tokenizer like the one below:
from diffusers.models import AutoencoderKL
# Load the pretrained Stable Diffusion VAE (continuous latents, no quantization step)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")
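To make the VQ versus non-VQ distinction concrete, here is a minimal pure-Python sketch of the two latent steps. It is only an illustration under assumptions: `vq_quantize` and `kl_sample` are hypothetical helper names, not functions from the VQGAN or diffusers codebases, and real tokenizers operate on tensors rather than Python lists.

```python
import math
import random


def vq_quantize(z, codebook):
    """VQ step: snap a latent vector to its nearest codebook entry.

    Returns the chosen index (a discrete token) and the entry itself.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(z, codebook[i]))
    return index, codebook[index]


def kl_sample(mean, logvar, rng=random):
    """Non-VQ (KL-regularized) step: draw a continuous latent, no codebook.

    Uses the reparameterization z = mean + std * eps with eps ~ N(0, 1).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, logvar)]
```

The VQ path yields a discrete index into a fixed codebook, while the KL path keeps a continuous vector, which is the kind of latent the `AutoencoderKL` model above produces.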
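@gzhuinjune You can't use that one, because it was trained on ImageNet. You can use the VAE from Stable Diffusion that I mentioned above. It was trained on OpenImages and generalizes much better, so you can first check its reconstruction quality on your dataset. Of course, if the performance is poor, you will still need to train a VAE on your own dataset first.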
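A reconstruction check of this kind could be sketched as follows. This is a rough sketch, not the authors' evaluation code: `psnr_from_mse` and `reconstruction_psnr` are hypothetical helper names, and it assumes `vae` is a diffusers `AutoencoderKL` and `images` is a float torch tensor of shape (B, 3, H, W) scaled to [-1, 1], with H and W divisible by 8.

```python
import math


def psnr_from_mse(mse: float, data_range: float = 2.0) -> float:
    """PSNR in dB for a given mean squared error.

    data_range is the width of the pixel interval; 2.0 assumes
    images scaled to [-1, 1].
    """
    if mse <= 0:
        return float("inf")
    return 10.0 * math.log10(data_range ** 2 / mse)


def reconstruction_psnr(vae, images) -> float:
    """Encode and decode a batch, then return the PSNR of the round trip.

    Assumes `vae` is a diffusers AutoencoderKL and `images` is a float
    tensor in [-1, 1] (see the assumptions stated in the lead-in).
    """
    import torch  # imported lazily so psnr_from_mse has no heavy dependency

    with torch.no_grad():
        # Use the mode of the posterior so the result is deterministic.
        latents = vae.encode(images).latent_dist.mode()
        recon = vae.decode(latents).sample
    mse = torch.mean((recon.clamp(-1, 1) - images) ** 2).item()
    return psnr_from_mse(mse)
```

With a VAE loaded via `AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")`, calling `reconstruction_psnr(vae, batch)` on a few batches from your own dataset gives a quick number to compare against; looking at the decoded images directly is also worthwhile, since PSNR alone can hide blurry details.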
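You referenced it above, but where is this used? Presumably AutoencoderKL is not supposed to go in here? Sorry, I still don't understand which code in VQGAN to use for training. Is this the SD VAE you mentioned?
In which file should vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema") be written? In main.py, right? And then I copy a class into VQGAN and train directly with the custom-dataset instructions from the official VQGAN repo, is that right?
The AutoencoderKL class is copied into VQGAN, so why does it say from diffusers.models import AutoencoderKL here? Thanks.
from diffusers.models import AutoencoderKL lets you directly use the VAE already trained for Stable Diffusion. But if you need to train one yourself, you need to copy the AutoencoderKL class into the VQGAN codebase and train it there.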
Is there any reason for using taming-transformers instead of latent-diffusion or stable-diffusion codebase?
@fengyang0317 not really -- I just chose one.
I see. Thank you so much.
Dear researcher, I have been reading your team's paper. I found it incredibly insightful and was inspired to attempt a reproduction of the work. Given the limited resources available in my lab, and from a learning perspective, I plan to start by training the model on smaller datasets like CIFAR-10. However, I've encountered some difficulties while using the VAE encoder and couldn't find a VAE model that fits well with it.
Do you have the training code for the VAE used in the paper? Another question: what does the "vae_stride" param mean?
Thank you, and thanks for your work.