Ferry1231 opened 3 months ago
Thanks for your interest! We follow the VAE training from VQGAN and LDM. Please use this codebase and follow this config.
You need to copy the AutoencoderKL class to this file in the VQGAN codebase.
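As a rough sketch of how the copied class gets picked up: taming-transformers builds its model from the model.target path in the YAML config, so pointing target at the copied AutoencoderKL is what makes it train. The config filename and class path below are assumptions, not the repo's actual names.

from omegaconf import OmegaConf
import importlib

def instantiate_from_config(config):
    # Same pattern as taming-transformers' main.py: import the class named
    # by `target` and construct it with `params`.
    module, cls = config["target"].rsplit(".", 1)
    return getattr(importlib.import_module(module), cls)(**config.get("params", dict()))

config = OmegaConf.load("configs/custom_kl_vae.yaml")  # hypothetical config file
# In the YAML, model.target would point at the copied class,
# e.g. taming.models.vqgan.AutoencoderKL (assumed location).
model = instantiate_from_config(config.model)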
Thank you! I got it.
Hello, could I ask: can the VQGAN I trained back when I was running RCG be used directly here, i.e. is vae_ckpt just the VQGAN checkpoint? Did you change any other details? I still want to switch to a floor-plan dataset, so thank you for your patient answers. I hope you can tell me every detail you changed, since I'm afraid things won't match up. How should the RGB range, data augmentation, and so on be set, and are there any other changes compared to the official code? Thanks!!!
Unfortunately, I never managed to get ideal results with RCG; this framework looks like it has far fewer moving parts. Thanks for your help.
@gzhuinjune A major difference here is that the VAE in this paper does not rely on the "quantization" step in VQGAN. Of course, this framework can also use a VQ-based tokenizer, but a non-VQ tokenizer should work better. You can start with a commonly used non-VQ tokenizer like the one below:
from diffusers.models import AutoencoderKL
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")
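For completeness, here is a minimal sketch of round-tripping an image through that VAE. The [-1, 1] input range and the 0.18215 latent scaling follow the Stable Diffusion convention; treat the shapes as assumptions for the ft-ema checkpoint.

import torch
from diffusers.models import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval()

x = torch.randn(1, 3, 256, 256)             # stand-in for an image scaled to [-1, 1]
with torch.no_grad():
    z = vae.encode(x).latent_dist.sample()  # latents, (1, 4, 32, 32): 8x downsampling
    x_rec = vae.decode(z).sample            # reconstruction, (1, 3, 256, 256)
# If the latents feed a diffusion model, Stable Diffusion scales them by 0.18215.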
@gzhuinjune You can't use that one, because it was trained on ImageNet. You can use the VAE from Stable Diffusion that I mentioned above; it was trained on OpenImages, so it generalizes much better. Try its reconstruction quality on your dataset first. Of course, if the performance is not good, you will still need to train a VAE on your own dataset.
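A quick way to run that reconstruction check is sketched below; the file path, resolution, and [-1, 1] scaling are assumptions you should match to your own pipeline.

import torch
import torchvision.transforms as T
from PIL import Image
from diffusers.models import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval()
to_tensor = T.Compose([T.Resize(256), T.CenterCrop(256), T.ToTensor()])

img = to_tensor(Image.open("floorplan.png").convert("RGB"))  # hypothetical sample image
x = img.unsqueeze(0) * 2 - 1                # map [0, 1] to [-1, 1], as the VAE expects
with torch.no_grad():
    x_rec = vae.decode(vae.encode(x).latent_dist.sample()).sample
print(torch.mean((x - x_rec) ** 2).item())  # low MSE is necessary but not sufficient; inspect x_rec visually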
You referenced vqgan.py above; where is that used? AutoencoderKL shouldn't go inside it, right? Sorry, I still don't understand which code in the VQGAN repo is used for training. Is the SD VAE you mentioned this one?
And which file does vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema") go in -- main.py? Then I copy a class into the VQGAN codebase and train directly with the custom-dataset instructions from the official VQGAN repo, right?
Is this the OpenImages pretrained checkpoint?
If the AutoencoderKL class was copied into VQGAN, why is it written as from diffusers.models import AutoencoderKL? Thanks.
from diffusers.models import AutoencoderKL lets you directly use the VAE that Stable Diffusion already trained. But if you need to train your own, you have to copy AutoencoderKL (https://github.com/CompVis/latent-diffusion/blob/main/ldm/models/autoencoder.py#L285) into the VQGAN codebase and train it there.
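To make the copying step concrete, here is a hedged sketch of constructing the copied class directly. The hyperparameters mirror LDM's kl-f8 autoencoder config (autoencoder_kl_32x32x4.yaml); the module paths are assumptions, and the LPIPSWithDiscriminator loss has to be copied over alongside the model.

# Assumes AutoencoderKL was pasted into the VQGAN codebase at this (hypothetical) path.
from taming.models.autoencoder import AutoencoderKL

ddconfig = dict(
    double_z=True, z_channels=4, resolution=256, in_channels=3,
    out_ch=3, ch=128, ch_mult=[1, 2, 4, 4], num_res_blocks=2,
    attn_resolutions=[], dropout=0.0,
)
lossconfig = {
    "target": "taming.modules.losses.LPIPSWithDiscriminator",  # the loss copied from LDM (assumed path)
    "params": {"disc_start": 50001, "kl_weight": 1.0e-6, "disc_weight": 0.5},
}
vae = AutoencoderKL(ddconfig=ddconfig, lossconfig=lossconfig, embed_dim=4)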
Hello, when I copy the AutoencoderKL class into the VQGAN codebase, which class does it replace? If I only paste it in, it won't actually be called, right? Do I keep using the main.py example script for training on a custom dataset, and how do I invoke the class from main.py? Thank you for patiently answering a beginner.
Thank you so much, and I wish you all the best!
I understand now.
Is there any reason for using taming-transformers instead of the latent-diffusion or stable-diffusion codebase?
@fengyang0317 not really -- I just chose one.
I see. Thank you so much.
Dear researcher, I have been reading your team's paper; I found it incredibly insightful and was inspired to attempt a reproduction. Given the limited resources in my lab, and as a learning exercise, I plan to start by training the model on smaller datasets like CIFAR-10. However, I've run into some difficulties with the VAE encoder and couldn't find a VAE model that fits it well.
Do you have the training code for the VAE used in the paper? Another question: what does the "vae_stride" parameter mean?
Thank you, and thanks for your work.