Open carlini opened 2 years ago
The README says the v1.1 checkpoint was trained on 256x256 images and then fine-tuned on 512x512 images. Is there any way we can access this 256x256 model as a 1.0 checkpoint? There are various purposes for which a lower-resolution model would be more useful. For example, if I want to denoise ImageNet images, then the 256x256 model better matches the size of ImageNet inputs and so might perform better than the 512x512 model.
why would you want a shittier model?
I have one specific use case in mind where 256x256 is, in fact, not "shittier": diffusion models make great denoisers for improving certified adversarial robustness, as long as the model's resolution matches the image size (https://arxiv.org/abs/2206.10550). So the fact that 256x256 is closer to 224x224 makes this model much better for that purpose.
I suspect it might be useful for other purposes as well, and if you don't think this model would be useful for you, then just don't use it.
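For context on the robustness use case: the core step in that line of work is mapping the randomized-smoothing noise level sigma onto a diffusion timestep. Under the standard DDPM forward process, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, so a Gaussian noise level sigma corresponds to the timestep where sqrt((1 - abar_t) / abar_t) ~= sigma, i.e. abar_t ~= 1 / (1 + sigma^2). Here is a minimal sketch of that lookup, assuming an illustrative linear beta schedule (the exact schedule values are placeholders, not the Stable Diffusion config):

```python
import numpy as np

# Illustrative linear beta schedule over 1000 steps (placeholder values,
# not the actual Stable Diffusion training config).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)  # abar_t for t = 0..999

def sigma_to_timestep(sigma):
    """Find the timestep t whose effective noise level matches sigma.

    Solves sqrt((1 - abar_t) / abar_t) ~= sigma, which rearranges to
    abar_t ~= 1 / (1 + sigma**2), then picks the closest discrete t.
    """
    target = 1.0 / (1.0 + sigma**2)
    return int(np.abs(alphas_cumprod - target).argmin())

# A smoothing noise level of sigma=0.5 lands somewhere in the early-middle
# of the schedule; larger sigma maps to a later (noisier) timestep.
t = sigma_to_timestep(0.5)
```

One would then add noise at level sigma to the image, run the diffusion model's one-shot denoising prediction at timestep t, and feed the result to a downstream classifier; this is why matching the model's native resolution (256x256 vs 224x224) matters.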
@carlini, I found this https://huggingface.co/justinpinkney/miniSD.
I found one here: https://huggingface.co/lambdalabs/sd-image-variations-diffusers. Just replace the UNet with the one from this model and it will work.
I would also be interested in the checkpoint of the model trained only on 256x256 data. It would be nice if you could provide it!
@carlini Thanks for the important information. I want to reproduce the Stable Diffusion model. The first stage is to train the model on 256x256 data, then fine-tune on 512x512 images. Do these two stages use two different autoencoders, or the same autoencoder, which seems to have been trained on the OpenImages dataset? Thanks.
Hi, I have the same query.