CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

Checkpoint trained on only 256x256 data? #58

Open carlini opened 2 years ago

carlini commented 2 years ago

The README says the v1.1 checkpoint was trained on 256x256 images and then fine-tuned on 512x512 images. Is there any way we can access this 256x256 model as a 1.0 checkpoint? There are various purposes for which a lower-resolution model would be more useful. For example, if I want to denoise ImageNet images, then the 256x256 model better matches the size of ImageNet inputs and so might perform better than the 512x512 model.

Yuheng-Li commented 2 years ago

I have the same request. Thanks

breadbrowser commented 2 years ago

> The README says the v1.1 checkpoint was trained on 256x256 images and then fine-tuned on 512x512 images. Is there any way we can access this 256x256 model as a 1.0 checkpoint? There are various purposes where having a lower-resolution model is and would be more useful. For example, if I want to denoise Imagenet images, then the 256x256 model better matches the size of ImageNet and so might perform better than the 512x512 model.

why would you want a shittier model?

carlini commented 2 years ago

I have one specific use case in mind where 256x256 is, in fact, not "shittier": diffusion models can make great denoisers to improve certified adversarial robustness, as long as the noise matches the image size (https://arxiv.org/abs/2206.10550). So the fact that 256x256 is closer to 224x224 makes this model much better suited.

I suspect it might be useful for other purposes as well; and if you don't think this model would be useful for you, then just don't use it.
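To make the resolution-matching point concrete, here is a minimal NumPy sketch of the preprocessing side of denoised smoothing: a 224x224 ImageNet input is upscaled to the 256x256 resolution the diffusion model was trained at, then perturbed with the Gaussian noise used in randomized smoothing. The function name is hypothetical, and the actual denoising (the diffusion model) is left as a placeholder.

```python
import numpy as np

def smoothing_input(image_224, sigma=0.25, rng=None):
    """Upscale a 224x224 image to 256x256 and add the Gaussian noise
    used in randomized smoothing; a diffusion model trained at a
    matching 256x256 resolution would then denoise the result before
    it is handed to the classifier."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Nearest-neighbour upscale 224 -> 256 (a real pipeline would use
    # bilinear interpolation or reflection padding instead).
    idx = np.arange(256) * 224 // 256
    image_256 = image_224[idx][:, idx]
    return image_256 + sigma * rng.standard_normal(image_256.shape)

noisy = smoothing_input(np.zeros((224, 224, 3), dtype=np.float32))
print(noisy.shape)  # -> (256, 256, 3)
```

With a 512x512 model, the same input would have to be upscaled more than 2x before denoising, which is the mismatch the comment above is pointing at.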

mikeogezi commented 1 year ago

@carlini, I found this https://huggingface.co/justinpinkney/miniSD.

pmzzs commented 1 year ago

I found one here: https://huggingface.co/lambdalabs/sd-image-variations-diffusers. Just swap in the UNet from this model and it will work.
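For anyone trying this, the swap the comment describes would look roughly like the following with the diffusers library (a hedged, untested sketch; it downloads full model weights). Note that sd-image-variations is conditioned on CLIP image embeddings rather than text, so whether text prompts behave sensibly with its UNet is not guaranteed; the miniSD checkpoints mentioned elsewhere in this thread may be a closer fit for text-to-image at 256x256.

```python
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

# Load a standard SD v1 text-to-image pipeline, then replace its UNet
# with one trained at lower resolution, as the comment suggests.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.unet = UNet2DConditionModel.from_pretrained(
    "lambdalabs/sd-image-variations-diffusers", subfolder="unet"
)
image = pipe("a photograph of an astronaut", height=256, width=256).images[0]
```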

ksai2324 commented 1 year ago

I would also be interested in the checkpoint of the model trained only on 256x256 data. It would be great if you could provide it!

mikeogezi commented 1 year ago

https://huggingface.co/lambdalabs/miniSD-diffusers

wtliao commented 1 year ago

@carlini Thanks for the important information. I want to reproduce the stable diffusion model: the first stage is to train on 256x256 data, then fine-tune on 512x512 images. Do these two stages use two different autoencoders, or the same autoencoder, which seems to have been trained on the OpenImages dataset? Thanks.

jing-yu-lim commented 9 months ago

> @carlini Thanks for the important information. I want reproduce the stable diffusion model. First stage is to train the model on 256x256 data then fine-tune on 512x512 images. Do these two steps have two different autoencoder? Or they are the same autoencoder which seem to be trained on OpenImage dataset. Thanks.

Hi, I have the same query.