explainingai-code / StableDiffusion-PyTorch

This repo implements a Stable Diffusion model in PyTorch with all the essential components.

How to modify config files to generate higher resolution images #13

Open · vavasthi opened this issue 2 months ago

vavasthi commented 2 months ago

I am working on a use case where I want to generate larger resolution images, something like 1024x1024. How do I modify the configuration to do that?

explainingai-code commented 2 months ago

Hello @vavasthi,

For generating larger resolution images you would need significant compute at your disposal. But assuming you have that, the actual change is only in one key, the image size:

dataset_params:
    im_size: 256 changed to -> 1024

However, as of now the autoencoder has a downscale factor of 8, which means you would be training the LDM on 128x128 latents. If that's fine for you in terms of compute cost then great, but if not you would want to increase that factor to 16 so that your LDM training happens on 64x64. For this you would need the changes below:

autoencoder_params:
    down_channels: [64, 128, 256, 256] changed to -> [64, 128, 256, 256, 256]
    down_sample: [True, True, True] changed to -> [True, True, True, True]
    attn_down: [False, False, False] changed to -> [False, False, False, False]
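
As a quick sanity check of the latent resolution (a minimal sketch, assuming that each True entry in down_sample halves the spatial size, as described above):

    # latent size the LDM would train on with the modified config
    im_size = 1024
    down_sample = [True, True, True, True]
    factor = 2 ** sum(down_sample)       # each True halves the size, so this gives 16
    latent_size = im_size // factor      # 1024 // 16 = 64
    print(f"downscale factor {factor}, LDM trains on {latent_size}x{latent_size} latents")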

vavasthi commented 2 months ago

Thanks @explainingai-code. With the changes you suggested I was able to successfully train the model on 768x768 pixel images. It still doesn't work for 1024px images. I currently have an RTX 4090 with 24 GB of VRAM, so I am limited by that. Just one more question: given that my images are all greyscale, is there any other change I could make to the config that would help me reach 1024px images?

I have already set im_channels to 1.

Is there any other setting that could reduce memory requirements of the model?

explainingai-code commented 2 months ago

If you haven't changed the batch sizes, can you try reducing them for both the autoencoder and the LDM using the following:

train_params:
    ldm_batch_size: 16 changed to -> 4
    autoencoder_batch_size: 4 changed to -> 1
    autoencoder_acc_steps: 4 changed to -> 16

I would assume the batch size reduction would only be needed for the autoencoder stage, so maybe just change that and see. If the autoencoder trains successfully but the LDM fails, then reduce the LDM batch size as well.

jpmcarvalho commented 1 month ago

What is the goal of using autoencoder_acc_steps different from 1? If it's higher than 1, it will create X gradients for all weights and consume a lot of memory, right?

explainingai-code commented 1 month ago

Hello @jpmcarvalho, autoencoder_acc_steps is just for gradient accumulation: it mimics training with a larger batch size even when your GPU memory is not enough to accommodate that larger batch. Since the autoencoder trains on the full-resolution images (rather than latents), its memory cost per sample is higher, hence this support was added in the config.
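
To make the memory point concrete, here is a minimal gradient-accumulation sketch (illustrative only, not the repo's exact training loop; the tiny model and random batches are stand-ins). Gradients from each small batch are summed in place into the existing .grad buffers, so peak memory stays that of a single small batch while the optimizer steps as if the batch size were micro_batch * acc_steps:

    import torch
    import torch.nn as nn

    # Tiny stand-in "autoencoder" and random data; acc_steps plays the role of
    # autoencoder_acc_steps and micro_batch the role of autoencoder_batch_size.
    model = nn.Sequential(nn.Linear(16, 4), nn.Linear(4, 16))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.MSELoss()
    acc_steps = 16
    micro_batch = 1

    optimizer.zero_grad()
    for step in range(64):
        batch = torch.randn(micro_batch, 16)
        loss = criterion(model(batch), batch) / acc_steps  # scale so summed grads match one big batch
        loss.backward()        # gradients are accumulated in place; no extra per-step copies
        if (step + 1) % acc_steps == 0:
            optimizer.step()   # one update, as if the batch size were micro_batch * acc_steps
            optimizer.zero_grad()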

Nikita-Sherstnev commented 3 weeks ago

Hello @explainingai-code, how can I train the VAE on smaller images like 64 and 128? I tried changing just im_size, but the VAE generates very noisy images after training. Also, the perceptual loss is becoming negative. Maybe I can avoid using the VAE altogether.

explainingai-code commented 2 weeks ago

Hello @Nikita-Sherstnev, the VAE should work on smaller sizes as well. Here's the config I used for the MNIST dataset (https://github.com/explainingai-code/StableDiffusion-PyTorch/blob/main/config/mnist.yaml); the only changes were the channels, im_size, and reducing the downscaling factor to two. And yes, if your images are just 64x64 then you can instead run diffusion on the images themselves using the other repo (https://github.com/explainingai-code/DDPM-Pytorch) rather than diffusion on latents.

However, since the perceptual loss is becoming negative, I think there may be some other issue, as that should not happen: the lpips loss is just a scaled mean squared difference between feature maps of the two images, and the scaling factors are all positive, so it should never be negative. Is it possible that you missed loading the lpips model weights (https://github.com/explainingai-code/StableDiffusion-PyTorch/tree/main#setup), causing the scaling factors to be negative and hence the negative perceptual loss and bad VAE output? Could you please check whether the lpips weights are getting loaded correctly?
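
For reference, a stripped-down sketch of an LPIPS-style computation (not the repo's implementation; the chosen VGG layers and scalar weights are illustrative stand-ins for the trained lpips weights) shows why the loss cannot go negative when the scaling factors are non-negative:

    import torch
    import torchvision.models as models

    # Squared feature differences weighted by non-negative factors: every term is >= 0,
    # so the total perceptual loss can never be negative.
    vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
    layers = {3: 0.1, 8: 0.05, 15: 0.02}   # illustrative layer indices and positive weights

    def perceptual(x, y):
        loss = torch.tensor(0.0)
        fx, fy = x, y
        for i, layer in enumerate(list(vgg)[:max(layers) + 1]):
            fx, fy = layer(fx), layer(fy)
            if i in layers:
                loss = loss + layers[i] * ((fx - fy) ** 2).mean()  # each term is >= 0
        return loss

    with torch.no_grad():
        print(perceptual(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))  # always >= 0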

Nikita-Sherstnev commented 1 week ago

@explainingai-code Thank you for your answer! Sure enough, I had downloaded the wrong lpips weights; the model downloaded by the code did not match the model from the readme for some reason. Anyway, the model does not seem to train very well. My dataset is very small (64 images), so maybe that is the issue. I trained for about 120 epochs with batch size 8 and the discriminator turned on for the last 40 epochs. It looks like the discriminator does not give any quality improvement. I would like to train the DDPM model itself, but I want it to be text-conditioned as well :)