explainingai-code / StableDiffusion-PyTorch

This repo implements a Stable Diffusion model in PyTorch with all the essential components.

Running out of memory... best number of samples for custom data sets? #8

Open pgtinsley opened 3 months ago

pgtinsley commented 3 months ago

Hello!

I was wondering if you have any intuition on how many training samples are required to get good results, and how much memory is needed to train the unconditional VQVAE?

I have about 200k grayscale images at 256x256... which was obviously too much, so I scaled back to 70 images just to see if it would start training, but it didn't; it still threw the out-of-memory error.

Is this something batch size can fix, or do I need to mess with a bunch of other parameters? I only changed the im_channels and save_latents parameters from their defaults.

Thank you!

explainingai-code commented 3 months ago

Hello @pgtinsley, the out-of-memory error would not have anything to do with the number of images in your dataset. You can keep it at 200K, but if it's taking too long to train and you want to speed that up, you can try training on 50K images (though it depends on how much variation there is between the images). Could you tell me which GPU you are using? The parameters you would essentially play with to get rid of this error are the batch size, the down/mid channels, and the number of down/mid/up layers.
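
For orientation, the knobs mentioned above map roughly to these keys in this repo's config (the full config is posted later in this thread; the values shown here are only placeholders, not tuned recommendations):

autoencoder_params:
  down_channels : [64, 128, 256, 256]   # narrower channels -> less activation memory
  mid_channels : [256, 256]
  num_down_layers : 2                   # fewer layers per block -> less memory
  num_mid_layers : 2
  num_up_layers : 2

train_params:
  autoencoder_batch_size : 4            # smaller batch size is the first thing to reduce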

pgtinsley commented 3 months ago

Hi @explainingai-code , I'm using some older hardware -- 4 GTX Titan Xs. Could that also be a problem? Thank you!

pgtinsley commented 3 months ago

Here is the config file:

dataset_params:
  im_path: 'data/combined_cropped_256x256'
  im_channels : 1
  im_size : 256
  name: 'combined_cropped_256x256'

diffusion_params:
  num_timesteps : 1000
  beta_start : 0.0015
  beta_end : 0.0195

ldm_params:
  down_channels: [ 256, 384, 512, 768 ]
  mid_channels: [ 768, 512 ]
  down_sample: [ True, True, True ]
  attn_down : [True, True, True]
  time_emb_dim: 512
  norm_channels: 32
  num_heads: 16
  conv_out_channels : 128
  num_down_layers : 2
  num_mid_layers : 2
  num_up_layers : 2

autoencoder_params:
  z_channels: 3
  codebook_size : 8192
  down_channels : [64, 128, 256, 256]
  mid_channels : [256, 256]
  down_sample : [True, True, True]
  attn_down : [False, False, False]
  norm_channels: 32
  num_heads: 4
  num_down_layers : 2
  num_mid_layers : 2
  num_up_layers : 2

train_params:
  seed : 1111
  task_name: 'combined_cropped_256x256'
  ldm_batch_size: 16
  autoencoder_batch_size: 4
  disc_start: 15000
  disc_weight: 0.5
  codebook_weight: 1
  commitment_beta: 0.2
  perceptual_weight: 1
  kl_weight: 0.000005
  ldm_epochs: 100
  autoencoder_epochs: 20
  num_samples: 1
  num_grid_rows: 1
  ldm_lr: 0.000005
  autoencoder_lr: 0.00001
  autoencoder_acc_steps: 4
  autoencoder_img_save_steps: 64
  save_latents : True
  vae_latent_dir_name: 'vae_latents'
  vqvae_latent_dir_name: 'vqvae_latents'
  ldm_ckpt_name: 'ddpm_ckpt.pth'
  vqvae_autoencoder_ckpt_name: 'vqvae_autoencoder_ckpt.pth'
  vae_autoencoder_ckpt_name: 'vae_autoencoder_ckpt.pth'
  vqvae_discriminator_ckpt_name: 'vqvae_discriminator_ckpt.pth'
  vae_discriminator_ckpt_name: 'vae_discriminator_ckpt.pth'

explainingai-code commented 3 months ago

Yeah. But that's fine, let's first try to reduce the memory required without reducing the network parameters. Can you try changing these two parameters: autoencoder_batch_size: 2 and autoencoder_acc_steps: 8?

And see if it runs. If not, then also try autoencoder_batch_size: 1 with autoencoder_acc_steps: 16.
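
For clarity, here is a sketch of the two suggested train_params settings. Assuming the usual gradient-accumulation behaviour (effective batch size = autoencoder_batch_size x autoencoder_acc_steps), both options keep the effective batch size at 16, the same as the original 4 x 4, while cutting per-step memory:

train_params:
  autoencoder_batch_size : 2   # option 1: half the per-step memory, 2 x 8 = 16 effective
  autoencoder_acc_steps : 8

train_params:
  autoencoder_batch_size : 1   # option 2: smallest per-step memory, 1 x 16 = 16 effective
  autoencoder_acc_steps : 16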

Just to add, when I was training on celebhq with 256x256 RGB images, I was using an Nvidia V100.

pgtinsley commented 3 months ago

Still no luck with either of those options... I'll try to get some GPUs with more memory... thank you!

explainingai-code commented 3 months ago

@pgtinsley, yes, that should solve this problem. A couple of other things you can try in case you are unable to get a higher-memory GPU: set num_down_layers: 1 (instead of 2) in autoencoder_params. First, maybe just try this together with autoencoder_batch_size: 1 and autoencoder_acc_steps: 16.

Then, as a last resort, modify the downsampling so the autoencoder works with 128x128 images: set down_sample: [True, True, False] (instead of [True, True, True]) in autoencoder_params and im_size: 128 (instead of 256) in dataset_params. This will build an autoencoder that takes 128x128 images down to 32x32 latent images.
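
Putting these last two suggestions together, a sketch of the reduced-memory settings would look like this (only the changed keys are shown; treat it as a starting point rather than a verified config):

dataset_params:
  im_size : 128                       # feed the autoencoder 128x128 images instead of 256x256

autoencoder_params:
  num_down_layers : 1                 # one layer per down block instead of 2
  down_sample : [True, True, False]   # two 2x downsamples: 128 -> 64 -> 32 latent resolution

train_params:
  autoencoder_batch_size : 1
  autoencoder_acc_steps : 16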