Open pgtinsley opened 3 months ago
Hello!
I was wondering if you have any intuition on how many training samples are required to get good results, and how much memory is required to train the unconditional VQVAE?
I have about 200K grayscale images at 256x256... which was obviously too much, so I scaled back to 70 images just to see if it would start training, but it didn't, throwing the out-of-memory error.
Is this something batch size can fix, or do I need to adjust a bunch of other parameters? I only changed the im_channels and save_latents parameters from their defaults.
Thank you!
Hello @pgtinsley, the out-of-memory error would not have anything to do with the number of images in your dataset. You can keep it at 200K, but if training is taking too long and you want to speed it up, you can try training on 50K images (though it depends on how much variation there is between the images). Could you tell me which GPU you are using? The parameters you would essentially play with to get rid of this error are the batch size, the down/mid channels, and the number of down/mid/up layers.
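For illustration only, these are the config knobs that drive memory usage; the values below are hypothetical reductions to show where each knob lives, not recommendations:

```yaml
autoencoder_params:
  down_channels: [32, 64, 128, 128]   # fewer channels per block -> smaller activations
  mid_channels: [128, 128]
  num_down_layers: 1                  # fewer layers per block
  num_mid_layers: 1
  num_up_layers: 1

train_params:
  autoencoder_batch_size: 2           # smaller batch -> less memory per step
```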
Hi @explainingai-code, I'm using some older hardware: four GTX Titan Xs. Could that also be a problem? Thank you!
Here is the config file:
```yaml
dataset_params:
  im_path: 'data/combined_cropped_256x256'
  im_channels: 1
  im_size: 256
  name: 'combined_cropped_256x256'

diffusion_params:
  num_timesteps: 1000
  beta_start: 0.0015
  beta_end: 0.0195

ldm_params:
  down_channels: [256, 384, 512, 768]
  mid_channels: [768, 512]
  down_sample: [True, True, True]
  attn_down: [True, True, True]
  time_emb_dim: 512
  norm_channels: 32
  num_heads: 16
  conv_out_channels: 128
  num_down_layers: 2
  num_mid_layers: 2
  num_up_layers: 2

autoencoder_params:
  z_channels: 3
  codebook_size: 8192
  down_channels: [64, 128, 256, 256]
  mid_channels: [256, 256]
  down_sample: [True, True, True]
  attn_down: [False, False, False]
  norm_channels: 32
  num_heads: 4
  num_down_layers: 2
  num_mid_layers: 2
  num_up_layers: 2

train_params:
  seed: 1111
  task_name: 'combined_cropped_256x256'
  ldm_batch_size: 16
  autoencoder_batch_size: 4
  disc_start: 15000
  disc_weight: 0.5
  codebook_weight: 1
  commitment_beta: 0.2
  perceptual_weight: 1
  kl_weight: 0.000005
  ldm_epochs: 100
  autoencoder_epochs: 20
  num_samples: 1
  num_grid_rows: 1
  ldm_lr: 0.000005
  autoencoder_lr: 0.00001
  autoencoder_acc_steps: 4
  autoencoder_img_save_steps: 64
  save_latents: True
  vae_latent_dir_name: 'vae_latents'
  vqvae_latent_dir_name: 'vqvae_latents'
  ldm_ckpt_name: 'ddpm_ckpt.pth'
  vqvae_autoencoder_ckpt_name: 'vqvae_autoencoder_ckpt.pth'
  vae_autoencoder_ckpt_name: 'vae_autoencoder_ckpt.pth'
  vqvae_discriminator_ckpt_name: 'vqvae_discriminator_ckpt.pth'
  vae_discriminator_ckpt_name: 'vae_discriminator_ckpt.pth'
```
Yeah. But that's fine, let's first try to reduce the memory required without shrinking the network. Can you try changing these two parameters: autoencoder_batch_size: 2 and autoencoder_acc_steps: 8
and see if it runs. If not, then also try autoencoder_batch_size: 1 with autoencoder_acc_steps: 16.
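In config form, the two fallbacks look like this; note that batch_size × acc_steps stays at 16 either way (originally 4 × 4), so, assuming acc_steps is gradient accumulation, the effective batch size is unchanged:

```yaml
train_params:
  autoencoder_batch_size: 2   # was 4; if this still runs out of memory, try 1
  autoencoder_acc_steps: 8    # was 4; with batch size 1, use 16 (2 x 8 = 1 x 16 = 16)
```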
Just to add, when I was training on celebhq with 256x256 RGB images, I was using an Nvidia V100.
Still no luck with either of those options... I'll try to get some GPUs with more memory... thank you!
@pgtinsley, yes, that should solve the problem. A couple of other things you can try in case you are unable to get a higher-memory GPU:
First, maybe just try num_down_layers: 1 (instead of 2 in autoencoder_params), together with autoencoder_batch_size: 1 and autoencoder_acc_steps: 16.
Then, lastly, modify the downsample parameter and have the autoencoder work with 128x128 images: down_sample: [True, True, False] (instead of [True, True, True] in autoencoder_params) and im_size: 128 (instead of 256 in dataset_params). This will build an autoencoder that takes 128x128 images to 32x32 latent images, as in the sketch below.
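A sketch of the config after both fallbacks, showing only the changed sections (the num_down_layers change is the optional first step from above):

```yaml
dataset_params:
  im_size: 128                       # was 256

autoencoder_params:
  num_down_layers: 1                 # optional first step, was 2
  down_sample: [True, True, False]   # was [True, True, True]
```

With only two downsampling stages active, a 128x128 input is halved twice, 128 -> 64 -> 32, giving the 32x32 latents mentioned above.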