Batch size issue when training on a custom dataset

I'm trying to train the model on a custom dataset using four RTX A6000 GPUs (48 GB each; nvidia-smi reports 49140 MiB), but each GPU uses about 27 GB even with a batch size of 1. Here are my config file and the GPU status:

```yaml
model:
  base_learning_rate: 1.0e-04
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "image_target"
    cond_stage_key: "image_cond"
    image_size: 32
    channels: 4
    cond_stage_trainable: false # Note: different from the one we trained before
    conditioning_key: hybrid
    monitor: val/loss_simple_ema
    scale_factor: 0.18215

    scheduler_config: # 10000 warmup steps
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [ 100 ]
        cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
        f_start: [ 1.e-6 ]
        f_max: [ 1. ]
        f_min: [ 1. ]

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32 # unused
        in_channels: 8
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: True
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPImageEmbedder

data:
  target: ldm.data.simple.ObjaverseDataModuleFromConfig
  params:
    root_dir: my_path
    batch_size: 1
    num_workers: 8
    total_view: 4
    train:
      validation: False
      image_transforms:
        size: 256
    validation:
      validation: True
      image_transforms:
        size: 256

lightning:
  find_unused_parameters: false
  metrics_over_trainsteps_checkpoint: True
  modelcheckpoint:
    params:
      every_n_train_steps: 5000
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 500
        max_images: 32
        increase_log_steps: False
        log_first_step: True
        log_images_kwargs:
          use_ema_scope: False
          inpaint: False
          plot_progressive_rows: False
          plot_diffusion_rows: False
          N: 32
          unconditional_guidance_scale: 3.0
          unconditional_guidance_label: [""]

  trainer:
    benchmark: True
    val_check_interval: 5000000 # really sorry
    num_sanity_val_steps: 0
    accumulate_grad_batches: 5
```

`nvidia-smi` output (Wed Apr 24 06:47:00 2024; NVIDIA-SMI 535.154.05, Driver Version 535.154.05, CUDA Version 12.2):

| GPU | Name | Persistence-M | Bus-Id | Disp.A | Volatile Uncorr. ECC | Fan | Temp | Perf | Pwr:Usage/Cap | Memory-Usage | GPU-Util | Compute M. | MIG M. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NVIDIA RTX A6000 | Off | 00000000:1D:00.0 | Off | Off | 48% | 71C | P2 | 203W / 300W | 27238MiB / 49140MiB | 92% | Default | N/A |
| 1 | NVIDIA RTX A6000 | Off | 00000000:1E:00.0 | Off | Off | 46% | 70C | P2 | 204W / 300W | 27242MiB / 49140MiB | 93% | Default | N/A |
| 2 | NVIDIA RTX A6000 | Off | 00000000:1F:00.0 | Off | Off | 49% | 73C | P2 | 202W / 300W | 27242MiB / 49140MiB | 94% | Default | N/A |
| 3 | NVIDIA RTX A6000 | Off | 00000000:20:00.0 | Off | Off | 47% | 70C | P2 | 194W / 300W | 27222MiB / 49140MiB | 94% | Default | N/A |

Is it normal for batch size 1 to consume this much GPU memory?
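
For reference, a quick sketch of what this config implies per optimizer step. It assumes DDP across the 4 GPUs and that zero123's `main.py` keeps latent-diffusion's default `scale_lr` rule (`accumulate_grad_batches * ngpu * batch_size * base_lr`); I haven't confirmed that in this repo, and the function names are hypothetical helpers, not part of the codebase.

```python
# Values copied from the config and GPU setup in this issue.
BATCH_SIZE_PER_GPU = 1        # data.params.batch_size
NUM_GPUS = 4                  # four RTX A6000s
ACCUMULATE_GRAD_BATCHES = 5   # lightning.trainer.accumulate_grad_batches
BASE_LR = 1.0e-04             # model.base_learning_rate


def effective_batch_size(bs: int, ngpu: int, accumulate: int) -> int:
    """Samples contributing to one optimizer step with DDP + gradient accumulation."""
    return bs * ngpu * accumulate


def scaled_learning_rate(base_lr: float, bs: int, ngpu: int, accumulate: int) -> float:
    """Assumption: mirrors latent-diffusion's main.py scale_lr rule
    (lr = accumulate_grad_batches * ngpu * batch_size * base_lr)."""
    return accumulate * ngpu * bs * base_lr


if __name__ == "__main__":
    eff = effective_batch_size(BATCH_SIZE_PER_GPU, NUM_GPUS, ACCUMULATE_GRAD_BATCHES)
    lr = scaled_learning_rate(BASE_LR, BATCH_SIZE_PER_GPU, NUM_GPUS, ACCUMULATE_GRAD_BATCHES)
    print(f"effective batch size: {eff}")
    print(f"scaled learning rate: {lr:.1e}")
```

If that rule applies, each step sees 20 samples and the learning rate actually used is 20 × `base_learning_rate`, so changing `batch_size` or the GPU count also changes the learning rate.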

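A rough back-of-envelope sketch of why most of the ~27 GB may not depend on the batch size. Every number is an assumption, not a measurement from this run: about 860M trainable UNet parameters (the Stable Diffusion v1 UNet size), fp32 training with AdamW, an EMA copy of the weights (`use_ema` isn't set in the config, so I'm assuming it defaults to on), and roughly 84M (AutoencoderKL) plus 428M (CLIP ViT-L/14) frozen parameters kept in fp32. The helper name is hypothetical.

```python
"""Back-of-envelope estimate of batch-size-independent GPU memory.

All parameter counts are approximate assumptions, not values read
from the zero123 checkpoint.
"""

GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed parameter counts (approximate public model sizes).
UNET_PARAMS = 860_000_000   # SD v1-style UNet, trainable
VAE_PARAMS = 84_000_000     # AutoencoderKL, frozen
CLIP_PARAMS = 428_000_000   # CLIP ViT-L/14, frozen


def training_state_bytes(trainable: int, frozen: int, use_ema: bool = True) -> int:
    """Bytes held regardless of batch size, assuming fp32 everywhere.

    Trainable params: weights (4) + gradients (4) + AdamW exp_avg and
    exp_avg_sq (8), plus 4 for an EMA copy if enabled.
    Frozen params: weights only (4).
    """
    per_trainable = 4 + 4 + 8 + (4 if use_ema else 0)
    return trainable * per_trainable + frozen * 4


if __name__ == "__main__":
    fixed = training_state_bytes(UNET_PARAMS, VAE_PARAMS + CLIP_PARAMS)
    observed = 27_238 * MIB  # GPU 0 memory usage reported by nvidia-smi
    print(f"estimated fixed state: {fixed / GIB:.1f} GiB")
    print(f"observed usage:        {observed / GIB:.1f} GiB")
    print(f"remaining (activations, CUDA context, allocator cache, DDP buffers): "
          f"{(observed - fixed) / GIB:.1f} GiB")
```

Under these assumptions, roughly two-thirds of the reported usage would be weights and optimizer state, so a large footprint at batch size 1 would be expected, and raising `batch_size` should mostly add activation memory (already reduced by `use_checkpoint: True`). `nvidia-smi` also counts memory cached by PyTorch's allocator, so it likely overstates live tensor usage.
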
cvlab-columbia/zero123, issue #128