[Open] DuanWei1234 opened this issue 3 weeks ago
About 30-60 GB, depending on the batch size and resolution.
Hello, I followed the default config settings, but GPU memory during training exceeds 80 GB. Could you give me some advice?
@theFoxofSky
As we use A100s to train this model, about 80 GB is acceptable for us. Please enable gradient checkpointing if the GPU memory footprint is too large.
Moreover, I remember training never exceeded 80 GB, so please check the resolution and batch size of your data.
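As a rough sanity check (a sketch, not the repo's actual memory accounting): activation memory in the denoising UNet scales roughly linearly with batch size × frame count × latent height × latent width, so reducing any of these reduces that portion of the footprint proportionally.

```python
# Rough, illustrative comparison of latent-tensor sizes for two settings.
# Assumes an SD-style VAE (8x spatial downsampling, 4 latent channels)
# and fp16 storage (2 bytes per element).
def latent_bytes(batch, frames, height, width, channels=4, bytes_per_el=2):
    return batch * frames * channels * (height // 8) * (width // 8) * bytes_per_el

a = latent_bytes(1, 16, 768, 576)
b = latent_bytes(1, 12, 768, 576)
print(b / a)  # 0.75: dropping 16 -> 12 frames cuts this term by 25%
```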
```yaml
image_finetune: False

output_dir: "outputs"
pretrained_model_path: "pretrained_models/RV/"
pretrained_clip_path: "pretrained_models/DINO/dinov2/"
pretrained_mm_path: "pretrained_models/MM/mm_sd_v15_v2.ckpt"

unet_additional_kwargs:
  use_motion_module: True
  motion_module_resolutions: [ 1, 2, 4, 8 ]
  unet_use_cross_frame_attention: False
  unet_use_temporal_attention: False

  motion_module_type: Vanilla
  motion_module_kwargs:
    num_attention_heads: 8
    num_transformer_block: 1
    attention_block_types: [ "Temporal_Self", "Temporal_Self" ]
    temporal_position_encoding: True
    temporal_position_encoding_max_len: 32
    temporal_attention_dim_div: 1
    zero_initialize: True

pose_guider_kwargs:
  pose_guider_type: "side_guider"
  args:
    out_channels: [ 320, 320, 640, 1280, 1280 ]

clip_projector_kwargs:
  projector_type: "ff"
  in_features: 1024
  out_features: 768

zero_snr: True
v_pred: True
train_cfg: False
snr_gamma: 5.0
fix_ref_t: True
pose_shuffle_ratio: 0.05

vae_slicing: True
fps: 8  # 30

validation_kwargs:
  guidance_scale: 2

train_data:

validation_data:
  dataset_class: VideoDataset
  args:
    root_dir: "./video_dance_data"
    split: "val"
    sample_size: [ 768, 576 ]
    clip_size: [ 320, 240 ]
    image_finetune: False
    ref_mode: "first"
    sample_n_frames: 12
    start_pixel: 0
    fix_gap: True

trainable_modules:

unet_checkpoint_path: "pretrained_models/checkpoint/stage_2_hamer_release.ckpt"

lr_scheduler: "constant_with_warmup"
learning_rate: 1e-5
lr_warmup_steps: 5000
train_batch_size: 1
validation_batch_size: 1

max_train_epoch: -1
max_train_steps: 1000
checkpointing_epochs: -1
checkpointing_steps: 500
checkpointing_steps_tuple: [ 2, 500 ]

global_seed: 42
mixed_precision: "fp16"

is_debug: False
```
This is my stage2_hamer.yaml. I get GPU out-of-memory on an A100, so I had to change sample_n_frames from 16 to 12. Is that feasible?
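As a quick way to sanity-check the memory-relevant fields, the config is plain YAML and can be parsed directly (the fragment below is an inline copy of part of the config above, for illustration only):

```python
import yaml

# Parse just the fields that drive memory use: resolution, frame count,
# and batch size (inline fragment of the stage-2 config above).
cfg = yaml.safe_load("""
validation_data:
  args:
    sample_size: [768, 576]
    sample_n_frames: 12
train_batch_size: 1
""")

frames = cfg["validation_data"]["args"]["sample_n_frames"]
h, w = cfg["validation_data"]["args"]["sample_size"]
print(frames, h, w)  # 12 768 576
```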
If so, please use gradient checkpointing.
Call this function before training.
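For reference, the general pattern in PyTorch is `torch.utils.checkpoint`: activations of a wrapped block are recomputed during the backward pass instead of being stored, trading compute for memory. A minimal, self-contained sketch (the model here is illustrative, not the repo's UNet; diffusers-style models expose the same idea via `enable_gradient_checkpointing()`):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """A stand-in for one UNet block (illustrative only)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

class Model(nn.Module):
    def __init__(self, depth=4, dim=64, use_checkpointing=True):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.use_checkpointing = use_checkpointing

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpointing and self.training:
                # Recompute this block's activations in backward
                # instead of storing them, cutting peak memory.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x

model = Model().train()
x = torch.randn(2, 64, requires_grad=True)
loss = model(x).sum()
loss.backward()  # gradients still flow through the checkpointed blocks
```

The call must be set up before the training loop starts (i.e. before any forward pass), which is what "call this function before training" refers to.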
Thanks
Hello, thanks for your code. I want to know how much GPU memory is needed for training.