damo-cv / RealisDance

The official implementation of RealisDance
Apache License 2.0

training gpu requirements #14

Open DuanWei1234 opened 3 weeks ago

DuanWei1234 commented 3 weeks ago

Hello, thanks for your code. I want to know how much GPU memory is needed for training.

theFoxofSky commented 1 week ago

About 30-60 GB, depending on the batch size and resolution.

SzhangS commented 2 days ago

Hello, I follow the default config settings, but training GPU memory exceeds 80 GB. Could you give me some advice?

SzhangS commented 2 days ago

@theFoxofSky

theFoxofSky commented 2 days ago

We use A100s to train this model, so about 80 GB works for us. Please enable gradient checkpointing if the GPU memory footprint is too large.

theFoxofSky commented 2 days ago

Moreover, I remember that training does not take more than 80 GB; please check the resolution and batch size of your data.
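
One quick way to see where the memory goes is to log PyTorch's peak allocation per training step. A minimal sketch, assuming a standard PyTorch loop (`train_one_step` and `batch` are placeholders, not RealisDance code):

```python
import torch

# Reset the peak-memory counter, run one training step, then read the peak.
# train_one_step stands in for your actual forward/backward/optimizer code.
torch.cuda.reset_peak_memory_stats()
loss = train_one_step(batch)
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak GPU memory this step: {peak_gb:.1f} GB")
```

If the peak is far above 80 GB at batch size 1, the resolution or frame count is the usual suspect.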

SzhangS commented 2 days ago

```yaml
image_finetune: False

output_dir: "outputs"
pretrained_model_path: "pretrained_models/RV/"
pretrained_clip_path: "pretrained_models/DINO/dinov2/"
pretrained_mm_path: "pretrained_models/MM/mm_sd_v15_v2.ckpt"

unet_additional_kwargs:
  use_motion_module: True
  motion_module_resolutions: [ 1, 2, 4, 8 ]
  unet_use_cross_frame_attention: False
  unet_use_temporal_attention: False

  motion_module_type: Vanilla
  motion_module_kwargs:
    num_attention_heads: 8
    num_transformer_block: 1
    attention_block_types: [ "Temporal_Self", "Temporal_Self" ]
    temporal_position_encoding: True
    temporal_position_encoding_max_len: 32
    temporal_attention_dim_div: 1
    zero_initialize: True

pose_guider_kwargs:
  pose_guider_type: "side_guider"
  args:
    out_channels: [ 320, 320, 640, 1280, 1280 ]

clip_projector_kwargs:
  projector_type: "ff"
  in_features: 1024
  out_features: 768

zero_snr: True
v_pred: True
train_cfg: False
snr_gamma: 5.0
fix_ref_t: True
pose_shuffle_ratio: 0.05

vae_slicing: True
fps: 8  # 30

validation_kwargs:
  guidance_scale: 2

train_data:

validation_data:
  dataset_class: VideoDataset
  args:
    root_dir: "./video_dance_data"
    split: "val"
    sample_size: [ 768, 576 ]
    clip_size: [ 320, 240 ]
    image_finetune: False
    ref_mode: "first"
    sample_n_frames: 12
    start_pixel: 0
    fix_gap: True

trainable_modules:

unet_checkpoint_path: "outputs/stage1_hamer/checkpoints/checkpoint-final.ckpt"

unet_checkpoint_path: "pretrained_models/checkpoint/stage_2_hamer_release.ckpt"

lr_scheduler: "constant_with_warmup"
learning_rate: 1e-5
lr_warmup_steps: 5000
train_batch_size: 1
validation_batch_size: 1

max_train_epoch: -1
max_train_steps: 1000
checkpointing_epochs: -1
checkpointing_steps: 500
checkpointing_steps_tuple: [ 2, 500 ]

global_seed: 42
mixed_precision: "fp16"

is_debug: False
```
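
For reference, configs in this style are typically loaded with OmegaConf in AnimateDiff-derived training scripts; this is a sketch under that assumption (the path and field accesses are illustrative, not taken from the repo):

```python
from omegaconf import OmegaConf

# Load the YAML above; attribute access mirrors the nesting of the file.
config = OmegaConf.load("configs/stage2_hamer.yaml")  # hypothetical path

print(config.train_batch_size)                       # 1
print(config.validation_data.args.sample_n_frames)   # 12
print(config.mixed_precision)                        # "fp16"
```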

SzhangS commented 2 days ago

This is my stage2_hamer.yaml; it runs out of GPU memory on an A100. I had to change sample_n_frames from 16 to 12. Is that feasible?

theFoxofSky commented 2 days ago

If so, please use gradient checkpointing.

Call this function before training.

[screenshot showing the gradient checkpointing function call]
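
The screenshot itself isn't preserved in this thread. Assuming the UNet follows the diffusers ModelMixin API, which exposes enable_gradient_checkpointing(), the call would look like this sketch:

```python
# Sketch, not verbatim from the screenshot: gradient checkpointing recomputes
# activations during the backward pass, trading extra compute for a much
# smaller activation memory footprint.
unet.enable_gradient_checkpointing()

# ...then build the optimizer and run the training loop as usual.
```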

SzhangS commented 2 days ago

Thanks