InternVL2: CUDA out of memory when training on video data

I am running a full fine-tune of InternVL2 on video data with a single 32 GB V100, and training fails with torch.cuda.OutOfMemoryError: CUDA out of memory. The launch command is:

torchrun /cache/InternVL/internvl_chat/internvl/train/internvl_chat_finetune.py \
  --model_name_or_path /cache/MODELS/internvl2-4B \
  --conv_style "phi3-chat" \
  --output_dir /cache/InternVL/OUTPUTS/internvl_chat_v1_5_phi3_3_8b_dynamic_res_finetune_debug_load_2nd \
  --meta_path /cache/InternVL/internvl_chat/shell/data/internvl_1_2_finetune_7k.json \
  --overwrite_output_dir True \
  --force_image_size 448 \
  --max_dynamic_patch 1 \
  --down_sample_ratio 0.5 \
  --drop_path_rate 0.1 \
  --freeze_llm False \
  --freeze_mlp False \
  --freeze_backbone True \
  --vision_select_layer -1 \
  --dataloader_num_workers 4 \
  --bf16 False \
  --fp16 True \
  --num_train_epochs 1 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 4 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 200 \
  --save_total_limit 1 \
  --learning_rate 4e-5 \
  --weight_decay 0.05 \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --max_seq_length 4096 \
  --do_train True \
  --grad_checkpoint True \
  --group_by_length True \
  --dynamic_image_size True \
  --use_thumbnail True \
  --ps_version 'v2' \
  --deepspeed /cache/ZYM/InternVL/internvl_chat/zero_stage2_config.json \
  --report_to "tensorboard" \
  2>&1 | tee -a /cache/ZYM/InternVL/OUTPUTS/internvl_chat_v1_5_phi3_3_8b_dynamic_res_finetune_debug/training_log.txt

In LazySupervisedDataset I set both video frame limits to 1:

class LazySupervisedDataset(Dataset):
    def __init__(
        ...
        min_num_frame=1,  # for video data
        max_num_frame=1,  # for video data
        ...
    ):

Videos are loaded with the decord library:

frames = read_frames_decord(fn, num_frames=max_num_frames, min_num_frames=min_num_frames, sample=sample, clip=clip)

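The batch size is set to 1. Video data does not appear to use dynamic high resolution, but I still set max_dynamic_patch to 1. In the video dataloader, the number of loaded frames is fixed at 1.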
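For reference, here is a rough sketch of how many visual tokens one video frame adds to the sequence under these settings. It uses force_image_size 448 and down_sample_ratio 0.5 from the command above. The patch size of 14 is an assumption (it is not stated in this post), and the function name `visual_tokens_per_tile` is hypothetical, not part of the InternVL code base.

```python
def visual_tokens_per_tile(
    image_size: int = 448,           # --force_image_size from the command
    patch_size: int = 14,            # ASSUMPTION: ViT patch size, not given in the post
    down_sample_ratio: float = 0.5,  # --down_sample_ratio from the command
) -> int:
    """Approximate number of visual tokens produced for one 448x448 tile."""
    patches_per_side = image_size // patch_size
    # Downsampling by `down_sample_ratio` per side shrinks the token grid
    # by that ratio squared.
    return int(patches_per_side ** 2 * down_sample_ratio ** 2)


num_frames = 1  # min_num_frame = max_num_frame = 1 in the dataset settings
tokens = visual_tokens_per_tile() * num_frames
print(tokens)  # 256
```

Under these assumptions a single frame takes only a small part of the 4096-token max_seq_length, so the frame count alone is unlikely to be what exhausts memory.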

Even with the settings above, training still runs out of GPU memory...
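A back-of-the-envelope estimate of the model-state memory may help narrow this down. The sketch below assumes fp16 mixed-precision training with Adam, which typically costs about 16 bytes per trainable parameter (fp16 weights and gradients, plus fp32 master weights and two fp32 Adam moments). The parameter counts are assumptions: roughly 3.8B trainable LLM parameters (suggested by "phi3_3_8b" in the output path) and roughly 0.3B frozen vision parameters. Activations, buffers, and allocator fragmentation are ignored, and `model_state_gib` is a hypothetical helper.

```python
def model_state_gib(trainable_params: float, frozen_params: float = 0.0) -> dict:
    """Rough GPU memory (GiB) for model states under fp16 mixed-precision Adam."""
    gib = 1024 ** 3
    states_bytes = {
        # fp16 copy of every weight, trainable or frozen (2 bytes each)
        "fp16_weights": 2 * (trainable_params + frozen_params),
        # fp16 gradients, only for trainable weights
        "fp16_grads": 2 * trainable_params,
        # fp32 master copy kept by the mixed-precision optimizer
        "fp32_master_weights": 4 * trainable_params,
        # Adam first and second moments in fp32 (4 + 4 bytes)
        "adam_moments": 8 * trainable_params,
    }
    out = {name: size / gib for name, size in states_bytes.items()}
    out["total"] = sum(out.values())
    return out


# ASSUMPTION: approximate parameter counts, not taken from the post.
estimate = model_state_gib(trainable_params=3.8e9, frozen_params=0.3e9)
for name, size in estimate.items():
    print(f"{name:>20}: {size:6.1f} GiB")
```

Under these assumptions the total comes to well over 32 GiB before any activations are counted. ZeRO stage 2 partitions optimizer states and gradients across data-parallel ranks, so with a single GPU there is nothing to partition. That would explain why reducing batch size, patches, and frame count does not help.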

OpenGVLab/InternVL issue #547