OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
https://internvl.readthedocs.io/en/latest/
MIT License
5.2k stars, 406 forks

Error when running torchrun during fine-tuning #287

Closed gangxu822 closed 3 weeks ago

gangxu822 commented 2 months ago

Script:

OUTPUT_DIR='/opt/cv/InternVL/internvl_chat/shell/internlm2_20b_dynamic/entity_extract_exps/'

rm -r /opt/cv/grounding_exps/internvl_chat_v1_5_internlm2_20b_dynamic_res_finetune_0618/*

if [ ! -d "$OUTPUT_DIR" ]; then
  mkdir -p "$OUTPUT_DIR"
fi

bash shell/internlm2_20b_dynamic/internvl_chat_v1_5_internlm2_20b_dynamic_res_finetune.sh

torchrun $DISTRIBUTED_ARGS /opt/cv/InternVL/internvl_chat/internvl/train/internvl_chat_finetune.py \
  --model_name_or_path "/mnt/data0/models--OpenGVLab--InternVL-Chat-V1-5/snapshots/InternVL-Chat-V1-5/" \
  --conv_style "internlm2-chat" \
  --output_dir ${OUTPUT_DIR} \
  --meta_path "/mnt/data0/event_entity_extraction_internvl1-5-demo.jsonl" \
  --overwrite_output_dir True \
  --force_image_size 448 \
  --max_dynamic_patch 6 \
  --down_sample_ratio 0.5 \
  --drop_path_rate 0.4 \
  --pad2square False \
  --freeze_llm False \
  --freeze_mlp False \
  --freeze_backbone True \
  --vision_select_layer -1 \
  --use_data_resampling False \
  --dataloader_num_workers 4 \
  --bf16 True \
  --num_train_epochs 1 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 2000 \
  --save_total_limit 20 \
  --learning_rate 2e-5 \
  --weight_decay 0.05 \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --max_seq_length 4096 \
  --do_train True \
  --grad_checkpoint True \
  --group_by_length True \
  --dynamic_image_size True \
  --use_thumbnail True \
  --ps_version 'v2' \
  --deepspeed "/opt/cv/InternVL/internvl_chat/zero_stage3_config.json" \
  --report_to "tensorboard" \
  2>&1 | tee -a "${OUTPUT_DIR}/training_log.txt"

1SingleFeng commented 1 month ago

Have you solved this problem? Could you share the relevant script?

gangxu822 commented 1 month ago

> Have you solved this problem? Could you share the relevant script?

Setting NODE_RANK to 0 fixed it.
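For anyone hitting the same thing: the thread does not show how DISTRIBUTED_ARGS was built, so the following is only a minimal sketch of a single-node setup with the node rank pinned to 0. GPUS, NNODES, MASTER_ADDR, MASTER_PORT and their values are assumptions for illustration, not taken from the original script; the flags themselves are standard torchrun options.

```shell
# Hypothetical single-node settings; only NODE_RANK=0 comes from the thread.
GPUS=8                  # assumed number of GPUs on this machine
NNODES=1                # assumed: training on one machine only
NODE_RANK=0             # the fix reported above: a single node must be rank 0
MASTER_ADDR=127.0.0.1   # assumed: rendezvous on localhost
MASTER_PORT=29500       # assumed: any free port

# Guard against the misconfiguration: with one node, the only valid rank is 0.
if [ "$NNODES" -eq 1 ] && [ "$NODE_RANK" -ne 0 ]; then
  echo "NODE_RANK must be 0 when NNODES=1" >&2
  exit 1
fi

DISTRIBUTED_ARGS="--nproc_per_node $GPUS --nnodes $NNODES --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT"

# Print the arguments so they can be checked before launching torchrun.
echo "DISTRIBUTED_ARGS: $DISTRIBUTED_ARGS"
```

The torchrun command in the script above can then be run unchanged, since it already expands $DISTRIBUTED_ARGS.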