riariam opened this issue 1 month ago
Could you please paste the parameters of your pretrain and finetune training scripts, along with your loss curves?
Thank you for your reply. My settings are as follows.

Pretraining script:

```bash
deepspeed --include localhost:0,1,2,3 --master_port 29501 tinyllava/train/train.py \
--deepspeed ./scripts/zero3.json \
--data_path $DATA_PATH \
--image_folder $IMAGE_PATH \
--is_multimodal True \
--conv_version pretrain \
--model_name_or_path $LLM_VERSION \
--vision_tower $VT_VERSION \
--vision_tower2 "$VT_VERSION2" \
--connector_type $CN_VERSION \
--mm_vision_select_layer -2 \
--image_aspect_ratio square \
--attn_implementation flash_attention_2 \
--fp16 True \
--training_recipe $TRAIN_RECIPE \
--tune_type_llm frozen \
--tune_type_vision_tower frozen \
--tune_vision_tower_from_layer 0 \
--tune_type_connector full \
--output_dir $OUTPUT_DIR \
--num_train_epochs 1 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 2 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--save_steps 1 \
--save_total_limit 1 \
--learning_rate 1e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 False \
--model_max_length $MODEL_MAX_LENGTH \
--gradient_checkpointing True \
--dataloader_num_workers 8 \
--lazy_preprocess True \
--report_to tensorboard \
--tokenizer_use_fast False \
--run_name $RUN_NAME
```
where

```bash
LLM_VERSION=/localpath/HuggingFace/Qwen2.5-0.5B
VT_VERSION=/localpath/HuggingFace/google-siglip-so400m-patch14-384
CN_VERSION=mlp2x_gelu
```
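As a quick sanity check (my own arithmetic, not from the repo): with `--include localhost:0,1,2,3` the pretrain stage runs on 4 GPUs, so the effective global batch size is per-device batch × GPUs × gradient-accumulation steps:

```bash
# Effective global batch size for the pretrain stage
# (assumes the 4 GPUs selected by --include localhost:0,1,2,3):
GPUS=4
PER_DEVICE_BS=32   # --per_device_train_batch_size
GRAD_ACCUM=2       # --gradient_accumulation_steps
echo $((GPUS * PER_DEVICE_BS * GRAD_ACCUM))   # prints 256
```

256 matches the LLaVA-1.5-style pretraining recipe, so the batch configuration itself looks consistent.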
Finetuning script:

```bash
deepspeed --include localhost:0,1,2,3 --master_port 29501 tinyllava/train/train.py \
--deepspeed ./scripts/zero3.json \
--data_path $DATA_PATH \
--image_folder $IMAGE_PATH \
--is_multimodal True \
--conv_version $CONV_VERSION \
--model_name_or_path $LLM_VERSION \
--vision_tower $VT_VERSION \
--vision_tower2 "$VT_VERSION2" \
--connector_type $CN_VERSION \
--mm_vision_select_layer -2 \
--image_aspect_ratio square \
--attn_implementation flash_attention_2 \
--fp16 True \
--training_recipe $TRAIN_RECIPE \
--tune_type_llm full \
--tune_type_vision_tower frozen \
--tune_vision_tower_from_layer 0 \
--tune_type_connector full \
--group_by_modality_length True \
--pretrained_model_path $PRETRAIN_DIR \
--output_dir $OUTPUT_DIR \
--num_train_epochs 1 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--save_steps 40000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 False \
--model_max_length $MODEL_MAX_LENGTH \
--gradient_checkpointing True \
--dataloader_num_workers 8 \
--lazy_preprocess True \
--report_to tensorboard \
--tokenizer_use_fast False \
--run_name $RUN_NAME
```
where

```bash
CONV_VERSION=phi-2
```
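The same sanity check for the finetune stage (again assuming the 4 GPUs from `--include`) gives 8 × 4 × 4 = 128, which also matches the usual LLaVA-style finetuning batch size:

```bash
# Effective global batch size for the finetune stage:
GPUS=4
PER_DEVICE_BS=8    # --per_device_train_batch_size
GRAD_ACCUM=4       # --gradient_accumulation_steps
echo $((GPUS * PER_DEVICE_BS * GRAD_ACCUM))   # prints 128
```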
The losses for the two training stages of QWEN2-0.5B are shown below:

[figure: loss curves for the pretrain and finetune stages]
The result comparison is:
Model | Runtime | VQA-v2 | GQA | SQA-image | TextVQA | MM-Vet | POPE | MME | MMMU-val
---|---|---|---|---|---|---|---|---|---
Original paper | — | 72.3 | 55.8 | 60.1 | 45.2 | 19.5 | 86.6 | 1153 | 29.7
Local reproduction | 4 GPUs (1 epoch): Pretrain 1h40min, Finetune 4h40min | 49.95 | 30.09 | 50.72 | 24.53 | 14.6 | 62.21 | 524.43 | 26.6
We do provide official training scripts for qwen2-0.5B-base/instruct; please refer to scripts/train/qwen2/train_qwen2_base.sh. I can see several parameters in your script that differ from ours, especially CONV_VERSION. Please try training again with the script we provide.
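For context, a hedged sketch of the likely fix; the exact value should be copied from scripts/train/qwen2/train_qwen2_base.sh, and `qwen2_base` below is my assumption based on the repo's naming, not a verified flag:

```bash
# In the finetune script, use the Qwen2 conversation template instead of
# the Phi-2 one. With CONV_VERSION=phi-2, training targets are wrapped in
# Phi-2-style chat markers that Qwen2 was never trained on, which can leave
# the loss curve looking normal while benchmark scores collapse.
CONV_VERSION=qwen2_base   # assumed name; confirm against train_qwen2_base.sh
```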
When I reproduce qwen2-0.5B locally, the loss curves look perfectly normal, but the test results are very low. Have you run into a similar situation? How can qwen2 be trained stably?