OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable, open-source multimodal dialogue model approaching GPT-4V performance.
https://internvl.github.io/
MIT License

InternVL1.5 finetune #251

Open · cyj95 opened 1 month ago

cyj95 commented 1 month ago

InternVL/internvl_chat/shell/internlm2_20b_dynamic/internvl_chat_v1_5_internlm2_20b_dynamic_res_finetune.sh

Is there a version that runs without srun? As a non-root user, I can't install the slurm-client packages that srun requires.

cyj95 commented 1 month ago

Or is there a guide on how to install and use srun? Which version?
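In case it helps while waiting for an official answer: the srun invocation in the repo's scripts mainly requests GPUs from SLURM and launches one process per GPU, and on a single machine torchrun can do the same without SLURM. A rough sketch of the mapping, assuming a single node with the GPUs visible locally (not an official recipe):

# SLURM launch (simplified from the repo's script):
#   srun -p ${PARTITION} --gres=gpu:${GPUS_PER_NODE} --ntasks=${GPUS} \
#     python -u internvl/train/internvl_chat_finetune.py ...
#
# Roughly equivalent single-node launch without SLURM:
torchrun --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 \
 --nproc_per_node=${GPUS} --master_port=${MASTER_PORT} \
 internvl/train/internvl_chat_finetune.py \
 ...  # same training arguments as in the srun version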

njzfw1024 commented 3 weeks ago

Is there a good solution for this yet?

tdye24 commented 3 weeks ago

> Is there a good solution for this yet?

Same question.

tdye24 commented 3 weeks ago

Is this right?

set -x

GPUS=${GPUS:-2}
BATCH_SIZE=${BATCH_SIZE:-16}
PER_DEVICE_BATCH_SIZE=${PER_DEVICE_BATCH_SIZE:-4}
GRADIENT_ACC=$((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))

export PYTHONPATH="${PYTHONPATH}:$(pwd)"
export MASTER_PORT=34229
export TF_CPP_MIN_LOG_LEVEL=3
export LAUNCHER=pytorch

OUTPUT_DIR='work_dirs/ours_new'

if [ ! -d "$OUTPUT_DIR" ]; then
 mkdir -p "$OUTPUT_DIR"
fi

# number of gpus: 2
# batch size per gpu: 4
# gradient accumulation steps: 2
# total batch size: 16
# epoch: 1
torchrun \
 --nnodes=1 \
 --node_rank=0 \
 --master_addr=127.0.0.1 \
 --nproc_per_node=${GPUS} \
 --master_port=${MASTER_PORT} \
 internvl/train/internvl_chat_finetune.py \
 --model_name_or_path "pretrained/Mini-InternVL-Chat-4B-V1-5" \
 --conv_style "phi3-chat" \
 --output_dir ${OUTPUT_DIR} \
 --meta_path "shell/data/ours.json" \
 --overwrite_output_dir True \
 --force_image_size 448 \
 --max_dynamic_patch 12 \
 --down_sample_ratio 0.5 \
 --drop_path_rate 0.1 \
 --pad2square False \
 --freeze_llm False \
 --freeze_mlp False \
 --freeze_backbone False \
 --vision_select_layer -1 \
 --use_data_resampling False \
 --dataloader_num_workers 4 \
 --bf16 True \
 --num_train_epochs 1 \
 --per_device_train_batch_size ${PER_DEVICE_BATCH_SIZE} \
 --gradient_accumulation_steps ${GRADIENT_ACC} \
 --evaluation_strategy "no" \
 --save_strategy "steps" \
 --save_steps 200 \
 --save_total_limit 3 \
 --learning_rate 4e-5 \
 --weight_decay 0.05 \
 --warmup_ratio 0.03 \
 --lr_scheduler_type "cosine" \
 --logging_steps 1 \
 --max_seq_length 8192 \
 --do_train True \
 --grad_checkpoint True \
 --group_by_length True \
 --dynamic_image_size True \
 --use_thumbnail True \
 --deepspeed "zero_stage1_config.json" \
 --report_to "tensorboard" \
 2>&1 | tee -a "${OUTPUT_DIR}/training_log.txt"
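A note on reusing the script above: GPUS, BATCH_SIZE, and PER_DEVICE_BATCH_SIZE are read from the environment with the defaults shown, so they can be overridden at launch time without editing the file. For example (the filename here is hypothetical):

# assuming the script above is saved as finetune_torchrun.sh
GPUS=4 PER_DEVICE_BATCH_SIZE=2 BATCH_SIZE=16 sh finetune_torchrun.sh
# gradient_accumulation_steps = BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS = 16 / 2 / 4 = 2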
HaoRenkk123 commented 2 weeks ago

Does this finetune script only work with the Mini version, or does it also work with InternVL-Chat-V1-5?

njzfw1024 commented 2 weeks ago

> Does this finetune script only work with the Mini version, or does it also work with InternVL-Chat-V1-5?

8× 80 GB is not enough memory; InternVL-Chat-V1-5 probably needs at least 16× 80 GB.
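If 8× 80 GB is the hard limit, trimming per-GPU memory may still be worth trying before giving up on InternVL-Chat-V1-5. A hedged sketch of flag changes, based on the LoRA finetune scripts and the DeepSpeed ZeRO-3 config that ship in the repo (whether this actually fits in 8× 80 GB is untested here):

# Untested sketch: substitute these into the torchrun command above.
 --freeze_backbone True \
 --freeze_llm True \
 --use_llm_lora 16 \                        # train a LoRA on the LLM instead of full finetuning
 --deepspeed "zero_stage3_config.json" \    # ZeRO-3 shards parameters and optimizer state across GPUs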

HaoRenkk123 commented 6 days ago

Finetuning the Mini-InternVL-Chat-4B-V1-5 model produces a large number of warnings: tokenization mismatch:

[screenshot: repeated "tokenization mismatch" warnings]

HaoRenkk123 commented 6 days ago

Should conv_style be set to phi3-chat? The model I'm finetuning is Mini-InternVL-Chat-4B-V1-5, but I set conv_style to internlm2-chat. I'm not sure whether that's what is causing the large number of mismatch warnings.
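The tokenization mismatch warning is typically emitted when the rendered conversation text does not line up with the computed label masks, so using the internlm2-chat template with the Phi-3-based Mini model is a plausible cause; note that the script earlier in this thread uses --conv_style "phi3-chat" for exactly this checkpoint. One quick way to check which template a checkpoint expects, assuming the HF config stores it in a "template" field (an assumption about the config layout):

grep -o '"template"[^,}]*' pretrained/Mini-InternVL-Chat-4B-V1-5/config.json
# expected, if the assumption holds: "template": "phi3-chat"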