cyj95 opened this issue 1 month ago
Or is there any guide on how to install and use "srun"? Which version?
Is there a suitable solution for this yet?
Same question.
Is this right?
set -x
GPUS=${GPUS:-2}
BATCH_SIZE=${BATCH_SIZE:-16}
PER_DEVICE_BATCH_SIZE=${PER_DEVICE_BATCH_SIZE:-4}
GRADIENT_ACC=$((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
export MASTER_PORT=34229
export TF_CPP_MIN_LOG_LEVEL=3
export LAUNCHER=pytorch
OUTPUT_DIR='work_dirs/ours_new'
if [ ! -d "$OUTPUT_DIR" ]; then
  mkdir -p "$OUTPUT_DIR"
fi
# number of gpus: 2
# batch size per gpu: 4
# gradient accumulation steps: 2
# total batch size: 16
# epoch: 1
torchrun \
  --nnodes=1 \
  --node_rank=0 \
  --master_addr=127.0.0.1 \
  --nproc_per_node=${GPUS} \
  --master_port=${MASTER_PORT} \
  ${SRUN_ARGS} \
  internvl/train/internvl_chat_finetune.py \
  --model_name_or_path "pretrained/Mini-InternVL-Chat-4B-V1-5" \
  --conv_style "phi3-chat" \
  --output_dir ${OUTPUT_DIR} \
  --meta_path "shell/data/ours.json" \
  --overwrite_output_dir True \
  --force_image_size 448 \
  --max_dynamic_patch 12 \
  --down_sample_ratio 0.5 \
  --drop_path_rate 0.1 \
  --pad2square False \
  --freeze_llm False \
  --freeze_mlp False \
  --freeze_backbone False \
  --vision_select_layer -1 \
  --use_data_resampling False \
  --dataloader_num_workers 4 \
  --bf16 True \
  --num_train_epochs 1 \
  --per_device_train_batch_size ${PER_DEVICE_BATCH_SIZE} \
  --gradient_accumulation_steps ${GRADIENT_ACC} \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 200 \
  --save_total_limit 3 \
  --learning_rate 4e-5 \
  --weight_decay 0.05 \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --max_seq_length 8192 \
  --do_train True \
  --grad_checkpoint True \
  --group_by_length True \
  --dynamic_image_size True \
  --use_thumbnail True \
  --deepspeed "zero_stage1_config.json" \
  --report_to "tensorboard" \
  2>&1 | tee -a "${OUTPUT_DIR}/training_log.txt"
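For reference, the batch-size arithmetic in the script's comment block can be checked directly in the shell; the numbers below are the script's defaults (2 GPUs, per-device batch 4, global batch 16):

```shell
# global batch = GPUs * per-device batch * gradient accumulation steps
GPUS=2
PER_DEVICE_BATCH_SIZE=4
BATCH_SIZE=16
# Same derivation the script uses for --gradient_accumulation_steps:
GRADIENT_ACC=$((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))
# Recompute the effective global batch to confirm it round-trips:
EFFECTIVE=$((GPUS * PER_DEVICE_BATCH_SIZE * GRADIENT_ACC))
echo "grad_acc=${GRADIENT_ACC} effective_batch=${EFFECTIVE}"
```

Note that the integer division only round-trips when BATCH_SIZE is divisible by PER_DEVICE_BATCH_SIZE * GPUS; otherwise the effective batch silently differs from BATCH_SIZE.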
Does this finetune script only work for the Mini version, or does it also work for InternVL-Chat-V1-5?
8×80G is not enough VRAM; Chat-V1-5 needs at least 16×80G.
Finetuning Mini-InternVL-Chat-4B-V1-5 produces lots of warnings: tokenization mismatch:
Should conv_style be set to phi3-chat? The model I'm finetuning is Mini-InternVL-Chat-4B-V1-5, but I set conv_style to internlm2-chat. I'm not sure whether that's what is causing the many mismatch warnings.
InternVL/internvl_chat/shell/internlm2_20b_dynamic/internvl_chat_v1_5_internlm2_20b_dynamic_res_finetune.sh
Is there a version that runs without srun? As a non-root user I can't install the slurm-client packages that srun requires.
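As a sketch of the general srun-to-torchrun mapping (the slurm flags in the comment are the standard ones; the training-script path is the one used in this thread), a single-node srun launch can usually be replaced by letting torchrun spawn one process per GPU, which needs no slurm installation:

```shell
# slurm version, for comparison (PARTITION etc. are cluster-specific):
#   srun -p ${PARTITION} --gres=gpu:${GPUS} --ntasks=${GPUS} --ntasks-per-node=${GPUS} \
#     python -u internvl/train/internvl_chat_finetune.py <args...>
#
# slurm-free single-node equivalent:
GPUS=${GPUS:-2}
MASTER_PORT=${MASTER_PORT:-34229}
LAUNCH="torchrun --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 \
--nproc_per_node=${GPUS} --master_port=${MASTER_PORT} \
internvl/train/internvl_chat_finetune.py"
# Print the command instead of running it, so this sketch works anywhere:
echo "${LAUNCH}"
```

torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each worker, which is what the HuggingFace Trainer inside internvl_chat_finetune.py reads to initialize distributed training, so no srun-specific environment is needed.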