Open xienan0326 opened 6 months ago
This issue may be caused by this line; check your data format to avoid it.
This is the training data format. How should I modify it?
{
"id": "0",
"image": [
"/workspace/VL-Data/images/wKjBzVc_ukiAa5D4AATqsCZTbKg837.jpg"
],
"conversations": [
{
"from": "user",
"value": "
Doesn’t it support multiple rounds of dialogue with one picture?
Can you comment out this line and these two lines, then retry to see whether the issue persists?
Still getting an error after commenting those lines out.
I changed batch_size to 1 and changed the training data to:
{
  "id": "0",
  "image": [
    "/workspace/VL-Data/images/wKjBzVc_ukiAa5D4AATqsCZTbKg837.jpg"
  ],
  "conversations": [
    {
      "from": "user",
      "value": " 图中是什么?"
    },
    {
      "from": "assistant",
      "value": "图中是NBA球星勒布朗.詹姆斯。"
    }
  ]
}
(The question means "What is in the picture?" and the answer means "The picture shows NBA star LeBron James.")
print text:
['[UNUSED_TOKEN_146]user\n 图中是什么?[UNUSED_TOKEN_145]\n[UNUSED_TOKEN_146]assistant\n图中是NBA球星勒布朗.詹姆斯。[UNUSED_TOKEN_145]\n']
print(len(batch['image']))
1
print(len(batch['text_input']))
1
print(batch['data_type'])
['multi']
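Since the suggestion in this thread is to check the data format, a quick sanity check on each record may help before launching training. The sketch below is only an illustration: the function name check_record and the rule that turns alternate user/assistant are assumptions based on the sample record above, not something taken from the repository's loader.

```python
def check_record(record):
    """Return a list of problems found in one training record (empty if none)."""
    problems = []
    for key in ("id", "image", "conversations"):
        if key not in record:
            problems.append(f"missing key: {key}")

    images = record.get("image", [])
    if not isinstance(images, list) or not all(isinstance(p, str) for p in images):
        problems.append("image must be a list of path strings")

    turns = record.get("conversations", [])
    if not isinstance(turns, list) or not turns:
        problems.append("conversations must be a non-empty list")
        return problems

    roles = ("user", "assistant")  # assumed alternation, starting with user
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: must be an object")
            continue
        expected = roles[i % 2]
        if turn.get("from") != expected:
            problems.append(f"turn {i}: expected from={expected!r}, got {turn.get('from')!r}")
        value = turn.get("value")
        if not isinstance(value, str) or not value.strip():
            problems.append(f"turn {i}: value must be a non-empty string")

    if len(turns) % 2 != 0:
        problems.append("conversation should end with an assistant turn")
    return problems


sample = {
    "id": "0",
    "image": ["/workspace/VL-Data/images/wKjBzVc_ukiAa5D4AATqsCZTbKg837.jpg"],
    "conversations": [
        {"from": "user", "value": " 图中是什么?"},
        {"from": "assistant", "value": "图中是NBA球星勒布朗.詹姆斯。"},
    ],
}
print(check_record(sample))
```

A record that passes this check can still fail in training for other reasons, so this only rules out the most obvious structural mistakes.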
There is no need to add it in value.
I ran into a similar issue before, and using a larger max_length resolved it.
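One possible reason a larger max_length helps (an assumption, not confirmed in this thread) is that image and prompt tokens fill the sequence first, so truncation can cut off every answer token and leave nothing to train on. The sketch below illustrates that arithmetic with made-up token counts; the function name and all numbers are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy skips by default


def supervised_after_truncation(num_image_tokens, prompt_len, answer_len, max_length):
    """Count answer tokens that still carry a label after truncating to max_length."""
    # Image and prompt positions are masked; only answer positions are supervised.
    labels = [IGNORE_INDEX] * (num_image_tokens + prompt_len) + [1] * answer_len
    kept = labels[:max_length]  # truncation keeps the first max_length positions
    return sum(1 for label in kept if label != IGNORE_INDEX)


# Hypothetical sizes: 1200 image tokens, 10 prompt tokens, 20 answer tokens.
print(supervised_after_truncation(1200, 10, 20, max_length=1024))
print(supervised_after_truncation(1200, 10, 20, max_length=4096))
```

With these hypothetical sizes the first call keeps no supervised tokens, while the second keeps all 20.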
Training script:
#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
DIR=`pwd`

export MODEL="/workspace/model_weight/internlm-xcomposer2-vl-7b"
export DATA="data.txt"

GPUS_PER_NODE=8
NNODES=1
NODE_RANK=0
MASTER_ADDR=localhost
MASTER_PORT=6001

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"

torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --data_path $DATA \
    --img_size 490 \
    --given_num True \
    --bf16 True \
    --fix_vit True \
    --fix_sampler False \
    --use_lora False \
    --output_dir output/test \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "epoch" \
    --save_total_limit 1 \
    --learning_rate 1e-5 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --report_to "none" \
    --max_length 1024 \
    --deepspeed ds_config_zero2.json \
    --gradient_checkpointing True
Training data: constructed in the specified format.
Training log:
{'loss': 0.0, 'learning_rate': 8.333333333333333e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.6666666666666667e-06, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 2.5e-06, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 3.3333333333333333e-06, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 4.166666666666667e-06, 'epoch': 0.01}
{'loss': 0.0, 'learning_rate': 5e-06, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 5.833333333333334e-06, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 6.666666666666667e-06, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 7.500000000000001e-06, 'epoch': 0.02}
{'loss': 0.0, 'learning_rate': 8.333333333333334e-06, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 9.166666666666666e-06, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 1e-05, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 9.999981599807402e-06, 'epoch': 0.03}
{'loss': 0.0, 'learning_rate': 9.99992639936503e-06, 'epoch': 0.04}
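A loss of exactly 0.0 on every step, as in the log above, usually deserves a closer look. As a rough aid, the following sketch parses log lines of this shape and flags that pattern; the helper names are hypothetical and it assumes each entry is printed as a Python dict literal, as shown above.

```python
import ast


def parse_log_lines(lines):
    """Extract dict entries that contain a 'loss' key from raw log lines."""
    entries = []
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # no dict on this line
        try:
            entry = ast.literal_eval(line[start:].strip())
        except (ValueError, SyntaxError):
            continue  # not a well-formed Python literal
        if isinstance(entry, dict) and "loss" in entry:
            entries.append(entry)
    return entries


def loss_is_stuck_at_zero(entries):
    """True when there is at least one entry and every logged loss is exactly 0."""
    return bool(entries) and all(e["loss"] == 0.0 for e in entries)


log_lines = [
    "{'loss': 0.0, 'learning_rate': 8.333333333333333e-07, 'epoch': 0.0}",
    "{'loss': 0.0, 'learning_rate': 1.6666666666666667e-06, 'epoch': 0.01}",
    "{'loss': 0.0, 'learning_rate': 2.5e-06, 'epoch': 0.01}",
]
entries = parse_log_lines(log_lines)
print(len(entries), loss_is_stuck_at_zero(entries))
```

This only detects the symptom; it does not say whether the cause is the data format, truncation, or something else.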