OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model whose performance approaches GPT-4o.
https://internvl.readthedocs.io/en/latest/
MIT License

finetune InternVL2 loss is 0 #388

Closed eddatt closed 1 month ago

eddatt commented 1 month ago

I used the `internvl_chat_v1_2_hermes2_yi34b_448_res_finetune_continue_lora.sh` script to finetune InternVL2-8B.

I first hit this error: `ValueError: Target modules {'mlp.down_proj', 'self_attn.o_proj', 'self_attn.q_proj', 'mlp.up_proj', 'mlp.gate_proj', 'self_attn.k_proj', 'self_attn.v_proj'} not found in the base model.` I fixed it with the workaround from issue 150: https://github.com/OpenGVLab/InternVL/issues/150

With the fix from issue 150 applied, the finetune runs successfully, but the loss is always 0 during finetuning.
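For context on the `ValueError` itself: PEFT raises it when none of the requested LoRA `target_modules` names exist in the model. The LLaMA-style names in the script (`q_proj`, `gate_proj`, ...) don't appear in InternLM2-based backbones such as InternVL2-8B, which (to my understanding; verify against your checkpoint) name these layers `attention.wqkv`, `attention.wo`, `feed_forward.w1/w2/w3`. A minimal sketch of the mismatch, using illustrative module-name strings rather than a dump from the real model:

```python
# LLaMA-style LoRA targets, as passed by the original script.
LLAMA_STYLE_TARGETS = {
    "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
    "self_attn.o_proj", "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj",
}

# Illustrative InternLM2-style module names (one layer shown); the real
# model has many layers, but the naming pattern is the point here.
INTERNLM2_MODULES = [
    "language_model.model.layers.0.attention.wqkv",
    "language_model.model.layers.0.attention.wo",
    "language_model.model.layers.0.feed_forward.w1",
    "language_model.model.layers.0.feed_forward.w2",
    "language_model.model.layers.0.feed_forward.w3",
]

def matching_modules(module_names, targets):
    """Return module names whose dotted suffix matches one of `targets`
    (roughly how PEFT resolves target_modules)."""
    return [n for n in module_names
            if any(n == t or n.endswith("." + t) for t in targets)]

# LLaMA-style targets match nothing, which is exactly the
# "Target modules ... not found in the base model" situation:
print(matching_modules(INTERNLM2_MODULES, LLAMA_STYLE_TARGETS))        # []
print(matching_modules(INTERNLM2_MODULES, {"attention.wqkv", "attention.wo"}))
```

In practice you can list `model.named_modules()` on the actual checkpoint to see which names exist before configuring LoRA.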

Here is my script:

```shell
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
export MASTER_PORT=34229
export TF_CPP_MIN_LOG_LEVEL=3
export LAUNCHER=pytorch

torchrun \
  --nnodes=1 \
  --node_rank=0 \
  --master_addr=127.0.0.1 \
  --nproc_per_node=4 \
  --master_port=34229 \
  ${SRUN_ARGS} \
  internvl/train/internvl_chat_finetune.py \
  --model_name_or_path "/mnt/internvl2/InternVL2-8B" \
  --conv_style "phi3-chat" \
  --output_dir "/mnt/output/finetune_test04" \
  --meta_path "/mnt/appediet_internvl2_finetune_1000_meta_v2.json" \
  --overwrite_output_dir True \
  --force_image_size 448 \
  --down_sample_ratio 0.5 \
  --drop_path_rate 0.0 \
  --pad2square False \
  --freeze_llm True \
  --freeze_mlp True \
  --freeze_backbone True \
  --use_llm_lora 16 \
  --vision_select_layer -1 \
  --use_data_resampling False \
  --dataloader_num_workers 4 \
  --bf16 True \
  --num_train_epochs 1 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 200 \
  --save_total_limit 1 \
  --learning_rate 1e-5 \
  --weight_decay 0.05 \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --max_seq_length 2048 \
  --do_train True \
  --grad_checkpoint True \
  --group_by_length True \
  --deepspeed "zero_stage3_config.json" \
  --report_to "none" \
  2>&1 | tee -a "/mnt/output/training_log_04.txt"
```

Here is the loss output (screenshot omitted: "Screenshot 2024-07-19 at 18 05 58").

Thanks for any help.

feihuamantian commented 1 month ago

https://github.com/OpenGVLab/InternVL/issues/351

Single430 commented 1 month ago

Could it be that your dataset is too small and the model is completely overfitting?

1518630367 commented 1 month ago

> facing err: `ValueError: Target modules {'mlp.down_proj', 'self_attn.o_proj', 'self_attn.q_proj', 'mlp.up_proj', 'mlp.gate_proj', 'self_attn.k_proj', 'self_attn.v_proj'} not found in the base model.`

Has this been solved? I'm hitting the same problem.

eddatt commented 1 month ago

> Could it be that your dataset is too small and the model is completely overfitting?

I'm running a small-batch test, but the loss is 0 from the very first step.

eddatt commented 1 month ago

> facing err: `ValueError: Target modules {'mlp.down_proj', 'self_attn.o_proj', 'self_attn.q_proj', 'mlp.up_proj', 'mlp.gate_proj', 'self_attn.k_proj', 'self_attn.v_proj'} not found in the base model.`
>
> Has this been solved? I'm hitting the same problem.

As noted above, I fixed that error with issue 150 (https://github.com/OpenGVLab/InternVL/issues/150): I changed the code per that issue and got it running, but the loss is always 0.

1518630367 commented 1 month ago

> facing err: `ValueError: Target modules {'mlp.down_proj', 'self_attn.o_proj', 'self_attn.q_proj', 'mlp.up_proj', 'mlp.gate_proj', 'self_attn.k_proj', 'self_attn.v_proj'} not found in the base model.`
>
> Has this been solved? I'm hitting the same problem.
>
> As noted above, I fixed that error with issue 150 (https://github.com/OpenGVLab/InternVL/issues/150): I changed the code per that issue and got it running, but the loss is always 0.

I solved it: setting the window length larger fixed it.
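A plausible mechanism for why a larger window helps (my reading, not confirmed in this thread): in supervised finetuning, prompt tokens are masked with the ignore index (-100) and only the answer tokens contribute to the loss. If `--max_seq_length` is too small relative to the prompt (image tiles consume many tokens), truncation can cut off all answer tokens, leaving nothing supervised, and the reported loss degenerates to 0. A toy sketch of that masking-plus-truncation logic (the function names are mine, not from the InternVL codebase):

```python
IGNORE_INDEX = -100  # convention used by HF-style training loops

def build_labels(prompt_len, answer_len, max_seq_length):
    """Toy supervised-finetuning label construction: prompt positions are
    masked with IGNORE_INDEX, answer positions carry real token ids, and
    everything past max_seq_length is truncated away."""
    labels = [IGNORE_INDEX] * prompt_len + list(range(answer_len))
    return labels[:max_seq_length]

def num_supervised(labels):
    """Count positions that actually contribute to the loss."""
    return sum(1 for t in labels if t != IGNORE_INDEX)

# With a long (image-heavy) prompt and a small window, the answer is
# truncated away and no position is supervised:
print(num_supervised(build_labels(prompt_len=2048, answer_len=64,
                                  max_seq_length=2048)))  # 0

# A larger window keeps the answer tokens:
print(num_supervised(build_labels(prompt_len=2048, answer_len=64,
                                  max_seq_length=4096)))  # 64
```

Under this reading, the "window length" is the `--max_seq_length` flag in the training script.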

czczup commented 1 month ago

Looking at this script, the loss is 0 because the wrong conversation template was used (`--conv_style "phi3-chat"` does not match InternVL2-8B). You can follow this guide to finetune InternVL2: https://internvl.readthedocs.io/en/latest/internvl2.0/finetune.html

zweiqi commented 2 weeks ago

> facing err: `ValueError: Target modules {'mlp.down_proj', 'self_attn.o_proj', 'self_attn.q_proj', 'mlp.up_proj', 'mlp.gate_proj', 'self_attn.k_proj', 'self_attn.v_proj'} not found in the base model.`
>
> Has this been solved? I'm hitting the same problem.
>
> As noted above, I fixed that error with issue 150 (https://github.com/OpenGVLab/InternVL/issues/150): I changed the code per that issue and got it running, but the loss was always 0.
>
> I solved it: setting the window length larger fixed it.

Where do I set the window length?