InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Apache License 2.0

Issue about pretrain loss #106

Closed: yiyexy closed this issue 8 months ago

yiyexy commented 8 months ago

Hi! Thanks for the great work! I encountered an issue during the pretraining stage, where I was fine-tuning the vision tower, the linear adapter, and the Large Language Model (LLM) together. The loss decreased to around 0.7x at around 0.02 epochs, but then it suddenly spiked to over 20 and afterwards only gradually came back down to around 6.x. Is this normal, and why did it happen?

[screenshot: training loss curve]
xiaoachen98 commented 8 months ago

That is so weird. Could you provide your training script? Have you modified any of the training code?

yiyexy commented 8 months ago

Sure, I just followed the LLaVA-1.5 pretraining setup:

PROMPT_VERSION=plain
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path ./checkpoints/$MODEL_VERSION \
    --version $PROMPT_VERSION \
    --data_path /path/to/pretrain_data.json \
    --image_folder /path/to/images \
    --vision_tower openai/clip-vit-large-patch14 \
    --tune_mm_mlp_adapter True \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ./checkpoints/llava-$MODEL_VERSION-pretrain \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 24000 \
    --save_total_limit 1 \
    --learning_rate 1e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb

I also noticed that my learning rate and batch size are not the same as yours. I will fix that and report back. My modifications to the training code did not affect the results on the LLaVA dataset.
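Concretely, I plan to bring the optimizer settings in line with yours before rerunning; the change to the launcher above is roughly this (just a sketch of the learning-rate part, the batch size still needs to be matched separately):

    # was 1e-3; lowered to match the reference setting
    --learning_rate 2e-5 \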

xiaoachen98 commented 8 months ago

I see you closed this issue. Did you solve it?

yiyexy commented 8 months ago

I changed the learning rate from 1e-3 to 2e-5, the same as yours, and trained only the last half of the vision encoder's layers during the pretraining stage. With that, the loss kept decreasing into the 0.2x range.
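For anyone trying to reproduce this, the partial unfreezing only takes a few lines; here is a rough sketch, assuming a LLaVA-style model where model.get_vision_tower() wraps a Hugging Face CLIPVisionModel (the attribute path is an assumption and may differ in your fork):

def unfreeze_last_half_of_vision_tower(model):
    """Freeze the first half of the CLIP vision encoder blocks and train the rest."""
    clip = model.get_vision_tower().vision_tower   # assumed path to the HF CLIPVisionModel
    layers = clip.vision_model.encoder.layers      # transformer blocks of the vision encoder

    # Start with the whole vision tower frozen.
    for p in clip.parameters():
        p.requires_grad = False

    # Re-enable gradients only for the last half of the blocks.
    half = len(layers) // 2
    for layer in layers[half:]:
        for p in layer.parameters():
            p.requires_grad = True

Something like this needs to run right after the model is built and before the optimizer / DeepSpeed engine is created, so the trainer's parameter groups pick up the intended requires_grad flags.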

However, I still can't explain the spike, and in the end my results on your dataset with the LLaVA-1.5 code are no better than LLaVA-1.5 itself. Perhaps there are other aspects I haven't considered.

futureisatyourhand commented 5 months ago

Hello, could you share how you resolved this? Thanks.