inf values occur during the forward pass when fine-tuning the VL branch with LLAVA-150K+MiniGPT4-3.5K+webvid-instruct

Great work! I've run into a problem and hope someone has ideas.

When I fine-tune only the VL branch with LLaMA-2 on image/video instruction data, inf values occur, and torch.max(hidden_states) and torch.min(hidden_states) keep growing in magnitude.

What I have already tried:

[x] I have checked the existing issues.
[x] I have searched the Hugging Face forum and Google.

Preparations:

My platform: 8x A6000 (48 GB each). The environment is set up exactly as specified in the environment.yml in this repository. The data is prepared following LLaVA (COCO), WebVid-10M and MiniGPT-4. The 7B LLaMA-2 pretrained weights are from this repo as well.

The demo runs correctly on the remote platform, and the training process appears to start normally. I did not modify any code.

Problem

I found that some samples produce 'inf' values at the last layer of LLaMA-2, i.e. at decoder layer index 31 in the loop over decoder layers. The error does not occur immediately. Instead, torch.max(hidden_states) grows larger and larger while torch.min(hidden_states) grows more and more negative, until the values overflow to inf / -inf.

[Screenshot: -inf in hidden_states during training]
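
For anyone who wants to reproduce the measurement described above, here is a minimal sketch of how the per-layer range of hidden_states could be logged with PyTorch forward hooks. This is not code from the repository: `attach_range_hooks` and `first_inf_layer` are hypothetical helper names, and the toy stack of `nn.Linear` layers only stands in for the 32 LLaMA-2 decoder layers.

```python
import torch
import torch.nn as nn


def attach_range_hooks(layers, stats):
    """Record min, max and an inf flag for each layer's output on every forward pass.

    `layers` is any iterable of nn.Module; `stats` is a list that receives one
    dict per layer call. Returns the hook handles so they can be removed later.
    """
    handles = []
    for idx, layer in enumerate(layers):
        def hook(module, inputs, output, idx=idx):
            # Some decoder layers return a tuple with hidden states first
            # (assumption: adjust if your layer returns something else).
            hidden = output[0] if isinstance(output, tuple) else output
            hidden = hidden.detach().float()
            stats.append({
                "layer": idx,
                "min": hidden.min().item(),
                "max": hidden.max().item(),
                "has_inf": bool(torch.isinf(hidden).any()),
            })
        handles.append(layer.register_forward_hook(hook))
    return handles


def first_inf_layer(stats):
    """Return the index of the first recorded layer whose output contained inf."""
    for record in stats:
        if record["has_inf"]:
            return record["layer"]
    return None


# Toy stand-in for the decoder stack (hypothetical, for illustration only).
torch.manual_seed(0)
toy_layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])
stats = []
handles = attach_range_hooks(toy_layers, stats)

x = torch.ones(1, 4)
for layer in toy_layers:
    x = layer(x)

# Remove the hooks once the measurement is done.
for handle in handles:
    handle.remove()

for record in stats:
    print(record)
```

With a Hugging Face `LlamaForCausalLM`, the decoder layers are typically reachable as `model.model.layers`; the exact attribute path inside this repository's model wrapper is an assumption to verify. Logging these records every few steps should show at which layer, and after how many steps, the range starts to blow up.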

Does anyone have an idea why this happens and how to fix it? Thanks in advance for your time and help.
