X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

NaN error on videoQA #123

Open sgjheywa opened 1 year ago

sgjheywa commented 1 year ago

Hi,

Thanks for sharing this repo!

I am trying to test the video model and I keep getting the same error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-e6f6b4177957> in <cell line: 1>()
      1 with torch.no_grad():
----> 2     res = model.generate(**inputs, **generate_kwargs)
      3 sentence = tokenizer.decode(res.tolist()[0], skip_special_tokens=True)
      4 print(sentence)
      5 

4 frames
/usr/local/lib/python3.10/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     25         def decorate_context(*args, **kwargs):
     26             with self.clone():
---> 27                 return func(*args, **kwargs)
     28         return cast(F, decorate_context)
     29 

/content/mPLUG-Owl/mplug_owl_video/modeling_mplug_owl.py in generate(self, pixel_values, video_pixel_values, input_ids, attention_mask, isdecoder, **generate_kwargs)
   1751 
   1752         print(inputs_embeds, attention_mask)
-> 1753         outputs = self.language_model.generate(
   1754             inputs_embeds=inputs_embeds,
   1755             # input_ids=input_ids,

/usr/local/lib/python3.10/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     25         def decorate_context(*args, **kwargs):
     26             with self.clone():
---> 27                 return func(*args, **kwargs)
     28         return cast(F, decorate_context)
     29 

/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py in generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, streamer, **kwargs)
   1483 
   1484             # 13. run sample
-> 1485             return self.sample(
   1486                 input_ids,
   1487                 logits_processor=logits_processor,

/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py in sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
   2558             # sample
   2559             probs = nn.functional.softmax(next_token_scores, dim=-1)
-> 2560             next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
   2561 
   2562             # finished sentences should have their next token be a padding token

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
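The sampler-side failure is easy to reproduce in isolation (a minimal sketch, not from the repo): once any upstream layer emits NaN, softmax propagates it into `probs`, and `torch.multinomial` rejects any distribution containing NaN/Inf. So the error site in the traceback is a symptom; the root cause is upstream.

```python
import torch

# Minimal repro: a single NaN logit propagates through softmax,
# and torch.multinomial refuses the resulting distribution.
logits = torch.tensor([[1.0, float("nan"), 2.0]])
probs = torch.nn.functional.softmax(logits, dim=-1)  # contains NaN

try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as exc:
    print(f"RuntimeError: {exc}")
```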

Looking at similar issues online, this looks like a bfloat16/fp32 precision problem, but I've tried running the model at both precisions, on both CPU and GPU, and get the same error. Somewhere during inference a tensor turns to NaNs. This only occurs during videoQA; image QA works fine.

Can you help me understand where this error might be occurring? Here is a Colab notebook reproducing the error (I'm running it on a V100): https://colab.research.google.com/drive/1znwvEgSYoqbA67BH3S1ppmWEWIy8cDDt?usp=sharing
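To narrow down *where* the first NaN appears, one debugging sketch (not part of the repo, assuming a standard PyTorch `nn.Module`) is to register a forward hook on every submodule and record the first one whose output goes non-finite:

```python
import torch
import torch.nn as nn

def register_nan_hooks(model: nn.Module, offenders: list):
    """Attach forward hooks that append the name of any module whose
    output contains NaN/Inf to `offenders` (first entry = first culprit)."""
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            out = output[0] if isinstance(output, tuple) else output
            if torch.is_tensor(out) and not torch.isfinite(out).all():
                offenders.append(name)
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# Usage sketch around the failing call (`model`, `inputs`,
# `generate_kwargs` as in the traceback above):
# offenders = []
# handles = register_nan_hooks(model, offenders)
# with torch.no_grad():
#     res = model.generate(**inputs, **generate_kwargs)
# print("first non-finite output:", offenders[0] if offenders else "none")
# for h in handles:
#     h.remove()
```

Because hooks fire after each submodule's forward pass, `offenders[0]` names the innermost module that first produced a non-finite tensor.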

zwan074 commented 1 year ago

I have the same issue with video inference. Could you please look into it?

ee2110 commented 1 year ago

Same here! Hoping for a solution as soon as possible. Thank you so much for your great work and effort!

Meanwhile, does anyone know of baseline methods comparable to mPLUG-Owl on videoQA?

JonghwanMun commented 1 year ago

It looks like one of the weights in the official checkpoint has a NaN value.

[screenshot: checkpoint weight tensor containing NaN values]

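This observation can be checked directly by scanning the checkpoint's state dict for non-finite parameters (a minimal sketch; the checkpoint filename in the usage comment is hypothetical, adapt it to however you load the weights):

```python
import torch

def find_nonfinite_params(state_dict):
    """Return the names of checkpoint tensors containing NaN or Inf."""
    return [
        name
        for name, tensor in state_dict.items()
        if torch.is_tensor(tensor)
        and torch.is_floating_point(tensor)
        and not torch.isfinite(tensor).all()
    ]

# Usage sketch (file name is hypothetical):
# state_dict = torch.load("pytorch_model.bin", map_location="cpu")
# print(find_nonfinite_params(state_dict))
```

An empty list means all floating-point weights are finite; any names printed point at the corrupted tensors.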
MAGAer13 commented 1 year ago

See #101. We have also updated the checkpoint on HF.