X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

NaN results for video inference #101

Closed yinkangning0124 closed 1 year ago

yinkangning0124 commented 1 year ago

Hi, thanks for your awesome work. I ran into the same problem as others when running video inference. I debugged the code step by step and found that the hidden_states turn into NaN during self-attention in modeling_llama.py (line 292). Could you tell me how to fix this problem?

Best, Kangning
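The step-by-step debugging described above (walking the layers until the hidden states go non-finite) can be sketched generically. This is a stdlib-only illustration, not code from the repo: hidden states are shown as plain Python lists, whereas in practice you would check the torch tensors directly (e.g. `torch.isnan(hidden_states).any()` inside the forward pass).

```python
import math

def find_first_nonfinite(layer_outputs):
    """Scan per-layer hidden states (lists of floats) and return the
    index of the first layer containing a NaN/Inf value, or None if
    every value is finite."""
    for layer_idx, values in enumerate(layer_outputs):
        if any(not math.isfinite(v) for v in values):
            return layer_idx
    return None

# Example: the third layer produces a NaN, mimicking the failure
# inside self-attention described in this issue.
outputs = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, float("nan"), 0.9],
]
print(find_first_nonfinite(outputs))  # -> 2
```

Pinpointing the first offending layer this way helps distinguish a corrupted checkpoint (which, as it turns out below, was the cause here) from an overflow introduced by the code itself.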

LinB203 commented 1 year ago

I also get the nan error.

yinkangning0124 commented 1 year ago

I also get the nan error.

I just noticed that you opened an issue about the same problem last week; have you fixed it?

LinB203 commented 1 year ago

I also get the nan error.

I just noticed that you opened an issue about the same problem last week; have you fixed it?

Not yet. I am waiting in the hope that the author will fix the problem.

MAGAer13 commented 1 year ago

Hi, we have found that the model's weights on HF have some numerical differences from our original version. We would like to replace them with the correct ones, but we are currently facing network connection issues with HF. We are providing a temporary download link to the correct weights.

http://mm-chatgpt.oss-cn-zhangjiakou.aliyuncs.com/mplug_owl_demo/released_checkpoint/pytorch_model.bin

yinkangning0124 commented 1 year ago

Hi, could you give me a tip on how to load the model (pytorch_model.bin)? It's a little different from loading it from HF.

LinB203 commented 1 year ago

Hi, could you give me a tip on how to load the model (pytorch_model.bin)? It's a little different from loading it from HF.

First, find your HF cache folder, usually ~/.cache/huggingface/hub. Then replace models--MAGAer13--mplug-owl-llama-7b-video/snapshots/*/pytorch_model.bin with the new checkpoint.
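The replacement step above can be sketched as a small script. This is a hypothetical helper, not part of the repo: it assumes the checkpoint has already been downloaded from the OSS link above, and that the cache follows the usual huggingface_hub layout (cache root / repo directory / snapshots / revision hash / files). Note that files under snapshots/ may be symlinks into the blobs/ directory; shutil.copyfile follows them and overwrites the underlying blob, which is the intended effect here.

```python
import glob
import os
import shutil

def replace_cached_checkpoint(new_checkpoint, cache_root, repo_dir):
    """Overwrite every cached pytorch_model.bin under the given repo's
    snapshot directories with the freshly downloaded checkpoint.
    Returns the list of paths that were replaced."""
    pattern = os.path.join(
        cache_root, repo_dir, "snapshots", "*", "pytorch_model.bin"
    )
    targets = glob.glob(pattern)
    for target in targets:
        shutil.copyfile(new_checkpoint, target)
    return targets

# Usage sketch (paths are assumptions, adjust to your machine):
# replace_cached_checkpoint(
#     "pytorch_model.bin",  # downloaded from the link posted above
#     os.path.expanduser("~/.cache/huggingface/hub"),
#     "models--MAGAer13--mplug-owl-llama-7b-video",
# )
```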

MAGAer13 commented 1 year ago

Hi, could you give me a tip on how to load the model (pytorch_model.bin)? It's a little different from loading it from HF.

First, find your HF cache folder, usually ~/.cache/huggingface/hub. Then replace models--MAGAer13--mplug-owl-llama-7b-video/snapshots/*/pytorch_model.bin with the new checkpoint.

Does it work now?

LinB203 commented 1 year ago

Hi, could you give me a tip on how to load the model (pytorch_model.bin)? It's a little different from loading it from HF.

First, find your HF cache folder, usually ~/.cache/huggingface/hub. Then replace models--MAGAer13--mplug-owl-llama-7b-video/snapshots/*/pytorch_model.bin with the new checkpoint.

Does it work now?

Yes, no NaN problem.

yinkangning0124 commented 1 year ago

Hi, could you give me a tip on how to load the model (pytorch_model.bin)? It's a little different from loading it from HF.

First, find your HF cache folder, usually ~/.cache/huggingface/hub. Then replace models--MAGAer13--mplug-owl-llama-7b-video/snapshots/*/pytorch_model.bin with the new checkpoint.

It works, thanks a lot !!!

Hritikbansal commented 1 year ago

@MAGAer13, can you try updating the pytorch_model.bin on HF now? It would be useful to have a working checkpoint there. I faced the same issue and had to read through quite a few responses to find this solution. The problem does go away with the new checkpoint.

Thanks for the awesome work!

shaswati1 commented 10 months ago

Hi, could you give me a tip on how to load the model (pytorch_model.bin)? It's a little different from loading it from HF.

First, find your HF cache folder, usually ~/.cache/huggingface/hub. Then replace models--MAGAer13--mplug-owl-llama-7b-video/snapshots/*/pytorch_model.bin with the new checkpoint.

I tried the steps above and tried to load the pretrained model using the code below:

vid_mplug_path = 'MAGAer13/mplug-owl-llama-7b-video'
vid_mplug = get_model_name_from_path(vid_mplug_path)
tokenizer, model, image_processor, context_len = load_pretrained_model(vid_mplug_path, None, vid_mplug, load_8bit=False, load_4bit=False, device=device)

It is giving me KeyError: 'mplug-owl'. Can you please help me out here?