X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

error in video demo code #97

Closed · LinB203 closed this issue 1 year ago

LinB203 commented 1 year ago

If I use the video demo code, I get this error:

```
tokenizer = AutoTokenizer.from_pretrained(pretrained_ckpt, cache_dir='./cache_dir')
  File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 688, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class MplugOwlTokenizer does not exist or is not currently imported.
```

It seems that the pretrained_ckpt ('MAGAer13/mplug-owl-llama-7b-video') was not uploaded with tokenizer files that AutoTokenizer can resolve. If I instead use 'MAGAer13/mplug-owl-llama-7b' for AutoTokenizer and 'MAGAer13/mplug-owl-llama-7b-video' for everything else, I get this error:

Traceback (most recent call last): File "demo.py", line 57, in res = model.generate(inputs, generate_kwargs) File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(*args, *kwargs) File "/mPLUG-Owl/mplug_owl_video/modeling_mplug_owl.py", line 1752, in generate outputs = self.language_model.generate( File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(args, **kwargs) File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/transformers/generation/utils.py", line 1572, in generate return self.sample( File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/transformers/generation/utils.py", line 2655, in sample next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1) RuntimeError: probability tensor contains either inf, nan or element < 0

Am I missing anything?

MAGAer13 commented 1 year ago

Sorry for that. We have replaced `MplugOwlTokenizer` with `LlamaTokenizer` for simplification and have updated the model repo. Just replace `MplugOwlTokenizer` with `LlamaTokenizer` in `tokenizer_config.json`.
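If editing the cached `tokenizer_config.json` by hand is awkward, a minimal workaround sketch (assuming the checkpoint ships standard LLaMA tokenizer files) is to bypass `AutoTokenizer` entirely:

```python
# Workaround sketch: load the LLaMA tokenizer class directly instead of letting
# AutoTokenizer resolve the stale MplugOwlTokenizer entry in tokenizer_config.json.
# Assumes the checkpoint contains standard LLaMA tokenizer files.
from transformers import LlamaTokenizer

pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b-video'
tokenizer = LlamaTokenizer.from_pretrained(pretrained_ckpt, cache_dir='./cache_dir')
```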

LinB203 commented 1 year ago

> Sorry for that. We have replaced `MplugOwlTokenizer` with `LlamaTokenizer` for simplification and have updated the model repo. Just replace `MplugOwlTokenizer` with `LlamaTokenizer` in `tokenizer_config.json`.

After following your advice, I still get this error. Am I missing anything? I'm running torch 1.13.0 on a V100 with CUDA 11.6.

Traceback (most recent call last): File "d.py", line 37, in res = model.generate(inputs, generate_kwargs) File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(*args, kwargs) File "/mplug/mPLUG-Owl/mplug_owl_video/modeling_mplug_owl.py", line 1694, in generate video_embeds = self.vision_model(video_pixel_values, return_dict=True).last_hidden_state File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl return forward_call(*input, *kwargs) File "/mplug/mPLUG-Owl/mplug_owl_video/modeling_mplug_owl.py", line 693, in forward encoder_outputs = self.encoder( File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl return forward_call(input, kwargs) File "/mplug/mPLUG-Owl/mplug_owl_video/modeling_mplug_owl.py", line 633, in forward layer_outputs = encoder_layer( File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl return forward_call(*input, *kwargs) File "/mplug/mPLUG-Owl/mplug_owl_video/modeling_mplug_owl.py", line 371, in forward hidden_states = hidden_states + self.temporal(hidden_states) File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl return forward_call(input, **kwargs) File "/mplug/mPLUG-Owl/mplug_owl_video/modeling_mplug_owl.py", line 213, in forward x = torch.nn.functional.conv3d( RuntimeError: "compute_columns3d" not implemented for 'Half'

MAGAer13 commented 1 year ago

Did you run this demo under bfloat16?

LinB203 commented 1 year ago

Yes, following the video demo code:

```python
import torch

from mplug_owl_video.modeling_mplug_owl import MplugOwlForConditionalGeneration
from transformers import AutoTokenizer
from mplug_owl_video.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor

pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b-video'
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
    cache_dir='./cache_dir',
)
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt, cache_dir='./cache_dir')
tokenizer = AutoTokenizer.from_pretrained(pretrained_ckpt, cache_dir='./cache_dir')
processor = MplugOwlProcessor(image_processor, tokenizer)

prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <|video|>
Human: Describe the video.
AI: ''']

video_list = ['cap_video/dVC8Dl0xCKg.mp4']

generate_kwargs = {
    'do_sample': True,
    'top_k': 3,
    'max_length': 512
}

inputs = processor(text=prompts, videos=video_list, num_frames=8, return_tensors='pt')
inputs = {k: v.bfloat16() if v.dtype == torch.float else v for k, v in inputs.items()}
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    res = model.generate(**inputs, **generate_kwargs)
sentence = tokenizer.decode(res.tolist()[0], skip_special_tokens=True)
print(sentence)
```

MAGAer13 commented 1 year ago

The model was trained under bfloat16, but DepthwiseConv3D only supports half precision, so we convert from bf16 to half for the Conv and then convert back, which leads to the unstable results.
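A minimal sketch of the round-trip being described (illustrative only, not the repo's exact code; the helper name and shapes are assumed):

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of the bf16 -> fp16 -> bf16 round-trip described above
# (hypothetical helper, not the repo's exact code). Per the comment above, the
# depthwise conv3d has no bfloat16 kernel, so input and weight are cast to half
# just for the convolution.
def depthwise_conv3d_bf16(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # x: (N, C, T, H, W) in bfloat16; weight: (C, 1, kT, kH, kW) for a depthwise conv
    out = F.conv3d(x.half(), weight.half(), groups=x.shape[1])
    return out.to(x.dtype)  # cast back to bfloat16; the round-trip can lose precision
```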

MAGAer13 commented 1 year ago

Maybe you can try loading the model in float16?
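For reference, a hedged variant of the demo's loading step under that suggestion (only the dtype changes, plus moving the model to GPU, since the traceback above shows CPU kernels in use):

```python
# Hedged variant of the demo's loading step: float16 instead of bfloat16, and
# the model moved to the GPU (the demo as posted leaves it on the CPU, where
# half-precision conv3d kernels are not implemented).
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.float16,
    cache_dir='./cache_dir',
).to('cuda')

# Inputs would then be cast to half rather than bfloat16:
inputs = processor(text=prompts, videos=video_list, num_frames=8, return_tensors='pt')
inputs = {k: v.half() if v.dtype == torch.float else v for k, v in inputs.items()}
inputs = {k: v.to(model.device) for k, v in inputs.items()}
```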

LinB203 commented 1 year ago

> Maybe you can try loading the model in float16?

The demo code does not put the model on the GPU. If I add `model.cuda()`, I get this error:

Traceback (most recent call last): File "d.py", line 41, in res = model.generate(inputs, generate_kwargs) File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(*args, *kwargs) File "/mplug/mPLUG-Owl/mplug_owl_video/modeling_mplug_owl.py", line 1752, in generate outputs = self.language_model.generate( File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(args, **kwargs) File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/transformers/generation/utils.py", line 1572, in generate return self.sample( File "/miniconda3/envs/torch1.13.0/lib/python3.8/site-packages/transformers/generation/utils.py", line 2655, in sample next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1) RuntimeError: probability tensor contains either inf, nan or element < 0

MAGAer13 commented 1 year ago

There are NaNs in the model's forward pass. I will check this later.
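In the meantime, one way to localize the NaNs is with standard PyTorch forward hooks; a minimal debugging sketch (the helper name is illustrative, not from the repo):

```python
import torch

# Debugging sketch (not from the repo): forward hooks that report modules
# producing non-finite outputs, to localize where the NaNs first appear.
def install_nan_hooks(model: torch.nn.Module) -> None:
    def make_hook(name: str):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                print(f'non-finite output from {name} ({module.__class__.__name__})')
        return hook
    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))
```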

dmenig commented 1 year ago

I have the same issue.

LinB203 commented 1 year ago

> There are NaNs in the model's forward pass. I will check this later.

Hi, any news?

ceyxasm commented 1 year ago

Can this issue be reopened? I am also facing this issue.