X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

mplug video forward pass issue #157

Open kcz358 opened 1 year ago

kcz358 commented 1 year ago

Hi,

While trying to do a forward pass for video using MplugOwlForConditionalGeneration, I set pixel_values to None and video_pixel_values to the processed videos. This portion of the code then fails during the forward pass, since the query features are only defined when pixel_values is not None.

[screenshot of the relevant forward-pass code]

Is there a solution that would allow us to perform a forward pass using only videos?

Best regards
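For reference, the failure mode described above (a local variable assigned only inside an `if pixel_values is not None:` branch but read unconditionally afterwards) can be reproduced with a minimal, self-contained sketch; the function body and tensor shapes below are illustrative stand-ins, not the actual mPLUG-Owl code:

```python
import torch

def forward(pixel_values=None, video_pixel_values=None):
    # query_features is assigned only when image inputs are present,
    # mirroring the conditional branch described in the issue.
    if pixel_values is not None:
        query_features = pixel_values.mean(dim=-1)
    # Reading it unconditionally fails for video-only inputs.
    return query_features  # UnboundLocalError when pixel_values is None

try:
    forward(pixel_values=None, video_pixel_values=torch.zeros(1, 8, 3, 32, 32))
except UnboundLocalError as e:
    print("video-only forward fails:", e)
```

A fix along the lines of what the thread converges on is to compute the query features from whichever modality is present (or initialize them to None and guard later uses), rather than only inside the image branch.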

shaswati1 commented 10 months ago

@MAGAer13, can you please help? I'm facing a similar issue with other attributes: https://github.com/X-PLUG/mPLUG-Owl/issues/180#issuecomment-1837700889

@kcz358, is this also the case for you?

kcz358 commented 10 months ago

Hi @shaswati1, this is also one of the issues I face when doing a forward pass with the video-support model. Here is my solution to the forward-pass issue:

  • Instead of using the original forward pass to extract the input embeds, you can copy everything in the generate function except the language_model.generate call to correctly extract the input embeds for the videos.
  • For masks such as non_padding_mask and prompt_mask, I chose to create my own mask using something like non_padding_mask = labels != self.tokenizer.pad_token_id. You can then remove the corresponding masking code inside the forward pass. Alternatively, you could create these mask tensors and pass them to the forward function, but I did not try this, since I am not certain about their shape and data type.
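A minimal sketch of building the padding mask directly from the labels, assuming labels is a (batch, seq_len) tensor in which padded positions hold the tokenizer's pad_token_id (the tensor values and pad id below are illustrative):

```python
import torch

pad_token_id = 0  # stand-in for self.tokenizer.pad_token_id

# Illustrative (batch, seq_len) labels with right padding.
labels = torch.tensor([
    [5, 9, 2, 0, 0],
    [7, 3, 0, 0, 0],
])

# Rebuild the non-padding mask directly from the labels, as suggested above:
# 1 where the position holds a real token, 0 where it is padding.
non_padding_mask = (labels != pad_token_id).long()
print(non_padding_mask.tolist())
# [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
```

The same comparison pattern would yield a prompt mask if you track which label positions belong to the prompt, but the exact shape and dtype the model expects are, as noted above, unverified here.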

shaswati1 commented 10 months ago

> > @MAGAer13, can you please help? I'm facing a similar issue with other attributes: #180 (comment)
> > @kcz358, is this also the case for you?
>
> Hi @shaswati1, this is also one of the issues I face when doing a forward pass with the video-support model. Here is my solution to the forward-pass issue:
>
> • Instead of using the original forward pass to extract the input embeds, you can copy everything in the generate function except the language_model.generate call to correctly extract the input embeds for the videos.
> • For masks such as non_padding_mask and prompt_mask, I chose to create my own mask using something like non_padding_mask = labels != self.tokenizer.pad_token_id. You can then remove the corresponding masking code inside the forward pass. Alternatively, you could create these mask tensors and pass them to the forward function, but I did not try this, since I am not certain about their shape and data type.

@kcz358, thanks a lot for your help! I did the same as you with non_padding_mask, but it gives me NaN values for the loss. Have you faced this issue? If so, do you know the reason behind it?

kcz358 commented 10 months ago

@shaswati1, I did not encounter this issue when doing the forward pass, so I am not sure. I use the loss to measure perplexity, so I do not perform training. If you hit this issue during fine-tuning, you may want to check your dataset labels.
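Since the forward-pass loss is being used here to measure perplexity, a minimal sketch of that computation (the loss value is illustrative; perplexity is just the exponential of the mean token-level cross-entropy):

```python
import math

# Suppose the forward pass returned this mean cross-entropy loss
# over the non-padding positions (illustrative value).
mean_loss = 2.0

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(mean_loss)
print(round(perplexity, 3))  # 7.389

# A NaN loss propagates straight into perplexity, so it is worth
# checking before reporting (NaN != NaN, so this catches it):
assert perplexity == perplexity
```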