X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

mplug video forward pass issue #157

Open kcz358 opened 1 year ago

kcz358 commented 1 year ago

Hi,

While trying to do a forward pass for video using MplugOwlForConditionalGeneration, I set pixel_values to None and video_pixel_values to the processed videos. This portion of the code then fails during the forward pass, since the query features are only defined when pixel_values is not None.

[screenshot of the relevant forward-pass code]

Is there a solution that would allow us to perform a forward pass using only videos?

Best regards
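For reference, the failure mode described above (a local variable assigned only inside an `if pixel_values is not None:` branch but read unconditionally afterwards) can be reproduced with a minimal, self-contained sketch; the function body and tensor shapes below are illustrative stand-ins, not the actual mPLUG-Owl code:

```python
import torch

def forward(pixel_values=None, video_pixel_values=None):
    # query_features is assigned only when image inputs are present,
    # mirroring the conditional branch described in the issue.
    if pixel_values is not None:
        query_features = pixel_values.mean(dim=-1)
    # Reading it unconditionally fails for video-only inputs.
    return query_features  # UnboundLocalError when pixel_values is None

try:
    forward(pixel_values=None, video_pixel_values=torch.zeros(1, 8, 3, 32, 32))
except UnboundLocalError as e:
    print("video-only forward fails:", e)
```

A fix along the lines of what the thread converges on is to compute the query features from whichever modality is present (or initialize them to None and guard later uses), rather than only inside the image branch.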

shaswati1 commented 10 months ago

@MAGAer13, can you please help? I'm facing a similar issue with other attributes: https://github.com/X-PLUG/mPLUG-Owl/issues/180#issuecomment-1837700889

@kcz358, is this also the case for you?

kcz358 commented 10 months ago

Hi @shaswati1, this is also one of the issues I face when doing a forward pass with the video-support model. Here is my solution to the forward-pass issue:

  • Instead of using the original forward pass to extract the input embeds, you can copy everything in the generate function except the language_model.generate call to correctly extract the input embeds for the videos.
  • For masks such as non_padding_mask and prompt_mask, I chose to create my own mask using something like non_padding_mask = labels != self.tokenizer.pad_token_id. You can then remove the corresponding masking code inside the forward pass. Alternatively, you could create these mask tensors and pass them to the forward function, but I did not try this, since I am not certain about their shape and data type.
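A minimal sketch of building the padding mask directly from the labels, assuming labels is a (batch, seq_len) tensor in which padded positions hold the tokenizer's pad_token_id (the tensor values and pad id below are illustrative):

```python
import torch

pad_token_id = 0  # stand-in for self.tokenizer.pad_token_id

# Illustrative (batch, seq_len) labels with right padding.
labels = torch.tensor([
    [5, 9, 2, 0, 0],
    [7, 3, 0, 0, 0],
])

# Rebuild the non-padding mask directly from the labels, as suggested above:
# 1 where the position holds a real token, 0 where it is padding.
non_padding_mask = (labels != pad_token_id).long()
print(non_padding_mask.tolist())
# [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
```

The same comparison pattern would yield a prompt mask if you track which label positions belong to the prompt, but the exact shape and dtype the model expects are, as noted above, unverified here.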

shaswati1 commented 10 months ago

> > @MAGAer13, can you please help? I'm facing a similar issue with other attributes: #180 (comment)
> > @kcz358, is this also the case for you?
>
> Hi @shaswati1, this is also one of the issues I face when doing a forward pass with the video-support model. Here is my solution to the forward-pass issue:
>
> • Instead of using the original forward pass to extract the input embeds, you can copy everything in the generate function except the language_model.generate call to correctly extract the input embeds for the videos.
> • For masks such as non_padding_mask and prompt_mask, I chose to create my own mask using something like non_padding_mask = labels != self.tokenizer.pad_token_id. You can then remove the corresponding masking code inside the forward pass. Alternatively, you could create these mask tensors and pass them to the forward function, but I did not try this, since I am not certain about their shape and data type.

@kcz358, thanks a lot for your help! I did the same as you with non_padding_mask, but it gives me NaN values for the loss. Have you faced this issue? If so, do you know the reason behind it?

kcz358 commented 10 months ago

@shaswati1, I did not encounter this issue when doing the forward pass, so I am not sure. I use the loss to measure perplexity, so I do not perform training. If you hit this issue during fine-tuning, you may want to check your dataset labels.
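Since the forward-pass loss is being used here to measure perplexity, a minimal sketch of that computation (the loss value is illustrative; perplexity is just the exponential of the mean token-level cross-entropy):

```python
import math

# Suppose the forward pass returned this mean cross-entropy loss
# over the non-padding positions (illustrative value).
mean_loss = 2.0

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(mean_loss)
print(round(perplexity, 3))  # 7.389

# A NaN loss propagates straight into perplexity, so it is worth
# checking before reporting (NaN != NaN, so this catches it):
assert perplexity == perplexity
```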