How to use Video in the provided interface?

X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

https://www.modelscope.cn/studios/damo/mPLUG-Owl

MIT License


How to use Video in the provided interface? #58

Closed: nullnameno closed this issue 1 year ago

nullnameno commented 1 year ago

I am not sure how to pass a video into the model through the interface example you provided. Could you explain how to do this? Looking forward to your help, thanks!

MAGAer13 commented 1 year ago

Hi, the current codebase does not support video inference yet; we will add it soon. In the meantime, you can try the demo on huggingface / modelscope, which has more advanced video support and is trained on Webvid10M.

nullnameno commented 1 year ago

Thank you for your reply. I used the loadvideo() function in conversation.py to split the video into 4 frames (the default), then passed those frames to interface.py in place of the image input, without any other modifications. The output of mPLUG-Owl seems fairly reasonable. Could you confirm whether this is how mPLUG-Owl handles video? Thanks again!
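For illustration, splitting a video into a fixed number of frames usually comes down to choosing evenly spaced frame indices. The sketch below is a hypothetical helper, not the repository's own function: the name sample_frame_indices and the midpoint-of-segment strategy are assumptions, and the actual sampling in conversation.py may differ. Decoding the chosen frames into images is left to whatever video library is in use.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int = 4) -> List[int]:
    """Pick `num_frames` indices spread evenly over a clip of `total_frames` frames.

    Hypothetical helper for illustration; not taken from the mPLUG-Owl codebase.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Never ask for more frames than the clip contains.
    num_frames = min(num_frames, total_frames)
    # Split the clip into equal segments and take the midpoint of each one.
    segment = total_frames / num_frames
    return [int(segment * i + segment / 2) for i in range(num_frames)]


if __name__ == "__main__":
    # Indices to decode from a 100-frame clip when keeping 4 frames.
    print(sample_frame_indices(100, 4))
```

Taking segment midpoints avoids always picking the very first and very last frames, which are often black or transitional.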

MAGAer13 commented 1 year ago

Treating a video as several images (e.g. 4 in your case) is one possible approach. You need to add 4 "<image>" tokens to the prompt as placeholders for the video frames. Otherwise, only the first image is used as input.
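A minimal sketch of the placeholder idea, assuming the placeholder token is "<image>" and a "Human: ... / AI: " prompt layout similar to the examples in the repository README. The function name build_video_prompt and its parameters are hypothetical; the point is only that the prompt carries one placeholder per frame.

```python
def build_video_prompt(question: str, num_frames: int = 4,
                       placeholder: str = "<image>") -> str:
    """Build a prompt with one image placeholder per sampled video frame.

    Hypothetical helper: the token text and prompt layout are assumptions.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    # One placeholder line per frame, so every frame is consumed by the model.
    lines = [f"Human: {placeholder}" for _ in range(num_frames)]
    lines.append(f"Human: {question}")
    lines.append("AI: ")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_video_prompt("What is happening in this video?", num_frames=4))
```

The number of placeholders should match the number of frames passed in as images; with a single placeholder, only the first frame would be paired with the prompt.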

nullnameno commented 1 year ago

Thank you very much for your reply! I look forward to the next update of mPLUG-Owl!

MAGAer13 commented 1 year ago

We will release the bilingual version of mPLUG-Owl very soon. We would be happy if you invited more people to use our model. 😄