Hi, we do not support video inference in our current codebase, but we will support it soon! For now, you can try the demo on HuggingFace / ModelScope for advanced video support, which is trained on WebVid-10M.
Thank you for your reply. I used the _load_video() function in conversation.py to split the video into 4 (the default) images, then fed them directly into interface.py, without any modifications, in place of the image input. The inference results from mPLUG-Owl seem somewhat reasonable. I would like to confirm with you: is this how mPLUG-Owl is meant to handle video? Thanks again!
It's a possible way to treat a video as several images (e.g., 4 in your case). You need to add 4 "&lt;image&gt;" placeholder tokens in the prompt, one per frame.
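For reference, here is a minimal sketch of that frame-sampling approach. It uniformly samples 4 frames from a video with OpenCV and builds a prompt with one &lt;image&gt; placeholder per frame. The load_video_frames helper and the prompt template below are illustrative assumptions; the actual helper in conversation.py and the generation entry point in interface.py may differ.

```python
# A minimal sketch, assuming OpenCV for frame extraction.
# load_video_frames is a hypothetical stand-in for the _load_video()
# helper discussed above; names and the prompt template are illustrative.
import cv2

def load_video_frames(video_path, num_frames=4):
    """Uniformly sample num_frames frames from a video as RGB arrays."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = [int(i * total / num_frames) for i in range(num_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            # Convert BGR (OpenCV default) to RGB for the image processor.
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

frames = load_video_frames("example.mp4", num_frames=4)

# One <image> placeholder per sampled frame, as suggested above.
prompt = (
    "The following is a conversation between a curious human "
    "and AI assistant.\n"
    + "Human: " + "<image>\n" * len(frames)
    + "Human: What is happening in this video?\n"
    + "AI: "
)

# Feed `frames` and `prompt` into the generation entry point of
# interface.py; the exact function name and signature depend on the
# codebase version, so they are not assumed here.
```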
Thank you very much for your reply! I look forward to the updates to mPLUG-Owl!
We will release the bilingual version of mPLUG-Owl very soon~ We would be happy if you invited more people to use our model. 😄
I am confused about how to pass a video into the model through the interface example you provided. Looking forward to your help, thanks!