X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License
2.25k stars, 171 forks

How to process video input #32

Closed k1tano closed 1 year ago

k1tano commented 1 year ago

I can input video in the Hugging Face demo, but I can't find any video processing in the code. Are you only sampling 4 frames from the video in the front end and feeding them to the model as images? This is very important to me, so please let me know. Thanks!

MAGAer13 commented 1 year ago

Yes, currently we treat a video as a sequence of consecutive frames and use them as image input. We will continue training on video-related datasets in the future. Stay tuned.
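For readers who want to reproduce the "video as consecutive pictures" idea, here is a minimal sketch of picking evenly spaced frame indices from a clip. It is not the demo's actual code: the function name `sample_frame_indices` is hypothetical, and the default of 4 frames is only the number mentioned in the question above, not a confirmed setting. The frames at the returned indices would then be decoded with a video library of your choice and passed to the model as ordinary images.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int = 4) -> List[int]:
    """Return evenly spaced frame indices for a clip (hypothetical helper).

    num_frames=4 is an assumption taken from the question in this thread,
    not a value confirmed by the maintainers.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    # Short clips: use every available frame.
    if num_frames >= total_frames:
        return list(range(total_frames))
    # Split the clip into equal segments and take the middle frame of each.
    segment = total_frames / num_frames
    return [int(segment * i + segment / 2) for i in range(num_frames)]


if __name__ == "__main__":
    print(sample_frame_indices(100))
```

Taking the middle of each segment, rather than the first frame, avoids always including the opening frame, which is often a title card or a black frame. This is a design choice for the sketch, not something stated in the thread.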

xmy0916 commented 1 year ago

@MAGAer13 how many frames do you use?

shaswati1 commented 10 months ago

@k1tano, which Hugging Face model did you use to input video directly? I got a KeyError while trying to load the model at this link.