PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0
1.91k stars, 121 forks

Is video supported? #15

Open awzhgw opened 7 months ago

awzhgw commented 7 months ago

Is video training supported?

LinB203 commented 7 months ago

Sure. Our code supports multi-image training, multi-video training, and even image-video training together.

fcakyon commented 7 months ago

@LinB203 is there any video inference code in the repo to test it on an MP4 file? Or a preprocessing script showing how to sample frames from a video?

LinB203 commented 7 months ago

This repo supports video training, but we have not released a video checkpoint, so you may need to train a new model to support video. For video inference code, or for how to adapt a video encoder into MoE-LLaVA, you can refer to Video-LLaVA.
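
For the frame-sampling question above, here is a minimal sketch of uniform frame sampling. It is not code from this repo or from Video-LLaVA; the function name `uniform_frame_indices` and the default of 8 frames are hypothetical choices for illustration only, so check Video-LLaVA's own preprocessing for the values it actually uses.

```python
def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Pick `num_frames` indices spread evenly over a clip of `total_frames` frames.

    Hypothetical helper for illustration; the default of 8 is an assumption.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames >= total_frames:
        # Short clip: use every frame once rather than repeating frames.
        return list(range(total_frames))
    # Split the clip into `num_frames` equal segments and take each midpoint.
    segment = total_frames / num_frames
    return [int(segment * i + segment / 2) for i in range(num_frames)]


if __name__ == "__main__":
    # An 80-frame clip sampled at 8 frames.
    print(uniform_frame_indices(80, 8))
```

The returned indices can then be handed to whatever video decoder you use to read only those frames from the MP4 file before passing them to the image processor.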