Can OFA support video-language tasks such as video captioning?

OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Apache License 2.0

Can OFA support video-language tasks such as video captioning? #264

Open dinglei8908 opened 2 years ago

dinglei8908 commented 2 years ago

Suppose we extract several frames from a video. Any suggestions on how to use them with OFA?

JustinLin610 commented 2 years ago

Not done yet, but it is possible. We still need to figure out whether this requires changes to pretraining or whether the pretrained models can simply be adapted to the task. The simplest approach might be to average the extracted frames and treat the result as a single image.
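
For illustration, the frame-averaging idea could look roughly like the sketch below. It is not code from the OFA repository: the helper names `sample_frame_indices` and `average_frames` are hypothetical, and it assumes the frames have already been decoded into equally sized H x W x 3 `uint8` NumPy arrays by whatever video reader you use.

```python
from typing import List, Sequence

import numpy as np


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick up to num_samples indices spread evenly over num_frames frames.

    Hypothetical helper: takes the midpoint of each equal-length segment.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    num_samples = min(num_samples, num_frames)
    return [int((i + 0.5) * num_frames / num_samples) for i in range(num_samples)]


def average_frames(frames: Sequence[np.ndarray]) -> np.ndarray:
    """Average equally sized uint8 frames pixel-wise into one uint8 image.

    Hypothetical helper: np.stack raises ValueError if the shapes differ.
    """
    if len(frames) == 0:
        raise ValueError("need at least one frame")
    # Accumulate in float32 so the mean is not distorted by uint8 overflow.
    stacked = np.stack([f.astype(np.float32) for f in frames], axis=0)
    mean = stacked.mean(axis=0)
    # Round and clip back into the valid 8-bit pixel range.
    return np.clip(np.rint(mean), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # Stand-in "video": 10 synthetic frames instead of decoded video frames.
    video = [np.full((4, 4, 3), 20 * i, dtype=np.uint8) for i in range(10)]
    picked = [video[i] for i in sample_frame_indices(len(video), 5)]
    image = average_frames(picked)
    print(image.shape, image.dtype)
```

The averaged array can then be converted to an image and passed through the same preprocessing as any other input image for captioning. Note that averaging is order-invariant, so this shortcut discards all temporal information in the clip.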