How can I output one caption using image sequence?

OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Apache License 2.0

How can I output one caption using image sequence? #329

Closed. funykatebird closed this issue 1 year ago.

funykatebird commented 1 year ago

I want to generate a caption for a video. How can I feed an image sequence to the model? I could not find which part of the code to modify so that more than one image is used for a single caption.

JustinLin610 commented 1 year ago

We have not released any methods for processing videos. One way to approach this task might be to compute visual features for each image with our ResNet adaptor, then average-pool them into a single feature matrix that serves as the input to the transformer. You can also try more complex pooling methods.
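
A minimal sketch of the average-pooling idea, assuming the per-frame features have already been computed and stacked into an array of shape (T, N, D): T frames, N patch positions, D feature dimensions. The function name `pool_frame_features` and its optional `weights` argument are hypothetical and not part of OFA, and random arrays stand in for the ResNet outputs. NumPy is used only to keep the example self-contained; with PyTorch tensors the plain average would be `torch.stack(feats).mean(dim=0)`.

```python
import numpy as np


def pool_frame_features(frame_feats, weights=None):
    """Collapse per-frame features of shape (T, N, D) into one (N, D) matrix.

    With weights=None this is plain average pooling over the frame axis.
    Passing one weight per frame gives a weighted average, one example of
    a slightly more complex pooling scheme.
    """
    feats = np.asarray(frame_feats, dtype=np.float32)
    if feats.ndim != 3:
        raise ValueError("expected features of shape (T, N, D)")
    if weights is None:
        return feats.mean(axis=0)
    w = np.asarray(weights, dtype=np.float32)
    if w.shape != (feats.shape[0],):
        raise ValueError("expected exactly one weight per frame")
    w = w / w.sum()  # normalize so the weights sum to 1
    # Contract the frame axis of the weights with the frame axis of the features.
    return np.tensordot(w, feats, axes=(0, 0))


# Random stand-in for per-frame encoder features (sizes are arbitrary).
rng = np.random.default_rng(0)
num_frames, num_patches, dim = 4, 6, 8
frame_feats = rng.standard_normal((num_frames, num_patches, dim)).astype(np.float32)

video_feat = pool_frame_features(frame_feats)
print(video_feat.shape)  # (6, 8)
```

The pooled matrix has the same shape as the features of a single image, so in principle it can take the place of the single-image features without changing the transformer. Averaging discards frame order, which is one reason to experiment with other pooling schemes.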

BTW, we have done experiments on video with our new project OFASys. See if it can help. https://github.com/OFA-Sys/OFASys