facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

How to get video features in VideoCLIP without any access to captions #5429

Open learn2phoenix opened 8 months ago

learn2phoenix commented 8 months ago

For VideoCLIP, how do we get the video encoder features without access to any captions? The default code at https://github.com/facebookresearch/fairseq/blob/main/examples/MMPT/README.md results in MMBertForEncoder as the final video encoder, and it requires input_ids, which in turn are derived from caps. How do we work around this?

MarkChenYutian commented 7 months ago

Hi, I'm also working on this problem. It seems that MMPTModel.model is an instance of a subclass of MMFusionShare, so you can call MMPTModel.model.forward_video directly when you only have video, or ***.forward_text when you only have text.

qingy1337 commented 3 weeks ago

Hi, did any of you end up getting the VideoCLIP example to work? Could you please share your package versions and stuff? I can't get it to run.