Open learn2phoenix opened 8 months ago
Hi, I'm also working on this problem. It seems that `MMPTModel.model` is a subclass of `MMFusionShare`, so you can call `MMPTModel.model.forward_video` directly when you have only video, or `***.forward_text` when you have only text.
Hi, did any of you end up getting the VideoCLIP example to work? Could you please share your package versions and stuff? I can't get it to run.
For `VideoCLIP`, how do we get the video encoder features without access to any captions? The default code at https://github.com/facebookresearch/fairseq/blob/main/examples/MMPT/README.md results in `MMBertForEncoder` as the final video encoder, and it requires `input_ids`, which in turn are based on `caps`. How do we work around this?