facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/
Other

Multimodal alignment - current state in MMF / how to use #947

Open schthms opened 3 years ago

schthms commented 3 years ago

❓ Questions and Help

I would like to use the multimodal alignment task in VisualBERT, ViLBERT and MMBT.

According to this issue [1], this still needs to be implemented. However, something similar apparently already exists here [2]. Could I use [2] as a reference for implementing [1]?

There is also an image_text_alignment tensor in the model definitions of MMBT and VisualBERT. What is it used for?

It would be very helpful if someone could explain what needs to be done to use the multimodal alignment task with these three models.

schthms commented 3 years ago

Could somebody help me please?

vedanuj commented 3 years ago

The image-text alignment loss has not been added to MMF yet, so it needs to be implemented before the task can be used. I can open a PR for it this week.
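A minimal sketch of what such an image-text alignment loss could look like, assuming a PyTorch backbone that returns one pooled multimodal vector per example. It follows the common "does this caption match this image?" binary classification setup (ViLBERT's multimodal alignment prediction): half of each batch gets captions from other examples as negatives, and a linear head is trained with cross-entropy. The names `ImageTextAlignmentHead`, `mismatch_half_batch`, and `image_text_alignment_loss` are hypothetical and are not MMF's API; the eventual MMF implementation may differ.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ImageTextAlignmentHead(nn.Module):
    """Hypothetical head (not part of MMF): scores whether text matches image."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Two logits per example: index 0 = not aligned, index 1 = aligned.
        self.classifier = nn.Linear(hidden_size, 2)

    def forward(self, pooled_output: torch.Tensor) -> torch.Tensor:
        # pooled_output: [batch_size, hidden_size]; assumed to be the pooled
        # output of a VisualBERT/ViLBERT/MMBT-style encoder.
        return self.classifier(pooled_output)


def mismatch_half_batch(text_ids: torch.Tensor):
    """Hypothetical negative sampling: the second half of the batch is paired
    with the caption of the preceding example and labeled 0 (not aligned)."""
    batch_size = text_ids.size(0)
    labels = torch.ones(batch_size, dtype=torch.long)
    half = batch_size // 2
    if half == 0:
        # A batch of size 1 has no other caption to borrow.
        return text_ids, labels
    shifted = text_ids.clone()
    # torch.roll by 1 gives row i the caption of row i - 1, which is a
    # different example for every i >= half >= 1.
    shifted[half:] = torch.roll(text_ids, shifts=1, dims=0)[half:]
    labels[half:] = 0
    return shifted, labels


def image_text_alignment_loss(logits: torch.Tensor, labels: torch.Tensor):
    """Cross-entropy between predicted and true alignment labels."""
    return F.cross_entropy(logits, labels)


# Usage with stand-in tensors instead of a real encoder.
torch.manual_seed(0)
head = ImageTextAlignmentHead(hidden_size=16)
text_ids = torch.arange(4).unsqueeze(1)  # stand-in caption ids, one per example
text_ids, labels = mismatch_half_batch(text_ids)
pooled = torch.randn(4, 16)  # stand-in for encoder(images, text_ids)
logits = head(pooled)
loss = image_text_alignment_loss(logits, labels)
```

In a real setup the negatives would be built before the forward pass, so the encoder sees the mismatched image-caption pairs; the random `pooled` tensor here only stands in for that output.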

schthms commented 3 years ago

Thanks!