facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/
Other
5.46k stars 932 forks

Redundant forward codes in ViLT model #1197

Open majinyu666 opened 2 years ago

majinyu666 commented 2 years ago

In mmf/models/vilt.py, L158~L164, the text embeddings and image embeddings are concatenated and fed forward through the encoder to get the hidden states. However, exactly the same process is already done in L276~L282, which is called at L155. So the inputs are actually forwarded through the encoder twice.

Expected behavior: remove the redundant forward code.
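For illustration, here is a minimal pure-Python sketch of the pattern described above. It does not reproduce the actual code in mmf/models/vilt.py. All names (`ToyViLT`, `get_hidden_states`, `CountingEncoder`, `forward_redundant`, `forward_fixed`) are hypothetical, and the "encoder" is a trivial stand-in that only counts how many times it runs. The point it shows is that a helper which already concatenates the embeddings and runs the encoder makes a second, inline encoder pass pure overhead.

```python
from typing import List

Vector = List[float]


class CountingEncoder:
    """Hypothetical stand-in for a transformer encoder; records how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, sequence: List[Vector]) -> List[Vector]:
        self.calls += 1
        # Placeholder "layer": scales every feature. A real encoder would use attention.
        return [[2.0 * x for x in token] for token in sequence]


class ToyViLT:
    """Hypothetical model mirroring the structure described in the issue."""

    def __init__(self) -> None:
        self.encoder = CountingEncoder()

    def get_hidden_states(self, text_emb: List[Vector], image_emb: List[Vector]) -> List[Vector]:
        # Concatenate text and image tokens along the sequence axis, then encode once.
        return self.encoder(text_emb + image_emb)

    def forward_redundant(self, text_emb: List[Vector], image_emb: List[Vector]) -> List[Vector]:
        hidden = self.get_hidden_states(text_emb, image_emb)
        # Redundant: repeats the concatenation and encoder pass done by the helper,
        # then overwrites its result.
        hidden = self.encoder(text_emb + image_emb)
        return hidden

    def forward_fixed(self, text_emb: List[Vector], image_emb: List[Vector]) -> List[Vector]:
        # Reuse the helper's output, so the encoder runs exactly once.
        return self.get_hidden_states(text_emb, image_emb)
```

With the same inputs, `forward_redundant` leaves `encoder.calls` at 2 and `forward_fixed` leaves it at 1. Because this toy encoder is deterministic, both return identical hidden states, and the extra pass only costs compute.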

majinyu666 commented 2 years ago

I suppose this does not affect the model's results, but the duplicated code looks messy.