facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

M4C remove features approach #1227

Closed emanuelevivoli closed 2 years ago

emanuelevivoli commented 2 years ago

❓ Questions and Help

Hello, I'm new to MMF and its implementation of M4C, so I have a question. The approach for not using some feature (for example, the PHOC features for OCR) is to set the remove_ocr_phoc property of config.ocr:

    # in mmf/models/m4c.py
    def _build_ocr_encoding(self):
        self.remove_ocr_fasttext = self.config.ocr.get("remove_ocr_fasttext", False)
        # I don't want to use the PHOC
        self.remove_ocr_phoc = self.config.ocr.get("remove_ocr_phoc", True)
        self.remove_ocr_frcn = self.config.ocr.get("remove_ocr_frcn", False)
        self.remove_ocr_semantics = self.config.ocr.get("remove_ocr_semantics", False)
        self.remove_ocr_bbox = self.config.ocr.get("remove_ocr_bbox", False)
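For context, MMF parses these YAML options into an OmegaConf config, so any flag that is absent from the config simply falls back to the default passed to .get(). A minimal standalone sketch of that lookup (not actual MMF code):

    from omegaconf import OmegaConf

    # a toy config with only one of the remove_* flags set
    config = OmegaConf.create({"ocr": {"remove_ocr_phoc": True}})

    print(config.ocr.get("remove_ocr_phoc", False))      # True  -> PHOC gets zeroed
    print(config.ocr.get("remove_ocr_fasttext", False))  # False -> FastText is kept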

but all the features are actually still computed in:

    # in mmf/models/m4c.py
    def _forward_ocr_encoding(self, sample_list, fwd_results):
        # ...

        # OCR PHOC feature (604-dim)
        ocr_phoc = sample_list.context_feature_1
        ocr_phoc = F.normalize(ocr_phoc, dim=-1)
        assert ocr_phoc.size(-1) == 604
        # ...

and eventually, instead of removing the features, you set them to zero in:

        # ...
        if self.remove_ocr_phoc:
            ocr_phoc = torch.zeros_like(ocr_phoc)
        # ...
        # here they are concatenated with all the other features
        ocr_feat = torch.cat(
            [ocr_fasttext, ocr_phoc, ocr_fc7, ocr_order_vectors], dim=-1
        )
        ocr_bbox = sample_list.ocr_bbox_coordinates
        if self.remove_ocr_semantics:
            ocr_feat = torch.zeros_like(ocr_feat)
        if self.remove_ocr_bbox:
            ocr_bbox = torch.zeros_like(ocr_bbox)
        # here they go through a Linear layer to get the desired output dimension
        ocr_mmt_in = self.ocr_feat_layer_norm(
            self.linear_ocr_feat_to_mmt_in(ocr_feat)
        ) + self.ocr_bbox_layer_norm(self.linear_ocr_bbox_to_mmt_in(ocr_bbox))
        # ...
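For what it's worth, torch.zeros_like returns a tensor with the same shape and dtype as its input, so everything downstream (the concatenation and the Linear layers) is unaffected. A small self-contained illustration, with made-up batch and token counts:

    import torch
    import torch.nn.functional as F

    # a fake normalized PHOC tensor: (batch=2, num_ocr_tokens=50, 604)
    ocr_phoc = F.normalize(torch.randn(2, 50, 604), dim=-1)

    # "removing" the feature keeps the shape; only the values become zero
    ocr_phoc = torch.zeros_like(ocr_phoc)
    print(ocr_phoc.shape)        # torch.Size([2, 50, 604])
    print(ocr_phoc.abs().sum())  # tensor(0.)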

So my question is: have I understood this correctly? If so, there are some "issues":

Is there a way you could suggest to change this behaviour? If there is, I could work on it.

Thanks a lot, Emanuele

emanuelevivoli commented 2 years ago

Hi @apsdehal, I'm tagging you since I saw that the m4c commits are mainly from you. Sorry for bothering you, but I'd like some insight into the m4c feature behaviour.

Thanks, Emanuele

apsdehal commented 2 years ago

Tagging @ronghanghu since he actually implemented the code.

ronghanghu commented 2 years ago

Hi @emanuelevivoli, yes, you understood correctly -- when we removed some features, we still kept their dimensions but set their values to zero. The main benefit of this compared to removing their dimensions is that the model size stays the same, which makes it easier for us to e.g. convert model checkpoints. The downside is that the nn.Linear layers can be larger than they actually need to be.
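To make the size point concrete, here is a rough sketch; the 300/604/2048/50 feature widths are taken from the default TextVQA M4C config (ocr.mmt_in_dim = 3002), so treat the exact numbers as an assumption:

    import torch.nn as nn

    # fasttext (300) + phoc (604) + fc7 (2048) + order vectors (50) = 3002
    ocr_in_dim = 300 + 604 + 2048 + 50
    linear_ocr_feat_to_mmt_in = nn.Linear(ocr_in_dim, 768)  # MMT hidden size 768

    # zeroing PHOC leaves ocr_in_dim (and every checkpoint tensor) unchanged;
    # truly dropping it would shrink the layer to 3002 - 604 = 2398 inputs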

But it should also work well if you change it to directly remove the feature dimensions.
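A minimal sketch of what that change could look like (a hypothetical helper, not existing MMF code; dimensions assumed as above):

    import torch
    import torch.nn as nn

    def build_ocr_feat(ocr_fasttext, ocr_phoc, ocr_fc7, ocr_order_vectors,
                       remove_ocr_phoc=False):
        # hypothetical: drop PHOC from the concatenation instead of zeroing it
        feats = [ocr_fasttext, ocr_fc7, ocr_order_vectors]
        if not remove_ocr_phoc:
            feats.insert(1, ocr_phoc)  # keep the original feature order
        return torch.cat(feats, dim=-1)

    feat = build_ocr_feat(
        torch.randn(2, 50, 300), torch.randn(2, 50, 604),
        torch.randn(2, 50, 2048), torch.randn(2, 50, 50),
        remove_ocr_phoc=True,
    )
    # the projection layer must be built from the actual concatenated width,
    # so its weights (and the checkpoint) change when a feature is dropped
    proj = nn.Linear(feat.size(-1), 768)
    print(proj(feat).shape)  # torch.Size([2, 50, 768])

One caveat, per the comment above: checkpoints trained with the zero-padding scheme would then no longer load into the smaller layers without conversion.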