facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/
Other
5.5k stars 939 forks

Importing LXMERT pre-trained weights #607

Closed NohTow closed 1 year ago

NohTow commented 4 years ago

❓ Questions and Help

Hello,

I encountered a strange behavior while fine-tuning LXMERT. I tuned both ViLBERT and VisualBERT without any issues using pre-trained weights from the model zoo, but LXMERT is not in the zoo. So I trained LXMERT with mmf_run config=projects/lxmert/configs/hateful_memes/pretrain.yaml model=lxmert dataset=hateful_memes, but then realized it must be a "vanilla" LXMERT (without pre-training), because I couldn't find any reference to pre-trained weights. I therefore downloaded the pre-trained weights from the original repo (https://github.com/airsplay/lxmert links to http://nlp1.cs.unc.edu/data/model_LXRT.pth) and tried tuning the model with mmf_run config=projects/lxmert/configs/hateful_memes/pretrain.yaml model=lxmert dataset=hateful_memes checkpoint.resume_pretrained=True checkpoint.resume_file=data/models/lxmert.pretrained/model.pth.

The problem is that the training metrics (accuracy and loss) are exactly the same at every step for the "vanilla" and the pre-trained model, meaning that loading the weights had no impact at all. I know the import mechanism itself works, because when I load weights from the model I originally trained, the metrics are different (the loss is very low and the accuracy very high).
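One way to rule out a silent no-op load (a quick standalone sanity check, not part of mmf itself) is to snapshot the model's state dict before loading the checkpoint and diff it afterwards; if nothing changed, the checkpoint never reached the model:

```python
import torch

def state_dicts_differ(sd_before, sd_after, atol=1e-8):
    """Return the names of parameters whose values (or shapes) changed
    between two state-dict snapshots."""
    changed = []
    for name, before in sd_before.items():
        after = sd_after.get(name)
        if after is None or before.shape != after.shape:
            changed.append(name)
        elif not torch.allclose(before, after, atol=atol):
            changed.append(name)
    return changed

# Hypothetical usage around a checkpoint load (paths are placeholders):
# snapshot = {k: v.clone() for k, v in model.state_dict().items()}
# model.load_state_dict(torch.load("data/models/lxmert.pretrained/model.pth"))
# print(state_dicts_differ(snapshot, model.state_dict()))  # empty => no-op load
```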

I searched for a long time through every file that might be involved (for example, an implicit re-initialization), but I didn't find anything. The config files seem right, and the additional arguments to the command do add the line "| INFO | mmf.utils.checkpoint : Loading checkpoint" to the logs.

Does anyone have any idea why this occurs?

Thank you very much!

(This is related to the last question in issue https://github.com/facebookresearch/mmf/issues/566, but that issue's original question was about something else, and it will be closed once its initial question is fixed.)

apsdehal commented 4 years ago

Hi, the LXMERT pretrained weights haven't been uploaded yet. There is a PR #561 that aims to fix this, but it is incomplete. The original LXMERT repo's weights won't match directly with what we have in MMF, so they may not work out of the box. @vedanuj can help you get the correct weights, so I would wait for a response from him.

NohTow commented 4 years ago

Oh okay, that explains why loading my own trained weights works but the original LXMERT repo's weights don't. I'll look into the PR you mentioned and also wait for additional information from @vedanuj.

Thank you very much!

vedanuj commented 4 years ago

Hello @NohTow, we do not have pretrained LXMERT model weights for mmf yet. As mentioned above, PR #561 has some pretrained models, but it will require some work to make sure all the models are compatible. I will wait for a response from the author of the PR.

congchan commented 4 years ago

@NohTow Hello, I am also trying to finetune lxmert on hateful_memes, but I encountered a TypeError:

File "mmf/models/lxmert.py", line 668, in get_image_and_text_features
               image_location_variable = image_location_variable[:, : max_features.item(), :4]
TypeError: list indices must be integers or slices, not tuple
NohTow commented 4 years ago

@congchan Hello, I encountered an error like this during my tests, but I can't remember exactly how I solved it. If I recall correctly, it has to do with the config file: the error is due to your features not being in the right format. I corrected it by adding

    hateful_memes:
      processors:
        text_processor:
          type: bert_tokenizer
          params:
            tokenizer_config:
              type: bert-base-uncased
              params:
                do_lower_case: true
            mask_probability: 0
            max_seq_length: 128
        transformer_bbox_processor:
          type: transformer_bbox
          params:
            bbox_key: bbox
            image_width_key: image_width
            image_height_key: image_height    

      use_images: false
      use_features: true

to the config file. I'm attaching both pretrain.yaml and defaults.yaml so you can run your training with the same config files and see if that fixes your error. These config files need to be put under /projects/lxmert/configs/hateful_memes, and the training run with pretrain.yaml. Hope it helps.

configs.zip
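For what it's worth, the TypeError itself just means that image_location_variable arrived as a plain Python list rather than a tensor: tuple-style (multi-dimensional) slicing like [:, :n, :4] only exists on tensors, which is why the bbox processor matters. A minimal standalone illustration (not mmf code):

```python
import torch

# A nested Python list, roughly how raw bbox data would look without
# the transformer_bbox processor turning it into a tensor.
image_location = [[[0.1, 0.2, 0.3, 0.4, 0.5]]]

try:
    image_location[:, :1, :4]  # a tuple index on a list
except TypeError as e:
    print(e)  # list indices must be integers or slices, not tuple

# Once the features are a tensor, the same slice is valid.
loc = torch.tensor(image_location)
print(loc[:, :1, :4].shape)  # torch.Size([1, 1, 4])
```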

Edit: @vedanuj, since the author closed the PR, I tried loading one of its weight files, but it seems that it is not compatible with mmf either (I get the same result). Can you confirm this? Is there any hope that this will be fixed even though the PR is closed? Thank you very much!

congchan commented 4 years ago

@NohTow Thanks!! It works!! @vedanuj I am also looking forward to these pretrained parameters being available in mmf. BTW, what is required for model parameters to be compatible with mmf?

NohTow commented 4 years ago

Hi again, So I dug a bit into this issue, and it seems that the weights in the PR have the same "architecture" as mmf's implementation of LXMERT, but there is a problem with their names. Printing them gives the structure "bert.XXX.YYY" (XXX and YYY being layer names), whereas printing the names from my tuned version of LXMERT (trained with the mmf procedure) gives the structure "model.bert.XXX.YYY". Apart from this difference, every weight has the same name.
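A quick way to spot this kind of mismatch (generic PyTorch, with placeholder paths) is to histogram the top-level key prefixes of the two state dicts and compare:

```python
import torch
from collections import Counter

def prefix_histogram(state_dict, depth=1):
    """Count how many keys start with each dotted prefix of the given
    depth, e.g. Counter({'bert': 250}) vs Counter({'model': 250})."""
    return Counter(".".join(key.split(".")[:depth]) for key in state_dict)

# Hypothetical usage (file names are placeholders):
# pr_weights = torch.load("vqa2_pretrained.pth", map_location="cpu")
# mmf_weights = torch.load("save/lxmert_final.pth", map_location="cpu")
# print(prefix_histogram(pr_weights))   # keys start with "bert."
# print(prefix_histogram(mmf_weights))  # keys start with "model."
```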

Renaming them with the attached script makes them compatible. I still had a small issue with the size of the last FC layer (the VQA output size is not 2, since it is not a binary classification task like hateful memes). I fixed it by discarding the last layer's weights, changing the key_transformation definition to

def key_transformation(old_key):
    # Discard the final classification layer: its output size matches the
    # VQA answer vocabulary, not the 2-way hateful-memes head.
    if "logit_fc.3" in old_key:
        return ""
    # Prepend the prefix that mmf's checkpoint expects on every other weight.
    return "model." + old_key

But there is obviously a cleaner solution. I will dig into it today, but I struggle a bit to understand why the author of the PR provided 3 different pretrained weight files. I will check the paper again and give an update when I fully understand the pre-training procedure. rename_state_dict_keys.zip
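For reference, the workflow the attached script implements can be sketched like this (the function signature and file names here are assumptions, not the script's exact code): load the checkpoint, transform each key, drop any key mapped to the empty string, and save the result.

```python
import torch

def rename_state_dict_keys(source_path, target_path, transform):
    """Rewrite checkpoint keys with `transform`; a key mapped to the
    empty string is discarded entirely (used above to drop the
    size-mismatched logit_fc.3 classification head)."""
    state_dict = torch.load(source_path, map_location="cpu")
    renamed = {}
    for key, value in state_dict.items():
        new_key = transform(key)
        if new_key:  # "" => skip this weight
            renamed[new_key] = value
    torch.save(renamed, target_path)

# Hypothetical usage:
# rename_state_dict_keys("model_LXRT.pth", "lxmert_mmf.pth",
#                        lambda k: "" if "logit_fc.3" in k else "model." + k)
```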

congchan commented 4 years ago

Following your instructions, I renamed the lxmert authors' weights from module.... to model..... Thanks, it works. Although it turns out that the pretrained weights perform significantly worse than vanilla ones on the hateful memes dataset.

NohTow commented 4 years ago

Yes, my experiments lead to the same conclusion as yours. Beyond the pre-trained models performing worse, the different pre-trained weights (from the original authors and the various ones in the PR) all give the same results. This might indicate that there is still an issue with the loaded weights (they are expected to differ at least a bit, so the results should too). Maybe I am missing something. @vedanuj, do you have any idea what could still be wrong in the weight-loading procedure? I can run some tests if you have an intuition. Thanks in advance if you can investigate this.

vedanuj commented 4 years ago

Can you point me to which pretrained weights you are talking about here? Please provide the link. I can help check how to load them properly.

NohTow commented 4 years ago

I tried the one from the original repo (https://github.com/airsplay/lxmert), which can be downloaded at this link, and also tried the #561 ones, available here; I tried both vqa2_pretrained and gqa_pretrained.

Let me know if you need more information. Again, thanks for looking into this.