facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

The config hierarchy does not work as expected #633

Closed congchan closed 4 years ago

congchan commented 4 years ago

🐛 Bug

The config hierarchy is not behaving as expected. The config the user passes on the command line, config=/mmf/projects/visual_bert/configs/hateful_memes/from_coco.yaml, is:

includes:
- ./defaults.yaml

checkpoint:
  resume_pretrained: true
  resume_zoo: visual_bert.pretrained.coco

The included ./defaults.yaml contains:

dataset_config:
  hateful_memes:
    return_features_info: true
    processors:
      text_processor:
        type: bert_tokenizer
        params:
          tokenizer_config:
            type: bert-base-uncased
            params:
              do_lower_case: true
          mask_probability: 0
          max_seq_length: 128

I expected mmf to override the config and call BertTokenizer.

However, the program still calls the vocab-type processor VocabProcessor, which is defined in mmf/configs/datasets/hateful_memes/defaults.yaml.

Command

mmf_run config=projects/hateful_memes/configs/visual_bert/from_coco.yaml \
    model=visual_bert \
    dataset=hateful_memes

To Reproduce

Steps to reproduce the behavior:

mmf_run config=projects/hateful_memes/configs/visual_bert/from_coco.yaml \
    model=visual_bert \
    dataset=hateful_memes

Expected behavior

I expected mmf to override the config and call BertTokenizer.

Environment

Please copy and paste the output from the environment collection script from PyTorch (or fill out the checklist below manually).

You can run the script with:

# For security purposes, please check the contents of collect_env.py before running it.
python -m torch.utils.collect_env

Additional context

vedanuj commented 4 years ago

I just ran the same command and I don't see this issue. Can you provide any logs that show BertTokenizer is not getting called?

congchan commented 4 years ago

I fixed it; I had forgotten to specify the tokenizer_config. Thanks anyway.
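For anyone hitting the same symptom: when switching the processor type to bert_tokenizer, the params block (including tokenizer_config) presumably needs to be spelled out too, since the merge otherwise keeps the old vocab-style params. A sketch of a fully specified override, using the field names from the defaults.yaml quoted above (illustrative, not necessarily the exact fix the author applied):

```yaml
dataset_config:
  hateful_memes:
    processors:
      text_processor:
        type: bert_tokenizer
        params:
          tokenizer_config:
            type: bert-base-uncased
            params:
              do_lower_case: true
          max_seq_length: 128
```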