facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

cannot reproduce the hateful memes result #290

Closed. xxxxxxxxy closed this issue 4 years ago

xxxxxxxxy commented 4 years ago

❓ Questions and Help

I ran the following commands step by step:

1. pip install --upgrade --pre mmf
2. mmf_convert_hm --zip_file=/data-user/Lnmwdnq3YcF7F3YsJncp.zip --password=<masked>
3. mmf_run config=projects/hateful_memes/configs/vilbert/from_cc.yaml model=vilbert dataset=hateful_memes
4. mmf_predict config=projects/hateful_memes/configs/visual_bert/from_coco.yaml model=visual_bert dataset=hateful_memes run_type=test

The paper reports a test-set result of about 70.03 for ViLBERT CC, but I got 0.4647 after submitting my predictions on the submission page.

An error also occurs at the last step of training:

2020-06-05T06:43:26 INFO: Stepping into final validation check
2020-06-05T06:43:26 INFO: Evaluation time. Running on full validation set...
2020-06-05T06:43:35 INFO: progress: 22001/22000, val/total_loss: 3.5773, val/hateful_memes/cross_entropy: 3.5773, val/hateful_memes/accuracy: 0.5840, val/hateful_memes/binary_f1: 0.4317, val/hateful_memes/roc_auc: 0.6909, num_updates: 22001, epoch: 83, iterations: 22001, max_updates: 22000, val_time: 09s 123ms, best_update: 4000, best_iteration: 4000, best_val/hateful_memes/roc_auc: 0.699356
2020-06-05T06:43:36 INFO: Restoring checkpoint
2020-06-05T06:43:36 INFO: Loading checkpoint
Traceback (most recent call last):
  File "/opt/conda/bin/mmf_run", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/lib/python3.6/site-packages/mmf_cli/run.py", line 89, in run
    main(configuration, predict=predict)
  File "/opt/conda/lib/python3.6/site-packages/mmf_cli/run.py", line 40, in main
    trainer.train()
  File "/opt/conda/lib/python3.6/site-packages/mmf/trainers/base_trainer.py", line 265, in train
    self.finalize()
  File "/opt/conda/lib/python3.6/site-packages/mmf/trainers/base_trainer.py", line 311, in finalize
    self.checkpoint.restore()
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 412, in restore
    self._load(best_path, force=True)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 171, in _load
    ckpt, should_continue = self._load_from_zoo(file)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 314, in _load_from_zoo
    zoo_ckpt = load_pretrained_model(file)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 56, in load_pretrained_model
    ), "None or multiple checkpoints files. MMF doesn't know what to do."
AssertionError: None or multiple checkpoints files. MMF doesn't know what to do.
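The assertion message suggests that the loader expected to find exactly one checkpoint file at the location it was given. A minimal sketch of that kind of check follows, assuming checkpoints use the `.pth` or `.ckpt` extensions seen in this thread. `find_single_checkpoint` is a hypothetical helper written for illustration, not MMF's implementation.

```python
from pathlib import Path
import tempfile


def find_single_checkpoint(directory, suffixes=(".pth", ".ckpt")):
    """Return the only checkpoint file in `directory`, or raise ValueError.

    Hypothetical helper: it mirrors the "none or multiple" wording of the
    assertion above, not the actual MMF code.
    """
    candidates = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix in suffixes
    )
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one checkpoint in {directory}, "
            f"found {len(candidates)}"
        )
    return candidates[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "best.ckpt").touch()
        print(find_single_checkpoint(tmp).name)  # best.ckpt
```

Running a check like this on the save directory before restoring would turn the bare assertion into a message that names the directory and the number of files found.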

vedanuj commented 4 years ago

Hi @xxxxxxxxy

In your mmf_predict command you are not using the trained model but the COCO-pretrained model (which is not fine-tuned on the hm dataset). That may be why you are getting low scores. After training finishes, the trained model is saved in your env.save_dir, or in the default ./save directory if you haven't specified one.

You should use that model when running mmf_predict:

mmf_predict config=projects/hateful_memes/configs/visual_bert/from_coco.yaml model=visual_bert dataset=hateful_memes run_type=test checkpoint.resume_file=./save/visual_bert_final.pth

Can you try this and let us know if that worked?
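Before running the command above, it can help to confirm that the fine-tuned checkpoint actually exists in the save directory. A minimal sketch, assuming the default ./save directory and the visual_bert_final.pth file name from that command. `build_predict_command` is a hypothetical helper, and it only uses overrides that already appear in this thread.

```python
from pathlib import Path


def build_predict_command(config, model, checkpoint, dataset="hateful_memes"):
    """Build the mmf_predict argument list, refusing a missing checkpoint."""
    checkpoint = Path(checkpoint)
    if not checkpoint.is_file():
        raise FileNotFoundError(f"no trained checkpoint at {checkpoint}")
    return [
        "mmf_predict",
        f"config={config}",
        f"model={model}",
        f"dataset={dataset}",
        "run_type=test",
        f"checkpoint.resume_file={checkpoint}",
    ]


if __name__ == "__main__":
    try:
        cmd = build_predict_command(
            "projects/hateful_memes/configs/visual_bert/from_coco.yaml",
            "visual_bert",
            "./save/visual_bert_final.pth",
        )
        # The list can be passed to subprocess.run to execute the command.
        print(" ".join(cmd))
    except FileNotFoundError as err:
        print(f"Train first: {err}")
```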

vedanuj commented 4 years ago

Secondly, you are using the vilbert config for mmf_run, but your prediction command uses visual_bert.

Your mmf_predict command for vilbert should be:

mmf_predict config=projects/hateful_memes/configs/vilbert/from_cc.yaml model=vilbert dataset=hateful_memes run_type=test checkpoint.resume_file=./save/vilbert_final.pth
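A mismatch like this is easy to catch mechanically, because both commands are plain key=value overrides. A minimal sketch using the two commands from the original post. `parse_overrides` and `find_mismatches` are hypothetical helper names, and the config is deliberately not compared.

```python
import shlex


def parse_overrides(command):
    """Split a command line into its program name and key=value overrides."""
    program, *args = shlex.split(command)
    return program, dict(arg.split("=", 1) for arg in args if "=" in arg)


def find_mismatches(train_command, predict_command, keys=("model", "dataset")):
    """Return mismatch messages; an empty list means the commands agree."""
    _, train = parse_overrides(train_command)
    _, predict = parse_overrides(predict_command)
    return [
        f"{key}: {train.get(key)} vs {predict.get(key)}"
        for key in keys
        if train.get(key) != predict.get(key)
    ]


if __name__ == "__main__":
    train = (
        "mmf_run config=projects/hateful_memes/configs/vilbert/from_cc.yaml "
        "model=vilbert dataset=hateful_memes"
    )
    predict = (
        "mmf_predict config=projects/hateful_memes/configs/visual_bert/from_coco.yaml "
        "model=visual_bert dataset=hateful_memes run_type=test"
    )
    print(find_mismatches(train, predict))  # ['model: vilbert vs visual_bert']
```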

xxxxxxxxy commented 4 years ago

I tried this:

mmf_predict config=projects/hateful_memes/configs/vilbert/defaults.yaml model=vilbert dataset=hateful_memes run_type=test checkpoint.resume_file=./save/vilbert_final.pth

I submitted the resulting predictions and got 0.3715, so I still cannot reproduce the result.

xxxxxxxxy commented 4 years ago

I pasted the wrong command in my original post. It should be the vilbert command you wrote.

However, the error occurs during training, not prediction.

Here is the output:

2020-06-15T07:43:58 INFO: Stepping into final validation check
2020-06-15T07:43:58 INFO: Evaluation time. Running on full validation set...
2020-06-15T07:44:07 INFO: progress: 22001/22000, val/total_loss: 3.4931, val/hateful_memes/cross_entropy: 3.4931, val/hateful_memes/accuracy: 0.6000, val/hateful_memes/binary_f1: 0.4595, val/hateful_memes/roc_auc: 0.6858, num_updates: 22001, epoch: 83, iterations: 22001, max_updates: 22000, val_time: 09s 187ms, best_update: 6000, best_iteration: 6000, best_val/hateful_memes/roc_auc: 0.694604
2020-06-15T07:44:08 INFO: Restoring checkpoint
2020-06-15T07:44:08 INFO: Loading checkpoint
Traceback (most recent call last):
  File "/opt/conda/bin/mmf_run", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/lib/python3.6/site-packages/mmf_cli/run.py", line 89, in run
    main(configuration, predict=predict)
  File "/opt/conda/lib/python3.6/site-packages/mmf_cli/run.py", line 40, in main
    trainer.train()
  File "/opt/conda/lib/python3.6/site-packages/mmf/trainers/base_trainer.py", line 265, in train
    self.finalize()
  File "/opt/conda/lib/python3.6/site-packages/mmf/trainers/base_trainer.py", line 311, in finalize
    self.checkpoint.restore()
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 412, in restore
    self._load(best_path, force=True)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 171, in _load
    ckpt, should_continue = self._load_from_zoo(file)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 314, in _load_from_zoo
    zoo_ckpt = load_pretrained_model(file)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 56, in load_pretrained_model
    ), "None or multiple checkpoints files. MMF doesn't know what to do."
AssertionError: None or multiple checkpoints files. MMF doesn't know what to do.

vedanuj commented 4 years ago

This issue should be fixed in rc10. Can you please reinstall the package or install from source?

xxxxxxxxy commented 4 years ago

The checkpoint issue is fixed, but I still cannot reproduce the results.

I tried:

mmf_predict config=/data-input/vilbert_coco/save/config.yaml model=vilbert dataset=hateful_memes run_type=test checkpoint.resume_file=/data-input/vilbert_coco/save/vilbert_final.pth

and

mmf_predict config=projects/hateful_memes/configs/vilbert/from_cc.yaml model=vilbert dataset=hateful_memes run_type=test checkpoint.resume_file=/data-input/vilbert_coco/save/vilbert_final.pth

The submissions scored 0.4836 and 0.4990, respectively.
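Scores this close to 0.5 are worth reading against the metric itself. Assuming the submission score is an area under the ROC curve on a 0 to 1 scale (the training log reports roc_auc, but the thread does not state the submission metric explicitly), 0.5 is what uninformative scores produce. A minimal pure-Python sketch of the pairwise definition, with made-up labels and scores for illustration:

```python
def roc_auc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of
    examples, so only suitable for small illustrations.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(roc_auc(labels, [0.9, 0.8, 0.2, 0.1]))  # 1.0, perfect ranking
    print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # 0.5, constant scores
    print(roc_auc(labels, [0.1, 0.2, 0.8, 0.9]))  # 0.0, inverted ranking
```

Under that assumption, 0.4836 and 0.4990 sit at chance level, which points at how the model is loaded at prediction time.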

vedanuj commented 4 years ago

Try the config projects/hateful_memes/configs/vilbert/defaults.yaml for validation and prediction. If a config has resume_pretrained: true, only the BERT base weights are loaded and the classifier weights are not, so use such a config only for training.
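The effect described here can be illustrated with plain dictionaries standing in for state dicts. This is a minimal sketch, not MMF's loading code: the key names, the `bert.` prefix, and `load_weights` are all hypothetical, chosen only to show why a fine-tuned classifier is lost when only the base weights are restored.

```python
def load_weights(model_state, checkpoint_state, pretrained_only, base_prefix="bert."):
    """Return a copy of model_state updated from checkpoint_state.

    With pretrained_only=True, only keys starting with base_prefix are
    copied, so the classifier keeps its fresh (untrained) values.
    """
    loaded = dict(model_state)
    for key, value in checkpoint_state.items():
        if key not in loaded:
            continue  # ignore keys the model does not have
        if pretrained_only and not key.startswith(base_prefix):
            continue  # skip the fine-tuned head
        loaded[key] = value
    return loaded


if __name__ == "__main__":
    fresh = {"bert.encoder.weight": 0.0, "classifier.weight": 0.0}
    finetuned = {"bert.encoder.weight": 1.5, "classifier.weight": -0.7}

    print(load_weights(fresh, finetuned, pretrained_only=True))
    # {'bert.encoder.weight': 1.5, 'classifier.weight': 0.0}
    print(load_weights(fresh, finetuned, pretrained_only=False))
    # {'bert.encoder.weight': 1.5, 'classifier.weight': -0.7}
```

In the first call the classifier stays at its initial value, which is the situation a prediction run ends up in when it uses a training-only config.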

xxxxxxxxy commented 4 years ago

Thanks. This issue has been resolved.

gireek commented 4 years ago

Hi @vedanuj, does this kind of mmf_predict command, with checkpoint.resume_file pointing at a saved checkpoint, also work for hateful_memes when I have trained a model myself and a .ckpt file was created? For example:

!mmf_predict config="configs/experiments/defaults.yaml" model=concat_vl dataset=hateful_memes run_type=test checkpoint.resume_file=./save/models/model_2000.ckpt

It keeps giving me this error (I am running on a Colab GPU):

2020-08-26 15:25:06.517039: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Namespace(config_override=None, local_rank=None, opts=['config=configs/experiments/defaults.yaml', 'model=concat_vl', 'dataset=hateful_memes', 'run_type=test', 'checkpoint.resume_file=./save/models/model_2000.ckpt', 'evaluation.predict=true'])
/usr/local/lib/python3.6/dist-packages/mmf/utils/configuration.py:284: UserWarning: No model named 'concat_vl' has been registered
  warnings.warn(warning)
Overriding option config to configs/experiments/defaults.yaml
Overriding option model to concat_vl
Overriding option datasets to hateful_memes
Overriding option run_type to test
Overriding option checkpoint.resume_file to ./save/models/model_2000.ckpt
Overriding option evaluation.predict to true
Using seed 8536674
Logging to: ./save/logs/train_2020-08-26T15:25:08.log
Traceback (most recent call last):
  File "/usr/local/bin/mmf_predict", line 8, in <module>
    sys.exit(predict())
  File "/usr/local/lib/python3.6/dist-packages/mmf_cli/predict.py", line 15, in predict
    run(predict=True)
  File "/usr/local/lib/python3.6/dist-packages/mmf_cli/run.py", line 111, in run
    main(configuration, predict=predict)
  File "/usr/local/lib/python3.6/dist-packages/mmf_cli/run.py", line 40, in main
    trainer.load()
  File "/usr/local/lib/python3.6/dist-packages/mmf/trainers/base_trainer.py", line 59, in load
    self.load_datasets()
  File "/usr/local/lib/python3.6/dist-packages/mmf/trainers/base_trainer.py", line 83, in load_datasets
    self.dataset_loader.load_datasets()
  File "/usr/local/lib/python3.6/dist-packages/mmf/common/dataset_loader.py", line 17, in load_datasets
    self.train_dataset.load(self.config)
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/multi_dataset_loader.py", line 114, in load
    self.build_datasets(config)
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/multi_dataset_loader.py", line 131, in build_datasets
    dataset_instance = build_dataset(dataset, dataset_config, self.dataset_type)
  File "/usr/local/lib/python3.6/dist-packages/mmf/utils/build.py", line 106, in build_dataset
    dataset = builder_instance.load_dataset(config, dataset_type)
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/base_dataset_builder.py", line 98, in load_dataset
    dataset.init_processors()
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/concat_dataset.py", line 35, in _call_all_datasets_func
    value = getattr(dataset, name)(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/builders/hateful_memes/dataset.py", line 69, in init_processors
    super().init_processors()
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/base_dataset.py", line 60, in init_processors
    self.config.processors, reg_key, **extra_params
  File "/usr/local/lib/python3.6/dist-packages/mmf/utils/build.py", line 287, in build_processors
    processor_instance = Processor(processor_params, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/processors/processors.py", line 151, in __init__
    self.processor = processor_class(params, *args, **kwargs)
TypeError: 'NoneType' object is not callable
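The UserWarning about an unregistered name and the final TypeError fit a common pattern: a registry lookup returns None for an unknown name, and the result is then called. A minimal sketch of that pattern follows, using a hypothetical `REGISTRY` dict and a processor name invented for illustration. Whether this is the actual cause here is not confirmed in the thread.

```python
REGISTRY = {}


def register(name):
    """Decorator that records a class under `name`."""
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap


@register("lowercase_text")
class LowercaseText:
    def __call__(self, text):
        return text.lower()


def build(name):
    """Look up and instantiate a registered class, with a clear error."""
    cls = REGISTRY.get(name)  # None when `name` was never registered
    if cls is None:
        raise KeyError(
            f"nothing registered under {name!r}; known: {sorted(REGISTRY)}"
        )
    return cls()


if __name__ == "__main__":
    print(build("lowercase_text")("Hello"))  # hello
    try:
        REGISTRY.get("missing_processor")()  # unguarded lookup, then call
    except TypeError as err:
        print(err)  # 'NoneType' object is not callable
```

If the cause is of this kind, the names to check are the model and processor types referenced by the config passed to mmf_predict.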