Closed xxxxxxxxy closed 4 years ago
Hi @xxxxxxxxy
In your mmf_predict command you are not using the trained model but the COCO-pretrained model (which is not fine-tuned on the hateful memes dataset). That may be why you are getting low scores. After training finishes, the trained model should be saved in your env.save_dir or, if you haven't specified one, in the default ./save directory.
You should use that model when running mmf_predict:
mmf_predict config=projects/hateful_memes/configs/visual_bert/from_coco.yaml model=visual_bert dataset=hateful_memes run_type=test checkpoint.resume_file=./save/visual_bert_final.pth
Can you try this and let us know if that worked?
Secondly, you are using the vilbert config for mmf_train, but your prediction command uses visual_bert.
Your mmf_predict command for vilbert should be:
mmf_predict config=projects/hateful_memes/configs/vilbert/from_cc.yaml model=vilbert dataset=hateful_memes run_type=test checkpoint.resume_file=./save/vilbert_final.pth
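For anyone scripting this step, here is a minimal sketch of the advice above: build the mmf_predict argument list only if the fine-tuned checkpoint actually exists on disk. The helper name build_predict_command and its parameters are hypothetical and not part of MMF; only the command-line options are taken from the commands in this thread.

```python
from pathlib import Path
from typing import List


def build_predict_command(config: str, model: str, checkpoint: str,
                          dataset: str = "hateful_memes") -> List[str]:
    """Return an mmf_predict argument list that points at a trained checkpoint.

    Hypothetical helper (not part of MMF). It refuses to build the command
    when the checkpoint file is missing, so that a run without the
    fine-tuned weights is not submitted by accident.
    """
    if not Path(checkpoint).is_file():
        raise FileNotFoundError(f"trained checkpoint not found: {checkpoint}")
    return [
        "mmf_predict",
        f"config={config}",
        f"model={model}",
        f"dataset={dataset}",
        "run_type=test",
        # Same override as in the commands above: the trained model file.
        f"checkpoint.resume_file={checkpoint}",
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True) from a notebook or script. Whether the resulting scores match the paper still depends on using the right config for the model that was trained.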
I tried this:
mmf_predict config=projects/hateful_memes/configs/vilbert/defaults.yaml model=vilbert dataset=hateful_memes run_type=test checkpoint.resume_file=./save/vilbert_final.pth
I submitted the produced results and got 0.3715. I still cannot reproduce the result.
About the vilbert / visual_bert mismatch: I pasted the wrong command. It should be what you wrote.
However, the problem occurs during training, not prediction.
Here is the problem:
2020-06-15T07:43:58 INFO: Stepping into final validation check
2020-06-15T07:43:58 INFO: Evaluation time. Running on full validation set...
2020-06-15T07:44:07 INFO: progress: 22001/22000, val/total_loss: 3.4931, val/hateful_memes/cross_entropy: 3.4931, val/hateful_memes/accuracy: 0.6000, val/hateful_memes/binary_f1: 0.4595, val/hateful_memes/roc_auc: 0.6858, num_updates: 22001, epoch: 83, iterations: 22001, max_updates: 22000, val_time: 09s 187ms, best_update: 6000, best_iteration: 6000, best_val/hateful_memes/roc_auc: 0.694604
2020-06-15T07:44:08 INFO: Restoring checkpoint
2020-06-15T07:44:08 INFO: Loading checkpoint
Traceback (most recent call last):
  File "/opt/conda/bin/mmf_run", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/lib/python3.6/site-packages/mmf_cli/run.py", line 89, in run
    main(configuration, predict=predict)
  File "/opt/conda/lib/python3.6/site-packages/mmf_cli/run.py", line 40, in main
    trainer.train()
  File "/opt/conda/lib/python3.6/site-packages/mmf/trainers/base_trainer.py", line 265, in train
    self.finalize()
  File "/opt/conda/lib/python3.6/site-packages/mmf/trainers/base_trainer.py", line 311, in finalize
    self.checkpoint.restore()
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 412, in restore
    self._load(best_path, force=True)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 171, in _load
    ckpt, should_continue = self._load_from_zoo(file)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 314, in _load_from_zoo
    zoo_ckpt = load_pretrained_model(file)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 56, in load_pretrained_model
    ), "None or multiple checkpoints files. MMF doesn't know what to do."
AssertionError: None or multiple checkpoints files. MMF doesn't know what to do.
This issue should be fixed in rc10. Can you please reinstall the package or install from source?
That checkpoint issue was fixed. But I still cannot reproduce the results.
I tried:
mmf_predict config=/data-input/vilbert_coco/save/config.yaml model=vilbert dataset=hateful_memes run_type=test checkpoint.resume_file=/data-input/vilbert_coco/save/vilbert_final.pth
and
mmf_predict config=projects/hateful_memes/configs/vilbert/from_cc.yaml model=vilbert dataset=hateful_memes run_type=test checkpoint.resume_file=/data-input/vilbert_coco/save/vilbert_final.pth
The submitted results are 0.4836 and 0.4990, respectively.
Try the config projects/hateful_memes/configs/vilbert/defaults.yaml for validation and prediction. If the config has resume_pretrained: true, then only the BERT base is loaded and the classifier weights are not, so use that config only for training.
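To check a config before predicting, something like the following could flag the setting described above. It is only a rough sketch: has_resume_pretrained is a hypothetical helper, it scans the YAML text with a regular expression instead of parsing it, and it assumes the key appears literally as resume_pretrained: true in the file you pass rather than being inherited from an included config.

```python
import re

# Matches a line such as "  resume_pretrained: true  # comment".
# Only the literal value "true" (any capitalisation) is recognised.
_RESUME_PRETRAINED = re.compile(
    r"^[ \t]*resume_pretrained[ \t]*:[ \t]*true[ \t]*(#.*)?$",
    re.IGNORECASE | re.MULTILINE,
)


def has_resume_pretrained(yaml_text: str) -> bool:
    """Return True if the YAML text sets resume_pretrained to true.

    Hypothetical helper (not part of MMF): a plain text scan, so it will
    miss values that come from included or overridden configs.
    """
    return _RESUME_PRETRAINED.search(yaml_text) is not None
```

If this returns True for the config passed to mmf_predict, the advice above suggests switching to the defaults.yaml config for validation and prediction.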
Thanks. This issue has been resolved.
Hi @vedanuj, does this kind of prediction also work for hateful_memes once I have trained a model and a .ckpt file has been created? For example:
!mmf_predict config="configs/experiments/defaults.yaml" model=concat_vl dataset=hateful_memes run_type=test checkpoint.resume_file=./save/models/model_2000.ckpt
I ask because it keeps giving me this error (I am running on a Colab GPU):
2020-08-26 15:25:06.517039: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Namespace(config_override=None, local_rank=None, opts=['config=configs/experiments/defaults.yaml', 'model=concat_vl', 'dataset=hateful_memes', 'run_type=test', 'checkpoint.resume_file=./save/models/model_2000.ckpt', 'evaluation.predict=true'])
/usr/local/lib/python3.6/dist-packages/mmf/utils/configuration.py:284: UserWarning: No model named 'concat_vl' has been registered
warnings.warn(warning)
Overriding option config to configs/experiments/defaults.yaml
Overriding option model to concat_vl
Overriding option datasets to hateful_memes
Overriding option run_type to test
Overriding option checkpoint.resume_file to ./save/models/model_2000.ckpt
Overriding option evaluation.predict to true
Using seed 8536674
Logging to: ./save/logs/train_2020-08-26T15:25:08.log
Traceback (most recent call last):
  File "/usr/local/bin/mmf_predict", line 8, in <module>
    sys.exit(predict())
  File "/usr/local/lib/python3.6/dist-packages/mmf_cli/predict.py", line 15, in predict
    run(predict=True)
  File "/usr/local/lib/python3.6/dist-packages/mmf_cli/run.py", line 111, in run
    main(configuration, predict=predict)
  File "/usr/local/lib/python3.6/dist-packages/mmf_cli/run.py", line 40, in main
    trainer.load()
  File "/usr/local/lib/python3.6/dist-packages/mmf/trainers/base_trainer.py", line 59, in load
    self.load_datasets()
  File "/usr/local/lib/python3.6/dist-packages/mmf/trainers/base_trainer.py", line 83, in load_datasets
    self.dataset_loader.load_datasets()
  File "/usr/local/lib/python3.6/dist-packages/mmf/common/dataset_loader.py", line 17, in load_datasets
    self.train_dataset.load(self.config)
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/multi_dataset_loader.py", line 114, in load
    self.build_datasets(config)
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/multi_dataset_loader.py", line 131, in build_datasets
    dataset_instance = build_dataset(dataset, dataset_config, self.dataset_type)
  File "/usr/local/lib/python3.6/dist-packages/mmf/utils/build.py", line 106, in build_dataset
    dataset = builder_instance.load_dataset(config, dataset_type)
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/base_dataset_builder.py", line 98, in load_dataset
    dataset.init_processors()
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/concat_dataset.py", line 35, in _call_all_datasets_func
    value = getattr(dataset, name)(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/builders/hateful_memes/dataset.py", line 69, in init_processors
    super().init_processors()
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/base_dataset.py", line 60, in init_processors
    self.config.processors, reg_key, **extra_params
  File "/usr/local/lib/python3.6/dist-packages/mmf/utils/build.py", line 287, in build_processors
    processor_instance = Processor(processor_params, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmf/datasets/processors/processors.py", line 151, in __init__
    self.processor = processor_class(params, *args, **kwargs)
TypeError: 'NoneType' object is not callable
❓ Questions and Help
I ran the following instructions step by step:
The paper reports a test result of about 70.03 for ViLBERT CC, but I got 0.4647 after submitting my predictions to the submission page.
An error occurs at the last step of training:
2020-06-05T06:43:26 INFO: Stepping into final validation check
2020-06-05T06:43:26 INFO: Evaluation time. Running on full validation set...
2020-06-05T06:43:35 INFO: progress: 22001/22000, val/total_loss: 3.5773, val/hateful_memes/cross_entropy: 3.5773, val/hateful_memes/accuracy: 0.5840, val/hateful_memes/binary_f1: 0.4317, val/hateful_memes/roc_auc: 0.6909, num_updates: 22001, epoch: 83, iterations: 22001, max_updates: 22000, val_time: 09s 123ms, best_update: 4000, best_iteration: 4000, best_val/hateful_memes/roc_auc: 0.699356
2020-06-05T06:43:36 INFO: Restoring checkpoint
2020-06-05T06:43:36 INFO: Loading checkpoint
Traceback (most recent call last):
  File "/opt/conda/bin/mmf_run", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/lib/python3.6/site-packages/mmf_cli/run.py", line 89, in run
    main(configuration, predict=predict)
  File "/opt/conda/lib/python3.6/site-packages/mmf_cli/run.py", line 40, in main
    trainer.train()
  File "/opt/conda/lib/python3.6/site-packages/mmf/trainers/base_trainer.py", line 265, in train
    self.finalize()
  File "/opt/conda/lib/python3.6/site-packages/mmf/trainers/base_trainer.py", line 311, in finalize
    self.checkpoint.restore()
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 412, in restore
    self._load(best_path, force=True)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 171, in _load
    ckpt, should_continue = self._load_from_zoo(file)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 314, in _load_from_zoo
    zoo_ckpt = load_pretrained_model(file)
  File "/opt/conda/lib/python3.6/site-packages/mmf/utils/checkpoint.py", line 56, in load_pretrained_model
    ), "None or multiple checkpoints files. MMF doesn't know what to do."
AssertionError: None or multiple checkpoints files. MMF doesn't know what to do.