facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

Still can't reproduce Image-Grid Results from Paper #776

Closed shivgodhia closed 3 years ago

shivgodhia commented 3 years ago

I previously had trouble reproducing the results using the pretrained models from the model zoo; that is now resolved. I have since moved on to training the model myself, and I am again unable to reproduce the paper's results with mmf.

Instructions To Reproduce the Issue:

  1. Train Image-grid

    mmf_run config=mmf/projects/hateful_memes/configs/unimodal/image.yaml model=unimodal_image dataset=hateful_memes
  2. Evaluate on validation set

    mmf_run config=mmf/projects/hateful_memes/configs/unimodal/image.yaml model=unimodal_image dataset=hateful_memes run_type=val checkpoint.resume_file=./save/unimodal_image_final.pth checkpoint.resume_pretrained=False
  3. Full logs observed:

When training, I get these warnings and a "targets not found" error:

2021-02-14T06:46:52 | mmf.utils.configuration: Overriding option config to mmf/projects/hateful_memes/configs/unimodal/image.yaml
2021-02-14T06:46:52 | mmf.utils.configuration: Overriding option model to unimodal_image
2021-02-14T06:46:52 | mmf.utils.configuration: Overriding option datasets to hateful_memes
2021-02-14T06:46:52 | mmf.utils.configuration: Overriding option run_type to val
2021-02-14T06:46:52 | mmf.utils.configuration: Overriding option checkpoint.resume_file to ./save/unimodel_image_final.pth
2021-02-14T06:46:52 | mmf.utils.configuration: Overriding option checkpoint.resume_pretrained to False
2021-02-14T06:46:52 | mmf: Logging to: ./save/train.log
2021-02-14T06:46:52 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=mmf/projects/hateful_memes/configs/unimodal/image.yaml', 'model=unimodal_image', 'dataset=hateful_memes', 'run_type=val', 'checkpoint.resume_file=./save/unimodel_image_final.pth', 'checkpoint.resume_pretrained=False'])
2021-02-14T06:46:52 | mmf_cli.run: Torch version: 1.6.0
2021-02-14T06:46:52 | mmf.utils.general: CUDA Device 0 is: Tesla P100-PCIE-16GB
2021-02-14T06:46:52 | mmf_cli.run: Using seed 52227513
2021-02-14T06:46:52 | mmf.trainers.mmf_trainer: Loading datasets
2021-02-14T06:46:53 | torchtext.vocab: Loading vectors from /home/sgg29/.cache/torch/mmf/glove.6B.300d.txt.pt
2021-02-14T06:46:54 | torchtext.vocab: Loading vectors from /home/sgg29/.cache/torch/mmf/glove.6B.300d.txt.pt
2021-02-14T06:46:58 | mmf.trainers.mmf_trainer: Loading model
2021-02-14T06:47:09 | mmf.trainers.mmf_trainer: Loading optimizer
2021-02-14T06:47:09 | mmf.trainers.mmf_trainer: Loading metrics
2021-02-14T06:46:19 | mmf.trainers.callbacks.logistics: progress: 6900/22000, train/hateful_memes/cross_entropy: 0.0091, train/hateful_memes/cross_entropy/avg: 0.2273, train/total_loss: 0.0091, train/total_loss/avg: 0.2273, max mem: 6424.0, experiment: run, epoch: 26, num_updates: 6900, iterations: 6900, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 236ms, time_since_start: 59m 22s 583ms, eta: 02h 06m 25s 700ms
2021-02-14T06:47:11 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T06:47:11 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T06:47:11 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T06:47:14 | mmf.trainers.callbacks.logistics: progress: 7000/22000, train/hateful_memes/cross_entropy: 0.0092, train/hateful_memes/cross_entropy/avg: 0.2242, train/total_loss: 0.0092, train/total_loss/avg: 0.2242, max mem: 6424.0, experiment: run, epoch: 27, num_updates: 7000, iterations: 7000, max_updates: 22000, lr: 0.00001, ups: 1.85, time: 54s 758ms, time_since_start: 01h 17s 341ms, eta: 02h 16m 53s 723ms
2021-02-14T06:47:14 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T06:47:19 | mmf.trainers.callbacks.logistics: progress: 7000/22000, val/hateful_memes/cross_entropy: 2.1549, val/total_loss: 2.1549, val/hateful_memes/accuracy: 0.4980, val/hateful_memes/binary_f1: 0.2849, val/hateful_memes/roc_auc: 0.4851, num_updates: 7000, epoch: 27, iterations: 7000, max_updates: 22000, val_time: 04s 599ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T06:48:09 | mmf.trainers.callbacks.logistics: progress: 7100/22000, train/hateful_memes/cross_entropy: 0.0091, train/hateful_memes/cross_entropy/avg: 0.2211, train/total_loss: 0.0091, train/total_loss/avg: 0.2211, max mem: 6424.0, experiment: run, epoch: 27, num_updates: 7100, iterations: 7100, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 532ms, time_since_start: 01h 01m 12s 474ms, eta: 02h 05m 29s 407ms
2021-02-14T06:49:01 | mmf.trainers.callbacks.logistics: progress: 7200/22000, train/hateful_memes/cross_entropy: 0.0092, train/hateful_memes/cross_entropy/avg: 0.2192, train/total_loss: 0.0092, train/total_loss/avg: 0.2192, max mem: 6424.0, experiment: run, epoch: 28, num_updates: 7200, iterations: 7200, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 425ms, time_since_start: 01h 02m 03s 900ms, eta: 02h 06m 51s 042ms
2021-02-14T06:49:51 | mmf.trainers.callbacks.logistics: progress: 7300/22000, train/hateful_memes/cross_entropy: 0.0091, train/hateful_memes/cross_entropy/avg: 0.2162, train/total_loss: 0.0091, train/total_loss/avg: 0.2162, max mem: 6424.0, experiment: run, epoch: 28, num_updates: 7300, iterations: 7300, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 220ms, time_since_start: 01h 02m 54s 121ms, eta: 02h 03m 02s 410ms
2021-02-14T06:50:41 | mmf.trainers.callbacks.logistics: progress: 7400/22000, train/hateful_memes/cross_entropy: 0.0072, train/hateful_memes/cross_entropy/avg: 0.2134, train/total_loss: 0.0072, train/total_loss/avg: 0.2134, max mem: 6424.0, experiment: run, epoch: 28, num_updates: 7400, iterations: 7400, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 228ms, time_since_start: 01h 03m 44s 349ms, eta: 02h 02m 13s 335ms
2021-02-14T06:51:32 | mmf.trainers.callbacks.logistics: progress: 7500/22000, train/hateful_memes/cross_entropy: 0.0072, train/hateful_memes/cross_entropy/avg: 0.2106, train/total_loss: 0.0072, train/total_loss/avg: 0.2106, max mem: 6424.0, experiment: run, epoch: 29, num_updates: 7500, iterations: 7500, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 964ms, time_since_start: 01h 04m 35s 314ms, eta: 02h 03m 09s 855ms
2021-02-14T06:52:23 | mmf.trainers.callbacks.logistics: progress: 7600/22000, train/hateful_memes/cross_entropy: 0.0091, train/hateful_memes/cross_entropy/avg: 0.2080, train/total_loss: 0.0091, train/total_loss/avg: 0.2080, max mem: 6424.0, experiment: run, epoch: 29, num_updates: 7600, iterations: 7600, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 481ms, time_since_start: 01h 05m 25s 795ms, eta: 02h 01m 09s 347ms
2021-02-14T06:53:13 | mmf.trainers.callbacks.logistics: progress: 7700/22000, train/hateful_memes/cross_entropy: 0.0092, train/hateful_memes/cross_entropy/avg: 0.2062, train/total_loss: 0.0092, train/total_loss/avg: 0.2062, max mem: 6424.0, experiment: run, epoch: 29, num_updates: 7700, iterations: 7700, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 344ms, time_since_start: 01h 06m 16s 140ms, eta: 01h 59m 59s 295ms
2021-02-14T06:54:04 | mmf.trainers.callbacks.logistics: progress: 7800/22000, train/hateful_memes/cross_entropy: 0.0092, train/hateful_memes/cross_entropy/avg: 0.2036, train/total_loss: 0.0092, train/total_loss/avg: 0.2036, max mem: 6424.0, experiment: run, epoch: 30, num_updates: 7800, iterations: 7800, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 300ms, time_since_start: 01h 07m 07s 440ms, eta: 02h 01m 24s 643ms
2021-02-14T06:54:55 | mmf.trainers.callbacks.logistics: progress: 7900/22000, train/hateful_memes/cross_entropy: 0.0061, train/hateful_memes/cross_entropy/avg: 0.2011, train/total_loss: 0.0061, train/total_loss/avg: 0.2011, max mem: 6424.0, experiment: run, epoch: 30, num_updates: 7900, iterations: 7900, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 193ms, time_since_start: 01h 07m 57s 634ms, eta: 01h 57m 57s 321ms
2021-02-14T06:55:46 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T06:55:46 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T06:55:46 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T06:55:49 | mmf.trainers.callbacks.logistics: progress: 8000/22000, train/hateful_memes/cross_entropy: 0.0061, train/hateful_memes/cross_entropy/avg: 0.1986, train/total_loss: 0.0061, train/total_loss/avg: 0.1986, max mem: 6424.0, experiment: run, epoch: 31, num_updates: 8000, iterations: 8000, max_updates: 22000, lr: 0.00001, ups: 1.85, time: 54s 643ms, time_since_start: 01h 08m 52s 277ms, eta: 02h 07m 30s 035ms
2021-02-14T06:55:49 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T06:55:54 | mmf.trainers.callbacks.logistics: progress: 8000/22000, val/hateful_memes/cross_entropy: 2.3320, val/total_loss: 2.3320, val/hateful_memes/accuracy: 0.5080, val/hateful_memes/binary_f1: 0.2807, val/hateful_memes/roc_auc: 0.4827, num_updates: 8000, epoch: 31, iterations: 8000, max_updates: 22000, val_time: 04s 801ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T06:56:44 | mmf.trainers.callbacks.logistics: progress: 8100/22000, train/hateful_memes/cross_entropy: 0.0061, train/hateful_memes/cross_entropy/avg: 0.1968, train/total_loss: 0.0061, train/total_loss/avg: 0.1968, max mem: 6424.0, experiment: run, epoch: 31, num_updates: 8100, iterations: 8100, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 532ms, time_since_start: 01h 09m 47s 612ms, eta: 01h 57m 04s 022ms
2021-02-14T06:57:35 | mmf.trainers.callbacks.logistics: progress: 8200/22000, train/hateful_memes/cross_entropy: 0.0092, train/hateful_memes/cross_entropy/avg: 0.1947, train/total_loss: 0.0092, train/total_loss/avg: 0.1947, max mem: 6424.0, experiment: run, epoch: 31, num_updates: 8200, iterations: 8200, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 236ms, time_since_start: 01h 10m 37s 849ms, eta: 01h 55m 32s 612ms
2021-02-14T06:58:26 | mmf.trainers.callbacks.logistics: progress: 8300/22000, train/hateful_memes/cross_entropy: 0.0061, train/hateful_memes/cross_entropy/avg: 0.1923, train/total_loss: 0.0061, train/total_loss/avg: 0.1923, max mem: 6424.0, experiment: run, epoch: 32, num_updates: 8300, iterations: 8300, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 056ms, time_since_start: 01h 11m 28s 905ms, eta: 01h 56m 34s 712ms
2021-02-14T06:59:16 | mmf.trainers.callbacks.logistics: progress: 8400/22000, train/hateful_memes/cross_entropy: 0.0054, train/hateful_memes/cross_entropy/avg: 0.1901, train/total_loss: 0.0054, train/total_loss/avg: 0.1901, max mem: 6424.0, experiment: run, epoch: 32, num_updates: 8400, iterations: 8400, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 194ms, time_since_start: 01h 12m 19s 099ms, eta: 01h 53m 46s 454ms
2021-02-14T07:00:06 | mmf.trainers.callbacks.logistics: progress: 8500/22000, train/hateful_memes/cross_entropy: 0.0045, train/hateful_memes/cross_entropy/avg: 0.1879, train/total_loss: 0.0045, train/total_loss/avg: 0.1879, max mem: 6424.0, experiment: run, epoch: 32, num_updates: 8500, iterations: 8500, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 180ms, time_since_start: 01h 13m 09s 280ms, eta: 01h 52m 54s 364ms
2021-02-14T07:00:57 | mmf.trainers.callbacks.logistics: progress: 8600/22000, train/hateful_memes/cross_entropy: 0.0041, train/hateful_memes/cross_entropy/avg: 0.1857, train/total_loss: 0.0041, train/total_loss/avg: 0.1857, max mem: 6424.0, experiment: run, epoch: 33, num_updates: 8600, iterations: 8600, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 276ms, time_since_start: 01h 14m 556ms, eta: 01h 54m 31s 055ms
2021-02-14T07:01:48 | mmf.trainers.callbacks.logistics: progress: 8700/22000, train/hateful_memes/cross_entropy: 0.0045, train/hateful_memes/cross_entropy/avg: 0.1837, train/total_loss: 0.0045, train/total_loss/avg: 0.1837, max mem: 6424.0, experiment: run, epoch: 33, num_updates: 8700, iterations: 8700, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 180ms, time_since_start: 01h 14m 50s 737ms, eta: 01h 51m 14s 007ms
2021-02-14T07:02:39 | mmf.trainers.callbacks.logistics: progress: 8800/22000, train/hateful_memes/cross_entropy: 0.0041, train/hateful_memes/cross_entropy/avg: 0.1816, train/total_loss: 0.0041, train/total_loss/avg: 0.1816, max mem: 6424.0, experiment: run, epoch: 34, num_updates: 8800, iterations: 8800, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 230ms, time_since_start: 01h 15m 41s 968ms, eta: 01h 52m 42s 455ms
2021-02-14T07:03:29 | mmf.trainers.callbacks.logistics: progress: 8900/22000, train/hateful_memes/cross_entropy: 0.0041, train/hateful_memes/cross_entropy/avg: 0.1796, train/total_loss: 0.0041, train/total_loss/avg: 0.1796, max mem: 6424.0, experiment: run, epoch: 34, num_updates: 8900, iterations: 8900, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 159ms, time_since_start: 01h 16m 32s 127ms, eta: 01h 49m 30s 860ms
2021-02-14T07:04:19 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T07:04:19 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T07:04:19 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T07:04:22 | mmf.trainers.callbacks.logistics: progress: 9000/22000, train/hateful_memes/cross_entropy: 0.0041, train/hateful_memes/cross_entropy/avg: 0.1778, train/total_loss: 0.0041, train/total_loss/avg: 0.1778, max mem: 6424.0, experiment: run, epoch: 34, num_updates: 9000, iterations: 9000, max_updates: 22000, lr: 0.00001, ups: 1.89, time: 53s 368ms, time_since_start: 01h 17m 25s 496ms, eta: 01h 55m 37s 966ms
2021-02-14T07:04:22 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T07:04:27 | mmf.trainers.callbacks.logistics: progress: 9000/22000, val/hateful_memes/cross_entropy: 2.3766, val/total_loss: 2.3766, val/hateful_memes/accuracy: 0.4940, val/hateful_memes/binary_f1: 0.3181, val/hateful_memes/roc_auc: 0.4896, num_updates: 9000, epoch: 34, iterations: 9000, max_updates: 22000, val_time: 04s 726ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T07:05:18 | mmf.trainers.callbacks.logistics: progress: 9100/22000, train/hateful_memes/cross_entropy: 0.0041, train/hateful_memes/cross_entropy/avg: 0.1760, train/total_loss: 0.0041, train/total_loss/avg: 0.1760, max mem: 6424.0, experiment: run, epoch: 35, num_updates: 9100, iterations: 9100, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 339ms, time_since_start: 01h 18m 21s 563ms, eta: 01h 50m 22s 816ms
2021-02-14T07:06:09 | mmf.trainers.callbacks.logistics: progress: 9200/22000, train/hateful_memes/cross_entropy: 0.0041, train/hateful_memes/cross_entropy/avg: 0.1741, train/total_loss: 0.0041, train/total_loss/avg: 0.1741, max mem: 6424.0, experiment: run, epoch: 35, num_updates: 9200, iterations: 9200, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 176ms, time_since_start: 01h 19m 11s 740ms, eta: 01h 47m 02s 570ms
2021-02-14T07:06:59 | mmf.trainers.callbacks.logistics: progress: 9300/22000, train/hateful_memes/cross_entropy: 0.0026, train/hateful_memes/cross_entropy/avg: 0.1723, train/total_loss: 0.0026, train/total_loss/avg: 0.1723, max mem: 6424.0, experiment: run, epoch: 35, num_updates: 9300, iterations: 9300, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 168ms, time_since_start: 01h 20m 01s 908ms, eta: 01h 46m 11s 423ms
2021-02-14T07:07:50 | mmf.trainers.callbacks.logistics: progress: 9400/22000, train/hateful_memes/cross_entropy: 0.0026, train/hateful_memes/cross_entropy/avg: 0.1705, train/total_loss: 0.0026, train/total_loss/avg: 0.1705, max mem: 6424.0, experiment: run, epoch: 36, num_updates: 9400, iterations: 9400, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 184ms, time_since_start: 01h 20m 53s 093ms, eta: 01h 47m 29s 296ms
2021-02-14T07:08:40 | mmf.trainers.callbacks.logistics: progress: 9500/22000, train/hateful_memes/cross_entropy: 0.0025, train/hateful_memes/cross_entropy/avg: 0.1687, train/total_loss: 0.0025, train/total_loss/avg: 0.1687, max mem: 6424.0, experiment: run, epoch: 36, num_updates: 9500, iterations: 9500, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 194ms, time_since_start: 01h 21m 43s 287ms, eta: 01h 44m 34s 265ms
2021-02-14T07:09:31 | mmf.trainers.callbacks.logistics: progress: 9600/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1669, train/total_loss: 0.0024, train/total_loss/avg: 0.1669, max mem: 6424.0, experiment: run, epoch: 37, num_updates: 9600, iterations: 9600, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 223ms, time_since_start: 01h 22m 34s 511ms, eta: 01h 45m 51s 724ms
2021-02-14T07:10:22 | mmf.trainers.callbacks.logistics: progress: 9700/22000, train/hateful_memes/cross_entropy: 0.0020, train/hateful_memes/cross_entropy/avg: 0.1652, train/total_loss: 0.0020, train/total_loss/avg: 0.1652, max mem: 6424.0, experiment: run, epoch: 37, num_updates: 9700, iterations: 9700, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 138ms, time_since_start: 01h 23m 24s 649ms, eta: 01h 42m 47s 023ms
2021-02-14T07:11:12 | mmf.trainers.callbacks.logistics: progress: 9800/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1637, train/total_loss: 0.0024, train/total_loss/avg: 0.1637, max mem: 6424.0, experiment: run, epoch: 37, num_updates: 9800, iterations: 9800, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 138ms, time_since_start: 01h 24m 14s 788ms, eta: 01h 41m 56s 860ms
2021-02-14T07:12:03 | mmf.trainers.callbacks.logistics: progress: 9900/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1623, train/total_loss: 0.0024, train/total_loss/avg: 0.1623, max mem: 6424.0, experiment: run, epoch: 38, num_updates: 9900, iterations: 9900, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 136ms, time_since_start: 01h 25m 05s 924ms, eta: 01h 43m 07s 558ms
2021-02-14T07:12:53 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T07:12:53 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T07:12:53 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T07:12:56 | mmf.trainers.callbacks.logistics: progress: 10000/22000, train/hateful_memes/cross_entropy: 0.0025, train/hateful_memes/cross_entropy/avg: 0.1607, train/total_loss: 0.0025, train/total_loss/avg: 0.1607, max mem: 6424.0, experiment: run, epoch: 38, num_updates: 10000, iterations: 10000, max_updates: 22000, lr: 0.00001, ups: 1.89, time: 53s 551ms, time_since_start: 01h 25m 59s 476ms, eta: 01h 47m 06s 188ms
2021-02-14T07:12:56 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T07:13:01 | mmf.trainers.callbacks.logistics: progress: 10000/22000, val/hateful_memes/cross_entropy: 2.6246, val/total_loss: 2.6246, val/hateful_memes/accuracy: 0.5000, val/hateful_memes/binary_f1: 0.2284, val/hateful_memes/roc_auc: 0.4990, num_updates: 10000, epoch: 38, iterations: 10000, max_updates: 22000, val_time: 04s 763ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T07:13:52 | mmf.trainers.callbacks.logistics: progress: 10100/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1591, train/total_loss: 0.0024, train/total_loss/avg: 0.1591, max mem: 6424.0, experiment: run, epoch: 38, num_updates: 10100, iterations: 10100, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 480ms, time_since_start: 01h 26m 54s 720ms, eta: 01h 40m 07s 139ms
2021-02-14T07:14:43 | mmf.trainers.callbacks.logistics: progress: 10200/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1576, train/total_loss: 0.0024, train/total_loss/avg: 0.1576, max mem: 6424.0, experiment: run, epoch: 39, num_updates: 10200, iterations: 10200, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 202ms, time_since_start: 01h 27m 45s 923ms, eta: 01h 40m 41s 909ms
2021-02-14T07:15:33 | mmf.trainers.callbacks.logistics: progress: 10300/22000, train/hateful_memes/cross_entropy: 0.0025, train/hateful_memes/cross_entropy/avg: 0.1562, train/total_loss: 0.0025, train/total_loss/avg: 0.1562, max mem: 6424.0, experiment: run, epoch: 39, num_updates: 10300, iterations: 10300, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 140ms, time_since_start: 01h 28m 36s 064ms, eta: 01h 37m 46s 476ms
2021-02-14T07:16:24 | mmf.trainers.callbacks.logistics: progress: 10400/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1547, train/total_loss: 0.0024, train/total_loss/avg: 0.1547, max mem: 6424.0, experiment: run, epoch: 40, num_updates: 10400, iterations: 10400, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 082ms, time_since_start: 01h 29m 27s 146ms, eta: 01h 38m 45s 559ms
2021-02-14T07:17:14 | mmf.trainers.callbacks.logistics: progress: 10500/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1532, train/total_loss: 0.0024, train/total_loss/avg: 0.1532, max mem: 6424.0, experiment: run, epoch: 40, num_updates: 10500, iterations: 10500, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 173ms, time_since_start: 01h 30m 17s 319ms, eta: 01h 36m 09s 914ms
2021-02-14T07:18:04 | mmf.trainers.callbacks.logistics: progress: 10600/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1518, train/total_loss: 0.0024, train/total_loss/avg: 0.1518, max mem: 6424.0, experiment: run, epoch: 40, num_updates: 10600, iterations: 10600, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 169ms, time_since_start: 01h 31m 07s 489ms, eta: 01h 35m 19s 351ms
2021-02-14T07:18:55 | mmf.trainers.callbacks.logistics: progress: 10700/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1504, train/total_loss: 0.0024, train/total_loss/avg: 0.1504, max mem: 6424.0, experiment: run, epoch: 41, num_updates: 10700, iterations: 10700, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 036ms, time_since_start: 01h 31m 58s 526ms, eta: 01h 36m 07s 155ms
2021-02-14T07:19:46 | mmf.trainers.callbacks.logistics: progress: 10800/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1490, train/total_loss: 0.0024, train/total_loss/avg: 0.1490, max mem: 6424.0, experiment: run, epoch: 41, num_updates: 10800, iterations: 10800, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 153ms, time_since_start: 01h 32m 48s 680ms, eta: 01h 33m 37s 200ms
2021-02-14T07:20:36 | mmf.trainers.callbacks.logistics: progress: 10900/22000, train/hateful_memes/cross_entropy: 0.0024, train/hateful_memes/cross_entropy/avg: 0.1477, train/total_loss: 0.0024, train/total_loss/avg: 0.1477, max mem: 6424.0, experiment: run, epoch: 41, num_updates: 10900, iterations: 10900, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 158ms, time_since_start: 01h 33m 38s 838ms, eta: 01h 32m 47s 589ms
2021-02-14T07:21:27 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T07:21:27 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T07:21:27 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T07:21:30 | mmf.trainers.callbacks.logistics: progress: 11000/22000, train/hateful_memes/cross_entropy: 0.0021, train/hateful_memes/cross_entropy/avg: 0.1463, train/total_loss: 0.0021, train/total_loss/avg: 0.1463, max mem: 6424.0, experiment: run, epoch: 42, num_updates: 11000, iterations: 11000, max_updates: 22000, lr: 0.00001, ups: 1.85, time: 54s 243ms, time_since_start: 01h 34m 33s 081ms, eta: 01h 39m 26s 770ms
2021-02-14T07:21:30 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T07:21:35 | mmf.trainers.callbacks.logistics: progress: 11000/22000, val/hateful_memes/cross_entropy: 2.3961, val/total_loss: 2.3961, val/hateful_memes/accuracy: 0.5100, val/hateful_memes/binary_f1: 0.3432, val/hateful_memes/roc_auc: 0.4832, num_updates: 11000, epoch: 42, iterations: 11000, max_updates: 22000, val_time: 04s 993ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T07:22:25 | mmf.trainers.callbacks.logistics: progress: 11100/22000, train/hateful_memes/cross_entropy: 0.0021, train/hateful_memes/cross_entropy/avg: 0.1451, train/total_loss: 0.0021, train/total_loss/avg: 0.1451, max mem: 6424.0, experiment: run, epoch: 42, num_updates: 11100, iterations: 11100, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 426ms, time_since_start: 01h 35m 28s 503ms, eta: 01h 31m 36s 536ms
2021-02-14T07:23:16 | mmf.trainers.callbacks.logistics: progress: 11200/22000, train/hateful_memes/cross_entropy: 0.0020, train/hateful_memes/cross_entropy/avg: 0.1438, train/total_loss: 0.0020, train/total_loss/avg: 0.1438, max mem: 6424.0, experiment: run, epoch: 43, num_updates: 11200, iterations: 11200, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 102ms, time_since_start: 01h 36m 19s 605ms, eta: 01h 31m 59s 064ms
2021-02-14T07:24:07 | mmf.trainers.callbacks.logistics: progress: 11300/22000, train/hateful_memes/cross_entropy: 0.0019, train/hateful_memes/cross_entropy/avg: 0.1425, train/total_loss: 0.0019, train/total_loss/avg: 0.1425, max mem: 6424.0, experiment: run, epoch: 43, num_updates: 11300, iterations: 11300, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 152ms, time_since_start: 01h 37m 09s 758ms, eta: 01h 29m 26s 289ms
2021-02-14T07:24:57 | mmf.trainers.callbacks.logistics: progress: 11400/22000, train/hateful_memes/cross_entropy: 0.0019, train/hateful_memes/cross_entropy/avg: 0.1413, train/total_loss: 0.0019, train/total_loss/avg: 0.1413, max mem: 6424.0, experiment: run, epoch: 43, num_updates: 11400, iterations: 11400, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 194ms, time_since_start: 01h 37m 59s 952ms, eta: 01h 28m 40s 638ms
2021-02-14T07:25:48 | mmf.trainers.callbacks.logistics: progress: 11500/22000, train/hateful_memes/cross_entropy: 0.0019, train/hateful_memes/cross_entropy/avg: 0.1401, train/total_loss: 0.0019, train/total_loss/avg: 0.1401, max mem: 6424.0, experiment: run, epoch: 44, num_updates: 11500, iterations: 11500, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 994ms, time_since_start: 01h 38m 50s 947ms, eta: 01h 29m 14s 384ms
2021-02-14T07:26:38 | mmf.trainers.callbacks.logistics: progress: 11600/22000, train/hateful_memes/cross_entropy: 0.0017, train/hateful_memes/cross_entropy/avg: 0.1389, train/total_loss: 0.0017, train/total_loss/avg: 0.1389, max mem: 6424.0, experiment: run, epoch: 44, num_updates: 11600, iterations: 11600, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 163ms, time_since_start: 01h 39m 41s 111ms, eta: 01h 26m 57s 052ms
2021-02-14T07:27:28 | mmf.trainers.callbacks.logistics: progress: 11700/22000, train/hateful_memes/cross_entropy: 0.0017, train/hateful_memes/cross_entropy/avg: 0.1377, train/total_loss: 0.0017, train/total_loss/avg: 0.1377, max mem: 6424.0, experiment: run, epoch: 44, num_updates: 11700, iterations: 11700, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 147ms, time_since_start: 01h 40m 31s 258ms, eta: 01h 26m 05s 211ms
2021-02-14T07:28:19 | mmf.trainers.callbacks.logistics: progress: 11800/22000, train/hateful_memes/cross_entropy: 0.0017, train/hateful_memes/cross_entropy/avg: 0.1367, train/total_loss: 0.0017, train/total_loss/avg: 0.1367, max mem: 6424.0, experiment: run, epoch: 45, num_updates: 11800, iterations: 11800, max_updates: 22000, lr: 0.00001, ups: 1.96, time: 51s 037ms, time_since_start: 01h 41m 22s 296ms, eta: 01h 26m 45s 843ms
2021-02-14T07:29:09 | mmf.trainers.callbacks.logistics: progress: 11900/22000, train/hateful_memes/cross_entropy: 0.0016, train/hateful_memes/cross_entropy/avg: 0.1355, train/total_loss: 0.0016, train/total_loss/avg: 0.1355, max mem: 6424.0, experiment: run, epoch: 45, num_updates: 11900, iterations: 11900, max_updates: 22000, lr: 0.00001, ups: 2.00, time: 50s 172ms, time_since_start: 01h 42m 12s 468ms, eta: 01h 24m 27s 380ms
2021-02-14T07:30:00 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T07:30:00 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T07:30:00 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T07:30:04 | mmf.trainers.callbacks.logistics: progress: 12000/22000, train/hateful_memes/cross_entropy: 0.0016, train/hateful_memes/cross_entropy/avg: 0.1345, train/total_loss: 0.0016, train/total_loss/avg: 0.1345, max mem: 6424.0, experiment: run, epoch: 46, num_updates: 12000, iterations: 12000, max_updates: 22000, lr: 0.00001, ups: 1.85, time: 54s 513ms, time_since_start: 01h 43m 06s 981ms, eta: 01h 30m 51s 322ms
2021-02-14T07:30:04 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T07:30:09 | mmf.trainers.callbacks.logistics: progress: 12000/22000, val/hateful_memes/cross_entropy: 2.6006, val/total_loss: 2.6006, val/hateful_memes/accuracy: 0.5040, val/hateful_memes/binary_f1: 0.2663, val/hateful_memes/roc_auc: 0.4922, num_updates: 12000, epoch: 46, iterations: 12000, max_updates: 22000, val_time: 04s 653ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T07:30:59 | mmf.trainers.callbacks.logistics: progress: 12100/22000, train/hateful_memes/cross_entropy: 0.0013, train/hateful_memes/cross_entropy/avg: 0.1334, train/total_loss: 0.0013, train/total_loss/avg: 0.1334, max mem: 6424.0, experiment: run, epoch: 46, num_updates: 12100, iterations: 12100, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 482ms, time_since_start: 01h 44m 02s 118ms, eta: 01h 23m 17s 749ms
2021-02-14T07:31:49 | mmf.trainers.callbacks.logistics: progress: 12200/22000, train/hateful_memes/cross_entropy: 0.0013, train/hateful_memes/cross_entropy/avg: 0.1323, train/total_loss: 0.0013, train/total_loss/avg: 0.1323, max mem: 6424.0, experiment: run, epoch: 46, num_updates: 12200, iterations: 12200, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 179ms, time_since_start: 01h 44m 52s 298ms, eta: 01h 21m 57s 616ms
2021-02-14T07:32:40 | mmf.trainers.callbacks.logistics: progress: 12300/22000, train/hateful_memes/cross_entropy: 0.0012, train/hateful_memes/cross_entropy/avg: 0.1312, train/total_loss: 0.0012, train/total_loss/avg: 0.1312, max mem: 6424.0, experiment: run, epoch: 47, num_updates: 12300, iterations: 12300, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 181ms, time_since_start: 01h 45m 43s 480ms, eta: 01h 22m 44s 651ms
2021-02-14T07:33:31 | mmf.trainers.callbacks.logistics: progress: 12400/22000, train/hateful_memes/cross_entropy: 0.0011, train/hateful_memes/cross_entropy/avg: 0.1301, train/total_loss: 0.0011, train/total_loss/avg: 0.1301, max mem: 6424.0, experiment: run, epoch: 47, num_updates: 12400, iterations: 12400, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 172ms, time_since_start: 01h 46m 33s 652ms, eta: 01h 20m 16s 516ms
2021-02-14T07:34:21 | mmf.trainers.callbacks.logistics: progress: 12500/22000, train/hateful_memes/cross_entropy: 0.0011, train/hateful_memes/cross_entropy/avg: 0.1291, train/total_loss: 0.0011, train/total_loss/avg: 0.1291, max mem: 6424.0, experiment: run, epoch: 47, num_updates: 12500, iterations: 12500, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 173ms, time_since_start: 01h 47m 23s 825ms, eta: 01h 19m 26s 458ms
2021-02-14T07:35:12 | mmf.trainers.callbacks.logistics: progress: 12600/22000, train/hateful_memes/cross_entropy: 0.0009, train/hateful_memes/cross_entropy/avg: 0.1281, train/total_loss: 0.0009, train/total_loss/avg: 0.1281, max mem: 6424.0, experiment: run, epoch: 48, num_updates: 12600, iterations: 12600, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 069ms, time_since_start: 01h 48m 14s 894ms, eta: 01h 20m 501ms
2021-02-14T07:36:02 | mmf.trainers.callbacks.logistics: progress: 12700/22000, train/hateful_memes/cross_entropy: 0.0009, train/hateful_memes/cross_entropy/avg: 0.1271, train/total_loss: 0.0009, train/total_loss/avg: 0.1271, max mem: 6424.0, experiment: run, epoch: 48, num_updates: 12700, iterations: 12700, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 185ms, time_since_start: 01h 49m 05s 080ms, eta: 01h 17m 47s 234ms
2021-02-14T07:36:53 | mmf.trainers.callbacks.logistics: progress: 12800/22000, train/hateful_memes/cross_entropy: 0.0009, train/hateful_memes/cross_entropy/avg: 0.1261, train/total_loss: 0.0009, train/total_loss/avg: 0.1261, max mem: 6424.0, experiment: run, epoch: 49, num_updates: 12800, iterations: 12800, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 165ms, time_since_start: 01h 49m 56s 245ms, eta: 01h 18m 27s 185ms
2021-02-14T07:37:43 | mmf.trainers.callbacks.logistics: progress: 12900/22000, train/hateful_memes/cross_entropy: 0.0009, train/hateful_memes/cross_entropy/avg: 0.1252, train/total_loss: 0.0009, train/total_loss/avg: 0.1252, max mem: 6424.0, experiment: run, epoch: 49, num_updates: 12900, iterations: 12900, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 177ms, time_since_start: 01h 50m 46s 422ms, eta: 01h 16m 06s 163ms
2021-02-14T07:38:33 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T07:38:33 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T07:38:37 | mmf.trainers.callbacks.logistics: progress: 13000/22000, train/hateful_memes/cross_entropy: 0.0011, train/hateful_memes/cross_entropy/avg: 0.1244, train/total_loss: 0.0011, train/total_loss/avg: 0.1244, max mem: 6424.0, experiment: run, epoch: 49, num_updates: 13000, iterations: 13000, max_updates: 22000, lr: 0., ups: 1.89, time: 53s 656ms, time_since_start: 01h 51m 40s 079ms, eta: 01h 20m 29s 112ms
2021-02-14T07:38:37 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T07:38:42 | mmf.trainers.callbacks.logistics: progress: 13000/22000, val/hateful_memes/cross_entropy: 2.4842, val/total_loss: 2.4842, val/hateful_memes/accuracy: 0.5200, val/hateful_memes/binary_f1: 0.3651, val/hateful_memes/roc_auc: 0.4948, num_updates: 13000, epoch: 49, iterations: 13000, max_updates: 22000, val_time: 04s 701ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T07:39:33 | mmf.trainers.callbacks.logistics: progress: 13100/22000, train/hateful_memes/cross_entropy: 0.0011, train/hateful_memes/cross_entropy/avg: 0.1235, train/total_loss: 0.0011, train/total_loss/avg: 0.1235, max mem: 6424.0, experiment: run, epoch: 50, num_updates: 13100, iterations: 13100, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 462ms, time_since_start: 01h 52m 36s 244ms, eta: 01h 16m 20s 178ms
2021-02-14T07:40:23 | mmf.trainers.callbacks.logistics: progress: 13200/22000, train/hateful_memes/cross_entropy: 0.0009, train/hateful_memes/cross_entropy/avg: 0.1226, train/total_loss: 0.0009, train/total_loss/avg: 0.1226, max mem: 6424.0, experiment: run, epoch: 50, num_updates: 13200, iterations: 13200, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 149ms, time_since_start: 01h 53m 26s 394ms, eta: 01h 13m 33s 157ms
2021-02-14T07:41:13 | mmf.trainers.callbacks.logistics: progress: 13300/22000, train/hateful_memes/cross_entropy: 0.0013, train/hateful_memes/cross_entropy/avg: 0.1221, train/total_loss: 0.0013, train/total_loss/avg: 0.1221, max mem: 6424.0, experiment: run, epoch: 50, num_updates: 13300, iterations: 13300, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 060ms, time_since_start: 01h 54m 16s 454ms, eta: 01h 12m 35s 226ms
2021-02-14T07:42:05 | mmf.trainers.callbacks.logistics: progress: 13400/22000, train/hateful_memes/cross_entropy: 0.0013, train/hateful_memes/cross_entropy/avg: 0.1212, train/total_loss: 0.0013, train/total_loss/avg: 0.1212, max mem: 6424.0, experiment: run, epoch: 51, num_updates: 13400, iterations: 13400, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 209ms, time_since_start: 01h 55m 07s 664ms, eta: 01h 13m 24s 057ms
2021-02-14T07:42:55 | mmf.trainers.callbacks.logistics: progress: 13500/22000, train/hateful_memes/cross_entropy: 0.0013, train/hateful_memes/cross_entropy/avg: 0.1203, train/total_loss: 0.0013, train/total_loss/avg: 0.1203, max mem: 6424.0, experiment: run, epoch: 51, num_updates: 13500, iterations: 13500, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 167ms, time_since_start: 01h 55m 57s 831ms, eta: 01h 11m 04s 254ms
2021-02-14T07:43:46 | mmf.trainers.callbacks.logistics: progress: 13600/22000, train/hateful_memes/cross_entropy: 0.0012, train/hateful_memes/cross_entropy/avg: 0.1194, train/total_loss: 0.0012, train/total_loss/avg: 0.1194, max mem: 6424.0, experiment: run, epoch: 52, num_updates: 13600, iterations: 13600, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 129ms, time_since_start: 01h 56m 48s 961ms, eta: 01h 11m 34s 907ms
2021-02-14T07:44:36 | mmf.trainers.callbacks.logistics: progress: 13700/22000, train/hateful_memes/cross_entropy: 0.0012, train/hateful_memes/cross_entropy/avg: 0.1185, train/total_loss: 0.0012, train/total_loss/avg: 0.1185, max mem: 6424.0, experiment: run, epoch: 52, num_updates: 13700, iterations: 13700, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 191ms, time_since_start: 01h 57m 39s 152ms, eta: 01h 09m 25s 875ms
2021-02-14T07:45:26 | mmf.trainers.callbacks.logistics: progress: 13800/22000, train/hateful_memes/cross_entropy: 0.0007, train/hateful_memes/cross_entropy/avg: 0.1177, train/total_loss: 0.0007, train/total_loss/avg: 0.1177, max mem: 6424.0, experiment: run, epoch: 52, num_updates: 13800, iterations: 13800, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 184ms, time_since_start: 01h 58m 29s 337ms, eta: 01h 08m 35s 121ms
2021-02-14T07:46:17 | mmf.trainers.callbacks.logistics: progress: 13900/22000, train/hateful_memes/cross_entropy: 0.0007, train/hateful_memes/cross_entropy/avg: 0.1168, train/total_loss: 0.0007, train/total_loss/avg: 0.1168, max mem: 6424.0, experiment: run, epoch: 53, num_updates: 13900, iterations: 13900, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 122ms, time_since_start: 01h 59m 20s 459ms, eta: 01h 09m 928ms
2021-02-14T07:47:07 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T07:47:07 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T07:47:11 | mmf.trainers.callbacks.logistics: progress: 14000/22000, train/hateful_memes/cross_entropy: 0.0007, train/hateful_memes/cross_entropy/avg: 0.1161, train/total_loss: 0.0007, train/total_loss/avg: 0.1161, max mem: 6424.0, experiment: run, epoch: 53, num_updates: 14000, iterations: 14000, max_updates: 22000, lr: 0., ups: 1.89, time: 53s 432ms, time_since_start: 02h 13s 892ms, eta: 01h 11m 14s 640ms
2021-02-14T07:47:11 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T07:47:16 | mmf.trainers.callbacks.logistics: progress: 14000/22000, val/hateful_memes/cross_entropy: 2.6351, val/total_loss: 2.6351, val/hateful_memes/accuracy: 0.5160, val/hateful_memes/binary_f1: 0.3240, val/hateful_memes/roc_auc: 0.4956, num_updates: 14000, epoch: 53, iterations: 14000, max_updates: 22000, val_time: 04s 822ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T07:48:07 | mmf.trainers.callbacks.logistics: progress: 14100/22000, train/hateful_memes/cross_entropy: 0.0012, train/hateful_memes/cross_entropy/avg: 0.1154, train/total_loss: 0.0012, train/total_loss/avg: 0.1154, max mem: 6424.0, experiment: run, epoch: 54, num_updates: 14100, iterations: 14100, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 365ms, time_since_start: 02h 01m 10s 081ms, eta: 01h 07m 37s 892ms
2021-02-14T07:48:57 | mmf.trainers.callbacks.logistics: progress: 14200/22000, train/hateful_memes/cross_entropy: 0.0012, train/hateful_memes/cross_entropy/avg: 0.1146, train/total_loss: 0.0012, train/total_loss/avg: 0.1146, max mem: 6424.0, experiment: run, epoch: 54, num_updates: 14200, iterations: 14200, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 178ms, time_since_start: 02h 02m 260ms, eta: 01h 05m 13s 944ms
2021-02-14T07:49:47 | mmf.trainers.callbacks.logistics: progress: 14300/22000, train/hateful_memes/cross_entropy: 0.0013, train/hateful_memes/cross_entropy/avg: 0.1138, train/total_loss: 0.0013, train/total_loss/avg: 0.1138, max mem: 6424.0, experiment: run, epoch: 54, num_updates: 14300, iterations: 14300, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 175ms, time_since_start: 02h 02m 50s 435ms, eta: 01h 04m 23s 483ms
2021-02-14T07:50:39 | mmf.trainers.callbacks.logistics: progress: 14400/22000, train/hateful_memes/cross_entropy: 0.0017, train/hateful_memes/cross_entropy/avg: 0.1131, train/total_loss: 0.0017, train/total_loss/avg: 0.1131, max mem: 6424.0, experiment: run, epoch: 55, num_updates: 14400, iterations: 14400, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 441ms, time_since_start: 02h 03m 41s 877ms, eta: 01h 05m 09s 532ms
2021-02-14T07:51:29 | mmf.trainers.callbacks.logistics: progress: 14500/22000, train/hateful_memes/cross_entropy: 0.0017, train/hateful_memes/cross_entropy/avg: 0.1123, train/total_loss: 0.0017, train/total_loss/avg: 0.1123, max mem: 6424.0, experiment: run, epoch: 55, num_updates: 14500, iterations: 14500, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 184ms, time_since_start: 02h 04m 32s 061ms, eta: 01h 02m 43s 825ms
2021-02-14T07:52:19 | mmf.trainers.callbacks.logistics: progress: 14600/22000, train/hateful_memes/cross_entropy: 0.0017, train/hateful_memes/cross_entropy/avg: 0.1115, train/total_loss: 0.0017, train/total_loss/avg: 0.1115, max mem: 6424.0, experiment: run, epoch: 55, num_updates: 14600, iterations: 14600, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 175ms, time_since_start: 02h 05m 22s 236ms, eta: 01h 01m 52s 984ms
2021-02-14T07:53:11 | mmf.trainers.callbacks.logistics: progress: 14700/22000, train/hateful_memes/cross_entropy: 0.0013, train/hateful_memes/cross_entropy/avg: 0.1108, train/total_loss: 0.0013, train/total_loss/avg: 0.1108, max mem: 6424.0, experiment: run, epoch: 56, num_updates: 14700, iterations: 14700, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 476ms, time_since_start: 02h 06m 13s 713ms, eta: 01h 02m 37s 808ms
2021-02-14T07:54:01 | mmf.trainers.callbacks.logistics: progress: 14800/22000, train/hateful_memes/cross_entropy: 0.0013, train/hateful_memes/cross_entropy/avg: 0.1100, train/total_loss: 0.0013, train/total_loss/avg: 0.1100, max mem: 6424.0, experiment: run, epoch: 56, num_updates: 14800, iterations: 14800, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 170ms, time_since_start: 02h 07m 03s 884ms, eta: 01h 12s 270ms
2021-02-14T07:54:52 | mmf.trainers.callbacks.logistics: progress: 14900/22000, train/hateful_memes/cross_entropy: 0.0012, train/hateful_memes/cross_entropy/avg: 0.1093, train/total_loss: 0.0012, train/total_loss/avg: 0.1093, max mem: 6424.0, experiment: run, epoch: 57, num_updates: 14900, iterations: 14900, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 930ms, time_since_start: 02h 07m 54s 814ms, eta: 01h 16s 055ms
2021-02-14T07:55:42 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T07:55:42 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T07:55:45 | mmf.trainers.callbacks.logistics: progress: 15000/22000, train/hateful_memes/cross_entropy: 0.0010, train/hateful_memes/cross_entropy/avg: 0.1086, train/total_loss: 0.0010, train/total_loss/avg: 0.1086, max mem: 6424.0, experiment: run, epoch: 57, num_updates: 15000, iterations: 15000, max_updates: 22000, lr: 0., ups: 1.89, time: 53s 500ms, time_since_start: 02h 08m 48s 315ms, eta: 01h 02m 25s 026ms
2021-02-14T07:55:45 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T07:55:50 | mmf.trainers.callbacks.logistics: progress: 15000/22000, val/hateful_memes/cross_entropy: 2.6957, val/total_loss: 2.6957, val/hateful_memes/accuracy: 0.5100, val/hateful_memes/binary_f1: 0.3020, val/hateful_memes/roc_auc: 0.4942, num_updates: 15000, epoch: 57, iterations: 15000, max_updates: 22000, val_time: 04s 763ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T07:56:41 | mmf.trainers.callbacks.logistics: progress: 15100/22000, train/hateful_memes/cross_entropy: 0.0010, train/hateful_memes/cross_entropy/avg: 0.1079, train/total_loss: 0.0010, train/total_loss/avg: 0.1079, max mem: 6424.0, experiment: run, epoch: 57, num_updates: 15100, iterations: 15100, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 552ms, time_since_start: 02h 09m 43s 632ms, eta: 58m 08s 136ms
2021-02-14T07:57:32 | mmf.trainers.callbacks.logistics: progress: 15200/22000, train/hateful_memes/cross_entropy: 0.0010, train/hateful_memes/cross_entropy/avg: 0.1072, train/total_loss: 0.0010, train/total_loss/avg: 0.1072, max mem: 6424.0, experiment: run, epoch: 58, num_updates: 15200, iterations: 15200, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 112ms, time_since_start: 02h 10m 34s 744ms, eta: 57m 55s 674ms
2021-02-14T07:58:22 | mmf.trainers.callbacks.logistics: progress: 15300/22000, train/hateful_memes/cross_entropy: 0.0008, train/hateful_memes/cross_entropy/avg: 0.1064, train/total_loss: 0.0008, train/total_loss/avg: 0.1064, max mem: 6424.0, experiment: run, epoch: 58, num_updates: 15300, iterations: 15300, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 160ms, time_since_start: 02h 11m 24s 905ms, eta: 56m 745ms
2021-02-14T07:59:12 | mmf.trainers.callbacks.logistics: progress: 15400/22000, train/hateful_memes/cross_entropy: 0.0008, train/hateful_memes/cross_entropy/avg: 0.1058, train/total_loss: 0.0008, train/total_loss/avg: 0.1058, max mem: 6424.0, experiment: run, epoch: 58, num_updates: 15400, iterations: 15400, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 156ms, time_since_start: 02h 12m 15s 061ms, eta: 55m 10s 333ms
2021-02-14T08:00:03 | mmf.trainers.callbacks.logistics: progress: 15500/22000, train/hateful_memes/cross_entropy: 0.0008, train/hateful_memes/cross_entropy/avg: 0.1051, train/total_loss: 0.0008, train/total_loss/avg: 0.1051, max mem: 6424.0, experiment: run, epoch: 59, num_updates: 15500, iterations: 15500, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 038ms, time_since_start: 02h 13m 06s 100ms, eta: 55m 17s 507ms
2021-02-14T08:00:53 | mmf.trainers.callbacks.logistics: progress: 15600/22000, train/hateful_memes/cross_entropy: 0.0006, train/hateful_memes/cross_entropy/avg: 0.1044, train/total_loss: 0.0006, train/total_loss/avg: 0.1044, max mem: 6424.0, experiment: run, epoch: 59, num_updates: 15600, iterations: 15600, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 137ms, time_since_start: 02h 13m 56s 237ms, eta: 53m 28s 793ms
2021-02-14T08:01:45 | mmf.trainers.callbacks.logistics: progress: 15700/22000, train/hateful_memes/cross_entropy: 0.0008, train/hateful_memes/cross_entropy/avg: 0.1038, train/total_loss: 0.0008, train/total_loss/avg: 0.1038, max mem: 6424.0, experiment: run, epoch: 60, num_updates: 15700, iterations: 15700, max_updates: 22000, lr: 0., ups: 1.92, time: 52s 308ms, time_since_start: 02h 14m 48s 546ms, eta: 54m 55s 428ms
2021-02-14T08:02:36 | mmf.trainers.callbacks.logistics: progress: 15800/22000, train/hateful_memes/cross_entropy: 0.0008, train/hateful_memes/cross_entropy/avg: 0.1034, train/total_loss: 0.0008, train/total_loss/avg: 0.1034, max mem: 6424.0, experiment: run, epoch: 60, num_updates: 15800, iterations: 15800, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 189ms, time_since_start: 02h 15m 38s 736ms, eta: 51m 51s 770ms
2021-02-14T08:03:26 | mmf.trainers.callbacks.logistics: progress: 15900/22000, train/hateful_memes/cross_entropy: 0.0010, train/hateful_memes/cross_entropy/avg: 0.1028, train/total_loss: 0.0010, train/total_loss/avg: 0.1028, max mem: 6424.0, experiment: run, epoch: 60, num_updates: 15900, iterations: 15900, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 185ms, time_since_start: 02h 16m 28s 921ms, eta: 51m 01s 321ms
2021-02-14T08:04:17 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T08:04:17 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T08:04:21 | mmf.trainers.callbacks.logistics: progress: 16000/22000, train/hateful_memes/cross_entropy: 0.0010, train/hateful_memes/cross_entropy/avg: 0.1023, train/total_loss: 0.0010, train/total_loss/avg: 0.1023, max mem: 6424.0, experiment: run, epoch: 61, num_updates: 16000, iterations: 16000, max_updates: 22000, lr: 0., ups: 1.85, time: 54s 814ms, time_since_start: 02h 17m 23s 736ms, eta: 54m 48s 887ms
2021-02-14T08:04:21 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T08:04:25 | mmf.trainers.callbacks.logistics: progress: 16000/22000, val/hateful_memes/cross_entropy: 2.8363, val/total_loss: 2.8363, val/hateful_memes/accuracy: 0.5120, val/hateful_memes/binary_f1: 0.3029, val/hateful_memes/roc_auc: 0.4714, num_updates: 16000, epoch: 61, iterations: 16000, max_updates: 22000, val_time: 04s 603ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T08:05:16 | mmf.trainers.callbacks.logistics: progress: 16100/22000, train/hateful_memes/cross_entropy: 0.0008, train/hateful_memes/cross_entropy/avg: 0.1017, train/total_loss: 0.0008, train/total_loss/avg: 0.1017, max mem: 6424.0, experiment: run, epoch: 61, num_updates: 16100, iterations: 16100, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 807ms, time_since_start: 02h 18m 19s 148ms, eta: 49m 57s 635ms
2021-02-14T08:06:06 | mmf.trainers.callbacks.logistics: progress: 16200/22000, train/hateful_memes/cross_entropy: 0.0008, train/hateful_memes/cross_entropy/avg: 0.1010, train/total_loss: 0.0008, train/total_loss/avg: 0.1010, max mem: 6424.0, experiment: run, epoch: 61, num_updates: 16200, iterations: 16200, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 196ms, time_since_start: 02h 19m 09s 344ms, eta: 48m 31s 397ms
2021-02-14T08:06:57 | mmf.trainers.callbacks.logistics: progress: 16300/22000, train/hateful_memes/cross_entropy: 0.0008, train/hateful_memes/cross_entropy/avg: 0.1004, train/total_loss: 0.0008, train/total_loss/avg: 0.1004, max mem: 6424.0, experiment: run, epoch: 62, num_updates: 16300, iterations: 16300, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 035ms, time_since_start: 02h 20m 380ms, eta: 48m 29s 002ms
2021-02-14T08:07:47 | mmf.trainers.callbacks.logistics: progress: 16400/22000, train/hateful_memes/cross_entropy: 0.0008, train/hateful_memes/cross_entropy/avg: 0.0998, train/total_loss: 0.0008, train/total_loss/avg: 0.0998, max mem: 6424.0, experiment: run, epoch: 62, num_updates: 16400, iterations: 16400, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 192ms, time_since_start: 02h 20m 50s 572ms, eta: 46m 50s 794ms
2021-02-14T08:08:39 | mmf.trainers.callbacks.logistics: progress: 16500/22000, train/hateful_memes/cross_entropy: 0.0006, train/hateful_memes/cross_entropy/avg: 0.0992, train/total_loss: 0.0006, train/total_loss/avg: 0.0992, max mem: 6424.0, experiment: run, epoch: 63, num_updates: 16500, iterations: 16500, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 162ms, time_since_start: 02h 21m 41s 735ms, eta: 46m 53s 939ms
2021-02-14T08:09:29 | mmf.trainers.callbacks.logistics: progress: 16600/22000, train/hateful_memes/cross_entropy: 0.0007, train/hateful_memes/cross_entropy/avg: 0.0986, train/total_loss: 0.0007, train/total_loss/avg: 0.0986, max mem: 6424.0, experiment: run, epoch: 63, num_updates: 16600, iterations: 16600, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 199ms, time_since_start: 02h 22m 31s 935ms, eta: 45m 10s 781ms
2021-02-14T08:10:19 | mmf.trainers.callbacks.logistics: progress: 16700/22000, train/hateful_memes/cross_entropy: 0.0007, train/hateful_memes/cross_entropy/avg: 0.0981, train/total_loss: 0.0007, train/total_loss/avg: 0.0981, max mem: 6424.0, experiment: run, epoch: 63, num_updates: 16700, iterations: 16700, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 181ms, time_since_start: 02h 23m 22s 116ms, eta: 44m 19s 598ms
2021-02-14T08:11:10 | mmf.trainers.callbacks.logistics: progress: 16800/22000, train/hateful_memes/cross_entropy: 0.0006, train/hateful_memes/cross_entropy/avg: 0.0975, train/total_loss: 0.0006, train/total_loss/avg: 0.0975, max mem: 6424.0, experiment: run, epoch: 64, num_updates: 16800, iterations: 16800, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 261ms, time_since_start: 02h 24m 13s 377ms, eta: 44m 25s 579ms
2021-02-14T08:12:00 | mmf.trainers.callbacks.logistics: progress: 16900/22000, train/hateful_memes/cross_entropy: 0.0005, train/hateful_memes/cross_entropy/avg: 0.0969, train/total_loss: 0.0005, train/total_loss/avg: 0.0969, max mem: 6424.0, experiment: run, epoch: 64, num_updates: 16900, iterations: 16900, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 180ms, time_since_start: 02h 25m 03s 558ms, eta: 42m 39s 214ms
2021-02-14T08:12:51 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T08:12:51 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T08:12:54 | mmf.trainers.callbacks.logistics: progress: 17000/22000, train/hateful_memes/cross_entropy: 0.0005, train/hateful_memes/cross_entropy/avg: 0.0963, train/total_loss: 0.0005, train/total_loss/avg: 0.0963, max mem: 6424.0, experiment: run, epoch: 64, num_updates: 17000, iterations: 17000, max_updates: 22000, lr: 0., ups: 1.89, time: 53s 964ms, time_since_start: 02h 25m 57s 522ms, eta: 44m 58s 211ms
2021-02-14T08:12:54 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T08:12:59 | mmf.trainers.callbacks.logistics: progress: 17000/22000, val/hateful_memes/cross_entropy: 2.8262, val/total_loss: 2.8262, val/hateful_memes/accuracy: 0.5020, val/hateful_memes/binary_f1: 0.3025, val/hateful_memes/roc_auc: 0.4945, num_updates: 17000, epoch: 64, iterations: 17000, max_updates: 22000, val_time: 04s 668ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T08:13:51 | mmf.trainers.callbacks.logistics: progress: 17100/22000, train/hateful_memes/cross_entropy: 0.0005, train/hateful_memes/cross_entropy/avg: 0.0958, train/total_loss: 0.0005, train/total_loss/avg: 0.0958, max mem: 6424.0, experiment: run, epoch: 65, num_updates: 17100, iterations: 17100, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 475ms, time_since_start: 02h 26m 53s 666ms, eta: 42m 02s 278ms
2021-02-14T08:14:41 | mmf.trainers.callbacks.logistics: progress: 17200/22000, train/hateful_memes/cross_entropy: 0.0005, train/hateful_memes/cross_entropy/avg: 0.0952, train/total_loss: 0.0005, train/total_loss/avg: 0.0952, max mem: 6424.0, experiment: run, epoch: 65, num_updates: 17200, iterations: 17200, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 214ms, time_since_start: 02h 27m 43s 881ms, eta: 40m 10s 319ms
2021-02-14T08:15:32 | mmf.trainers.callbacks.logistics: progress: 17300/22000, train/hateful_memes/cross_entropy: 0.0005, train/hateful_memes/cross_entropy/avg: 0.0947, train/total_loss: 0.0005, train/total_loss/avg: 0.0947, max mem: 6424.0, experiment: run, epoch: 66, num_updates: 17300, iterations: 17300, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 271ms, time_since_start: 02h 28m 35s 153ms, eta: 40m 09s 784ms
2021-02-14T08:16:22 | mmf.trainers.callbacks.logistics: progress: 17400/22000, train/hateful_memes/cross_entropy: 0.0005, train/hateful_memes/cross_entropy/avg: 0.0942, train/total_loss: 0.0005, train/total_loss/avg: 0.0942, max mem: 6424.0, experiment: run, epoch: 66, num_updates: 17400, iterations: 17400, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 178ms, time_since_start: 02h 29m 25s 332ms, eta: 38m 28s 203ms
2021-02-14T08:17:12 | mmf.trainers.callbacks.logistics: progress: 17500/22000, train/hateful_memes/cross_entropy: 0.0005, train/hateful_memes/cross_entropy/avg: 0.0937, train/total_loss: 0.0005, train/total_loss/avg: 0.0937, max mem: 6424.0, experiment: run, epoch: 66, num_updates: 17500, iterations: 17500, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 181ms, time_since_start: 02h 30m 15s 513ms, eta: 37m 38s 169ms
2021-02-14T08:18:04 | mmf.trainers.callbacks.logistics: progress: 17600/22000, train/hateful_memes/cross_entropy: 0.0005, train/hateful_memes/cross_entropy/avg: 0.0932, train/total_loss: 0.0005, train/total_loss/avg: 0.0932, max mem: 6424.0, experiment: run, epoch: 67, num_updates: 17600, iterations: 17600, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 275ms, time_since_start: 02h 31m 06s 788ms, eta: 37m 36s 106ms
2021-02-14T08:18:54 | mmf.trainers.callbacks.logistics: progress: 17700/22000, train/hateful_memes/cross_entropy: 0.0005, train/hateful_memes/cross_entropy/avg: 0.0926, train/total_loss: 0.0005, train/total_loss/avg: 0.0926, max mem: 6424.0, experiment: run, epoch: 67, num_updates: 17700, iterations: 17700, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 177ms, time_since_start: 02h 31m 56s 966ms, eta: 35m 57s 630ms
2021-02-14T08:19:44 | mmf.trainers.callbacks.logistics: progress: 17800/22000, train/hateful_memes/cross_entropy: 0.0005, train/hateful_memes/cross_entropy/avg: 0.0921, train/total_loss: 0.0005, train/total_loss/avg: 0.0921, max mem: 6424.0, experiment: run, epoch: 67, num_updates: 17800, iterations: 17800, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 186ms, time_since_start: 02h 32m 47s 152ms, eta: 35m 07s 836ms
2021-02-14T08:20:35 | mmf.trainers.callbacks.logistics: progress: 17900/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0916, train/total_loss: 0.0004, train/total_loss/avg: 0.0916, max mem: 6424.0, experiment: run, epoch: 68, num_updates: 17900, iterations: 17900, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 180ms, time_since_start: 02h 33m 38s 333ms, eta: 34m 58s 396ms
2021-02-14T08:21:25 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T08:21:25 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T08:21:29 | mmf.trainers.callbacks.logistics: progress: 18000/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0911, train/total_loss: 0.0003, train/total_loss/avg: 0.0911, max mem: 6424.0, experiment: run, epoch: 68, num_updates: 18000, iterations: 18000, max_updates: 22000, lr: 0., ups: 1.89, time: 53s 411ms, time_since_start: 02h 34m 31s 744ms, eta: 35m 36s 460ms
2021-02-14T08:21:29 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T08:21:34 | mmf.trainers.callbacks.logistics: progress: 18000/22000, val/hateful_memes/cross_entropy: 2.7392, val/total_loss: 2.7392, val/hateful_memes/accuracy: 0.5000, val/hateful_memes/binary_f1: 0.3280, val/hateful_memes/roc_auc: 0.4879, num_updates: 18000, epoch: 68, iterations: 18000, max_updates: 22000, val_time: 04s 983ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T08:22:25 | mmf.trainers.callbacks.logistics: progress: 18100/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0906, train/total_loss: 0.0003, train/total_loss/avg: 0.0906, max mem: 6424.0, experiment: run, epoch: 69, num_updates: 18100, iterations: 18100, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 507ms, time_since_start: 02h 35m 28s 236ms, eta: 33m 28s 782ms
2021-02-14T08:23:15 | mmf.trainers.callbacks.logistics: progress: 18200/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0901, train/total_loss: 0.0003, train/total_loss/avg: 0.0901, max mem: 6424.0, experiment: run, epoch: 69, num_updates: 18200, iterations: 18200, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 215ms, time_since_start: 02h 36m 18s 451ms, eta: 31m 48s 191ms
2021-02-14T08:24:05 | mmf.trainers.callbacks.logistics: progress: 18300/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0896, train/total_loss: 0.0003, train/total_loss/avg: 0.0896, max mem: 6424.0, experiment: run, epoch: 69, num_updates: 18300, iterations: 18300, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 133ms, time_since_start: 02h 37m 08s 584ms, eta: 30m 54s 924ms
2021-02-14T08:24:56 | mmf.trainers.callbacks.logistics: progress: 18400/22000, train/hateful_memes/cross_entropy: 0.0002, train/hateful_memes/cross_entropy/avg: 0.0891, train/total_loss: 0.0002, train/total_loss/avg: 0.0891, max mem: 6424.0, experiment: run, epoch: 70, num_updates: 18400, iterations: 18400, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 901ms, time_since_start: 02h 37m 59s 486ms, eta: 30m 32s 450ms
2021-02-14T08:25:46 | mmf.trainers.callbacks.logistics: progress: 18500/22000, train/hateful_memes/cross_entropy: 0.0002, train/hateful_memes/cross_entropy/avg: 0.0887, train/total_loss: 0.0002, train/total_loss/avg: 0.0887, max mem: 6424.0, experiment: run, epoch: 70, num_updates: 18500, iterations: 18500, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 095ms, time_since_start: 02h 38m 49s 581ms, eta: 29m 13s 348ms
2021-02-14T08:26:37 | mmf.trainers.callbacks.logistics: progress: 18600/22000, train/hateful_memes/cross_entropy: 0.0002, train/hateful_memes/cross_entropy/avg: 0.0882, train/total_loss: 0.0002, train/total_loss/avg: 0.0882, max mem: 6424.0, experiment: run, epoch: 70, num_updates: 18600, iterations: 18600, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 107ms, time_since_start: 02h 39m 39s 689ms, eta: 28m 23s 656ms
2021-02-14T08:27:28 | mmf.trainers.callbacks.logistics: progress: 18700/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0877, train/total_loss: 0.0003, train/total_loss/avg: 0.0877, max mem: 6424.0, experiment: run, epoch: 71, num_updates: 18700, iterations: 18700, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 120ms, time_since_start: 02h 40m 30s 809ms, eta: 28m 06s 967ms
2021-02-14T08:28:18 | mmf.trainers.callbacks.logistics: progress: 18800/22000, train/hateful_memes/cross_entropy: 0.0002, train/hateful_memes/cross_entropy/avg: 0.0873, train/total_loss: 0.0002, train/total_loss/avg: 0.0873, max mem: 6424.0, experiment: run, epoch: 71, num_updates: 18800, iterations: 18800, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 160ms, time_since_start: 02h 41m 20s 970ms, eta: 26m 45s 148ms
2021-02-14T08:29:09 | mmf.trainers.callbacks.logistics: progress: 18900/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0870, train/total_loss: 0.0003, train/total_loss/avg: 0.0870, max mem: 6424.0, experiment: run, epoch: 72, num_updates: 18900, iterations: 18900, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 128ms, time_since_start: 02h 42m 12s 099ms, eta: 26m 24s 996ms
2021-02-14T08:29:59 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T08:29:59 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T08:29:59 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T08:30:03 | mmf.trainers.callbacks.logistics: progress: 19000/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0865, train/total_loss: 0.0003, train/total_loss/avg: 0.0865, max mem: 6424.0, experiment: run, epoch: 72, num_updates: 19000, iterations: 19000, max_updates: 22000, lr: 0., ups: 1.89, time: 53s 553ms, time_since_start: 02h 43m 05s 653ms, eta: 26m 46s 610ms
2021-02-14T08:30:03 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T08:30:07 | mmf.trainers.callbacks.logistics: progress: 19000/22000, val/hateful_memes/cross_entropy: 2.9165, val/total_loss: 2.9165, val/hateful_memes/accuracy: 0.5180, val/hateful_memes/binary_f1: 0.2933, val/hateful_memes/roc_auc: 0.4960, num_updates: 19000, epoch: 72, iterations: 19000, max_updates: 22000, val_time: 04s 491ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T08:30:58 | mmf.trainers.callbacks.logistics: progress: 19100/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0861, train/total_loss: 0.0003, train/total_loss/avg: 0.0861, max mem: 6424.0, experiment: run, epoch: 72, num_updates: 19100, iterations: 19100, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 508ms, time_since_start: 02h 44m 653ms, eta: 24m 24s 739ms
2021-02-14T08:31:49 | mmf.trainers.callbacks.logistics: progress: 19200/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0857, train/total_loss: 0.0004, train/total_loss/avg: 0.0857, max mem: 6424.0, experiment: run, epoch: 73, num_updates: 19200, iterations: 19200, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 201ms, time_since_start: 02h 44m 51s 854ms, eta: 23m 53s 634ms
2021-02-14T08:32:39 | mmf.trainers.callbacks.logistics: progress: 19300/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0853, train/total_loss: 0.0003, train/total_loss/avg: 0.0853, max mem: 6424.0, experiment: run, epoch: 73, num_updates: 19300, iterations: 19300, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 196ms, time_since_start: 02h 45m 42s 051ms, eta: 22m 35s 314ms
2021-02-14T08:33:29 | mmf.trainers.callbacks.logistics: progress: 19400/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0848, train/total_loss: 0.0003, train/total_loss/avg: 0.0848, max mem: 6424.0, experiment: run, epoch: 73, num_updates: 19400, iterations: 19400, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 188ms, time_since_start: 02h 46m 32s 240ms, eta: 21m 44s 900ms
2021-02-14T08:34:20 | mmf.trainers.callbacks.logistics: progress: 19500/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0845, train/total_loss: 0.0004, train/total_loss/avg: 0.0845, max mem: 6424.0, experiment: run, epoch: 74, num_updates: 19500, iterations: 19500, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 015ms, time_since_start: 02h 47m 23s 255ms, eta: 21m 15s 377ms
2021-02-14T08:35:10 | mmf.trainers.callbacks.logistics: progress: 19600/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0841, train/total_loss: 0.0004, train/total_loss/avg: 0.0841, max mem: 6424.0, experiment: run, epoch: 74, num_updates: 19600, iterations: 19600, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 212ms, time_since_start: 02h 48m 13s 467ms, eta: 20m 05s 094ms
2021-02-14T08:36:01 | mmf.trainers.callbacks.logistics: progress: 19700/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0836, train/total_loss: 0.0004, train/total_loss/avg: 0.0836, max mem: 6424.0, experiment: run, epoch: 75, num_updates: 19700, iterations: 19700, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 026ms, time_since_start: 02h 49m 04s 494ms, eta: 19m 33s 618ms
2021-02-14T08:36:52 | mmf.trainers.callbacks.logistics: progress: 19800/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0832, train/total_loss: 0.0004, train/total_loss/avg: 0.0832, max mem: 6424.0, experiment: run, epoch: 75, num_updates: 19800, iterations: 19800, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 173ms, time_since_start: 02h 49m 54s 668ms, eta: 18m 23s 823ms
2021-02-14T08:37:42 | mmf.trainers.callbacks.logistics: progress: 19900/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0828, train/total_loss: 0.0004, train/total_loss/avg: 0.0828, max mem: 6424.0, experiment: run, epoch: 75, num_updates: 19900, iterations: 19900, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 162ms, time_since_start: 02h 50m 44s 830ms, eta: 17m 33s 416ms
2021-02-14T08:38:33 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T08:38:33 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T08:38:33 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T08:38:36 | mmf.trainers.callbacks.logistics: progress: 20000/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0824, train/total_loss: 0.0004, train/total_loss/avg: 0.0824, max mem: 6424.0, experiment: run, epoch: 76, num_updates: 20000, iterations: 20000, max_updates: 22000, lr: 0., ups: 1.85, time: 54s 251ms, time_since_start: 02h 51m 39s 082ms, eta: 18m 05s 039ms
2021-02-14T08:38:36 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T08:38:41 | mmf.trainers.callbacks.logistics: progress: 20000/22000, val/hateful_memes/cross_entropy: 2.8122, val/total_loss: 2.8122, val/hateful_memes/accuracy: 0.5080, val/hateful_memes/binary_f1: 0.2971, val/hateful_memes/roc_auc: 0.4936, num_updates: 20000, epoch: 76, iterations: 20000, max_updates: 22000, val_time: 04s 695ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T08:39:31 | mmf.trainers.callbacks.logistics: progress: 20100/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0820, train/total_loss: 0.0003, train/total_loss/avg: 0.0820, max mem: 6424.0, experiment: run, epoch: 76, num_updates: 20100, iterations: 20100, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 442ms, time_since_start: 02h 52m 34s 221ms, eta: 15m 58s 401ms
2021-02-14T08:40:21 | mmf.trainers.callbacks.logistics: progress: 20200/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0816, train/total_loss: 0.0003, train/total_loss/avg: 0.0816, max mem: 6424.0, experiment: run, epoch: 76, num_updates: 20200, iterations: 20200, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 166ms, time_since_start: 02h 53m 24s 388ms, eta: 15m 03s 003ms
2021-02-14T08:41:13 | mmf.trainers.callbacks.logistics: progress: 20300/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0812, train/total_loss: 0.0003, train/total_loss/avg: 0.0812, max mem: 6424.0, experiment: run, epoch: 77, num_updates: 20300, iterations: 20300, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 319ms, time_since_start: 02h 54m 15s 707ms, eta: 14m 32s 433ms
2021-02-14T08:42:03 | mmf.trainers.callbacks.logistics: progress: 20400/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0808, train/total_loss: 0.0003, train/total_loss/avg: 0.0808, max mem: 6424.0, experiment: run, epoch: 77, num_updates: 20400, iterations: 20400, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 191ms, time_since_start: 02h 55m 05s 899ms, eta: 13m 23s 065ms
2021-02-14T08:42:54 | mmf.trainers.callbacks.logistics: progress: 20500/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0804, train/total_loss: 0.0004, train/total_loss/avg: 0.0804, max mem: 6424.0, experiment: run, epoch: 78, num_updates: 20500, iterations: 20500, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 257ms, time_since_start: 02h 55m 57s 156ms, eta: 12m 48s 857ms
2021-02-14T08:43:44 | mmf.trainers.callbacks.logistics: progress: 20600/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0801, train/total_loss: 0.0004, train/total_loss/avg: 0.0801, max mem: 6424.0, experiment: run, epoch: 78, num_updates: 20600, iterations: 20600, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 172ms, time_since_start: 02h 56m 47s 329ms, eta: 11m 42s 415ms
2021-02-14T08:44:34 | mmf.trainers.callbacks.logistics: progress: 20700/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0797, train/total_loss: 0.0004, train/total_loss/avg: 0.0797, max mem: 6424.0, experiment: run, epoch: 78, num_updates: 20700, iterations: 20700, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 189ms, time_since_start: 02h 57m 37s 519ms, eta: 10m 52s 469ms
2021-02-14T08:45:25 | mmf.trainers.callbacks.logistics: progress: 20800/22000, train/hateful_memes/cross_entropy: 0.0004, train/hateful_memes/cross_entropy/avg: 0.0793, train/total_loss: 0.0004, train/total_loss/avg: 0.0793, max mem: 6424.0, experiment: run, epoch: 79, num_updates: 20800, iterations: 20800, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 043ms, time_since_start: 02h 58m 28s 562ms, eta: 10m 12s 522ms
2021-02-14T08:46:16 | mmf.trainers.callbacks.logistics: progress: 20900/22000, train/hateful_memes/cross_entropy: 0.0003, train/hateful_memes/cross_entropy/avg: 0.0789, train/total_loss: 0.0003, train/total_loss/avg: 0.0789, max mem: 6424.0, experiment: run, epoch: 79, num_updates: 20900, iterations: 20900, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 110ms, time_since_start: 02h 59m 18s 673ms, eta: 09m 11s 214ms
2021-02-14T08:47:06 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T08:47:06 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T08:47:06 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T08:47:09 | mmf.trainers.callbacks.logistics: progress: 21000/22000, train/hateful_memes/cross_entropy: 0.0002, train/hateful_memes/cross_entropy/avg: 0.0785, train/total_loss: 0.0002, train/total_loss/avg: 0.0785, max mem: 6424.0, experiment: run, epoch: 79, num_updates: 21000, iterations: 21000, max_updates: 22000, lr: 0., ups: 1.89, time: 53s 489ms, time_since_start: 03h 12s 163ms, eta: 08m 54s 899ms
2021-02-14T08:47:09 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T08:47:14 | mmf.trainers.callbacks.logistics: progress: 21000/22000, val/hateful_memes/cross_entropy: 3.1806, val/total_loss: 3.1806, val/hateful_memes/accuracy: 0.5020, val/hateful_memes/binary_f1: 0.2145, val/hateful_memes/roc_auc: 0.4946, num_updates: 21000, epoch: 79, iterations: 21000, max_updates: 22000, val_time: 04s 841ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T08:48:05 | mmf.trainers.callbacks.logistics: progress: 21100/22000, train/hateful_memes/cross_entropy: 0.0002, train/hateful_memes/cross_entropy/avg: 0.0782, train/total_loss: 0.0002, train/total_loss/avg: 0.0782, max mem: 6424.0, experiment: run, epoch: 80, num_updates: 21100, iterations: 21100, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 278ms, time_since_start: 03h 01m 08s 283ms, eta: 07m 41s 506ms
2021-02-14T08:48:55 | mmf.trainers.callbacks.logistics: progress: 21200/22000, train/hateful_memes/cross_entropy: 0.0002, train/hateful_memes/cross_entropy/avg: 0.0778, train/total_loss: 0.0002, train/total_loss/avg: 0.0778, max mem: 6424.0, experiment: run, epoch: 80, num_updates: 21200, iterations: 21200, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 134ms, time_since_start: 03h 01m 58s 418ms, eta: 06m 41s 078ms
2021-02-14T08:49:46 | mmf.trainers.callbacks.logistics: progress: 21300/22000, train/hateful_memes/cross_entropy: 0.0001, train/hateful_memes/cross_entropy/avg: 0.0774, train/total_loss: 0.0001, train/total_loss/avg: 0.0774, max mem: 6424.0, experiment: run, epoch: 81, num_updates: 21300, iterations: 21300, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 870ms, time_since_start: 03h 02m 49s 289ms, eta: 05m 56s 093ms
2021-02-14T08:50:36 | mmf.trainers.callbacks.logistics: progress: 21400/22000, train/hateful_memes/cross_entropy: 0.0001, train/hateful_memes/cross_entropy/avg: 0.0771, train/total_loss: 0.0001, train/total_loss/avg: 0.0771, max mem: 6424.0, experiment: run, epoch: 81, num_updates: 21400, iterations: 21400, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 136ms, time_since_start: 03h 03m 39s 425ms, eta: 05m 816ms
2021-02-14T08:51:26 | mmf.trainers.callbacks.logistics: progress: 21500/22000, train/hateful_memes/cross_entropy: 0.0001, train/hateful_memes/cross_entropy/avg: 0.0767, train/total_loss: 0.0001, train/total_loss/avg: 0.0767, max mem: 6424.0, experiment: run, epoch: 81, num_updates: 21500, iterations: 21500, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 149ms, time_since_start: 03h 04m 29s 575ms, eta: 04m 10s 749ms
2021-02-14T08:52:17 | mmf.trainers.callbacks.logistics: progress: 21600/22000, train/hateful_memes/cross_entropy: 0.0001, train/hateful_memes/cross_entropy/avg: 0.0764, train/total_loss: 0.0001, train/total_loss/avg: 0.0764, max mem: 6424.0, experiment: run, epoch: 82, num_updates: 21600, iterations: 21600, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 864ms, time_since_start: 03h 05m 20s 439ms, eta: 03m 23s 456ms
2021-02-14T08:53:07 | mmf.trainers.callbacks.logistics: progress: 21700/22000, train/hateful_memes/cross_entropy: 0.0001, train/hateful_memes/cross_entropy/avg: 0.0760, train/total_loss: 0.0001, train/total_loss/avg: 0.0760, max mem: 6424.0, experiment: run, epoch: 82, num_updates: 21700, iterations: 21700, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 153ms, time_since_start: 03h 06m 10s 592ms, eta: 02m 30s 459ms
2021-02-14T08:53:58 | mmf.trainers.callbacks.logistics: progress: 21800/22000, train/hateful_memes/cross_entropy: 0.0001, train/hateful_memes/cross_entropy/avg: 0.0757, train/total_loss: 0.0001, train/total_loss/avg: 0.0757, max mem: 6424.0, experiment: run, epoch: 82, num_updates: 21800, iterations: 21800, max_updates: 22000, lr: 0., ups: 2.00, time: 50s 182ms, time_since_start: 03h 07m 774ms, eta: 01m 40s 364ms
2021-02-14T08:54:49 | mmf.trainers.callbacks.logistics: progress: 21900/22000, train/hateful_memes/cross_entropy: 0.0001, train/hateful_memes/cross_entropy/avg: 0.0753, train/total_loss: 0.0001, train/total_loss/avg: 0.0753, max mem: 6424.0, experiment: run, epoch: 83, num_updates: 21900, iterations: 21900, max_updates: 22000, lr: 0., ups: 1.96, time: 51s 128ms, time_since_start: 03h 07m 51s 902ms, eta: 51s 128ms
2021-02-14T08:55:39 | mmf.trainers.callbacks.checkpoint: Checkpoint time. Saving a checkpoint.
WARNING 2021-02-14T08:55:39 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T08:55:39 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:200: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T08:55:42 | mmf.trainers.callbacks.logistics: progress: 22000/22000, train/hateful_memes/cross_entropy: 0.0001, train/hateful_memes/cross_entropy/avg: 0.0750, train/total_loss: 0.0001, train/total_loss/avg: 0.0750, max mem: 6424.0, experiment: run, epoch: 83, num_updates: 22000, iterations: 22000, max_updates: 22000, lr: 0., ups: 1.89, time: 53s 691ms, time_since_start: 03h 08m 45s 593ms, eta: 0ms
2021-02-14T08:55:42 | mmf.trainers.core.training_loop: Evaluation time. Running on full validation set...
2021-02-14T08:55:47 | mmf.trainers.callbacks.logistics: progress: 22000/22000, val/hateful_memes/cross_entropy: 2.9306, val/total_loss: 2.9306, val/hateful_memes/accuracy: 0.5140, val/hateful_memes/binary_f1: 0.2915, val/hateful_memes/roc_auc: 0.4929, num_updates: 22000, epoch: 83, iterations: 22000, max_updates: 22000, val_time: 04s 653ms, best_update: 1000, best_iteration: 1000, best_val/hateful_memes/roc_auc: 0.526716
2021-02-14T08:55:48 | mmf.trainers.core.training_loop: Stepping into final validation check
2021-02-14T08:55:48 | mmf.utils.checkpoint: Restoring checkpoint
2021-02-14T08:55:48 | mmf.utils.checkpoint: Loading checkpoint
WARNING 2021-02-14T08:55:54 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:218: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

WARNING 2021-02-14T08:55:54 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:218: UserWarning: Please also save or load the state of the optimzer when saving or loading the scheduler.
  warnings.warn(SAVE_STATE_WARNING, UserWarning)

2021-02-14T08:55:54 | mmf.utils.checkpoint: Checkpoint loaded.
2021-02-14T08:55:54 | mmf.utils.checkpoint: Current num updates: 1000
2021-02-14T08:55:54 | mmf.utils.checkpoint: Current iteration: 1000
2021-02-14T08:55:54 | mmf.utils.checkpoint: Current epoch: 4
2021-02-14T08:55:54 | mmf.trainers.mmf_trainer: Starting inference on test set
WARNING 2021-02-14T08:55:56 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/mmf/modules/losses.py:98: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with evaluation.predict=true
  warnings.warn(

WARNING 2021-02-14T08:55:56 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/mmf/modules/losses.py:98: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with evaluation.predict=true
  warnings.warn(

WARNING 2021-02-14T08:55:56 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/mmf/common/report.py:80: UserWarning: targets not found in report. Metrics calculation might not work as expected.
  warnings.warn(

WARNING 2021-02-14T08:55:56 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/mmf/common/report.py:80: UserWarning: targets not found in report. Metrics calculation might not work as expected.
  warnings.warn(

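The warning above suggests the test split has no labels, so loss/metric computation is skipped during the automatic test-set inference. If you only want prediction files rather than metrics, MMF's prediction mode (as the warning hints) can be used instead. A sketch, assuming the same config and checkpoint paths as above (`mmf_predict` is MMF's documented prediction CLI; adjust paths to your setup):

```shell
# Generate a predictions CSV on the (unlabeled) test set instead of
# computing metrics; evaluation.predict=true is what the warning suggests.
mmf_predict config=mmf/projects/hateful_memes/configs/unimodal/image.yaml \
    model=unimodal_image \
    dataset=hateful_memes \
    run_type=test \
    checkpoint.resume_file=./save/unimodal_image_final.pth \
    checkpoint.resume_pretrained=False
```

This does not affect the validation-set numbers below; it only avoids the "targets not found" warnings on the label-free test split.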
After training, the model appears to have been saved anyway, so I carry on and use it to evaluate on the validation set:

2021-02-14T09:05:27 | mmf.utils.configuration: Overriding option config to mmf/projects/hateful_memes/configs/unimodal/image.yaml
2021-02-14T09:05:27 | mmf.utils.configuration: Overriding option model to unimodal_image
2021-02-14T09:05:27 | mmf.utils.configuration: Overriding option datasets to hateful_memes
2021-02-14T09:05:27 | mmf.utils.configuration: Overriding option run_type to val
2021-02-14T09:05:27 | mmf.utils.configuration: Overriding option checkpoint.resume_file to ./save/unimodal_image_final.pth
2021-02-14T09:05:27 | mmf.utils.configuration: Overriding option checkpoint.resume_pretrained to False
2021-02-14T09:05:27 | mmf: Logging to: ./save/train.log
2021-02-14T09:05:27 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=mmf/projects/hateful_memes/configs/unimodal/image.yaml', 'model=unimodal_image', 'dataset=hateful_memes', 'run_type=val', 'checkpoint.resume_file=./save/unimodal_image_final.pth', 'checkpoint.resume_pretrained=False'])
2021-02-14T09:05:27 | mmf_cli.run: Torch version: 1.6.0
2021-02-14T09:05:27 | mmf.utils.general: CUDA Device 0 is: Tesla P100-PCIE-16GB
2021-02-14T09:05:27 | mmf_cli.run: Using seed 27714785
2021-02-14T09:05:27 | mmf.trainers.mmf_trainer: Loading datasets
2021-02-14T09:05:28 | torchtext.vocab: Loading vectors from /home/sgg29/.cache/torch/mmf/glove.6B.300d.txt.pt
2021-02-14T09:05:29 | torchtext.vocab: Loading vectors from /home/sgg29/.cache/torch/mmf/glove.6B.300d.txt.pt
2021-02-14T09:05:32 | mmf.trainers.mmf_trainer: Loading model
2021-02-14T09:05:39 | mmf.trainers.mmf_trainer: Loading optimizer
2021-02-14T09:05:39 | mmf.trainers.mmf_trainer: Loading metrics
2021-02-14T09:05:39 | mmf.utils.checkpoint: Loading checkpoint
WARNING 2021-02-14T09:05:40 | mmf: Key data_parallel is not present in registry, returning default value of None
WARNING 2021-02-14T09:05:40 | mmf: Key distributed is not present in registry, returning default value of None
WARNING 2021-02-14T09:05:40 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/mmf/utils/checkpoint.py:291: UserWarning: 'optimizer' key is not present in the checkpoint asked to be loaded. Skipping.
  warnings.warn(

WARNING 2021-02-14T09:05:40 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/mmf/utils/checkpoint.py:291: UserWarning: 'optimizer' key is not present in the checkpoint asked to be loaded. Skipping.
  warnings.warn(

WARNING 2021-02-14T09:05:40 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/mmf/utils/checkpoint.py:334: UserWarning: 'lr_scheduler' key is not present in the checkpoint asked to be loaded. Setting lr_scheduler's last_epoch to current_iteration.
  warnings.warn(

WARNING 2021-02-14T09:05:40 | py.warnings: /rds/user/sgg29/hpc-work/shiv/hateful-memes/env/lib/python3.8/site-packages/mmf/utils/checkpoint.py:334: UserWarning: 'lr_scheduler' key is not present in the checkpoint asked to be loaded. Setting lr_scheduler's last_epoch to current_iteration.
  warnings.warn(

2021-02-14T09:05:40 | mmf.utils.checkpoint: Checkpoint loaded.
2021-02-14T09:05:40 | mmf.utils.checkpoint: Current num updates: 0
2021-02-14T09:05:40 | mmf.utils.checkpoint: Current iteration: 0
2021-02-14T09:05:40 | mmf.utils.checkpoint: Current epoch: 0
2021-02-14T09:05:40 | mmf.trainers.mmf_trainer: ===== Model =====
2021-02-14T09:05:40 | mmf.trainers.mmf_trainer: UnimodalModal(
  (base): UnimodalBase(
    (encoder): ResNet152ImageEncoder(
      (model): Sequential(
        (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (4): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (downsample): Sequential(
              (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (2): Bottleneck(
            (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
        )
        (5): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (downsample): Sequential(
              (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
              (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (2): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (3): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (4): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (5): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (6): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (7): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
        )
        (6): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (downsample): Sequential(
              (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
              (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (2): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (3): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (4): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (5): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (6): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (7): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (8): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (9): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (10): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (11): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (12): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (13): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (14): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (15): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (16): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (17): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (18): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (19): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (20): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (21): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (22): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (23): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (24): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (25): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (26): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (27): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (28): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (29): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (30): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (31): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (32): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (33): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (34): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (35): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
        )
        (7): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
            (downsample): Sequential(
              (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
              (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
          (2): Bottleneck(
            (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu): ReLU(inplace=True)
          )
        )
      )
      (pool): AdaptiveAvgPool2d(output_size=(1, 1))
    )
  )
  (classifier): MLPClassifer(
    (layers): ModuleList(
      (0): Linear(in_features=2048, out_features=768, bias=True)
      (1): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): Dropout(p=0.5, inplace=False)
      (4): Linear(in_features=768, out_features=768, bias=True)
      (5): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (6): ReLU()
      (7): Dropout(p=0.5, inplace=False)
      (8): Linear(in_features=768, out_features=2, bias=True)
    )
  )
  (losses): Losses(
    (losses): ModuleList(
      (0): MMFLoss(
        (loss_criterion): CrossEntropyLoss(
          (loss_fn): CrossEntropyLoss()
        )
      )
    )
  )
)
2021-02-14T09:05:40 | mmf.utils.general: Total Parameters: 60312642. Trained Parameters: 60312642
2021-02-14T09:05:40 | mmf.trainers.mmf_trainer: Starting inference on val set
2021-02-14T09:05:45 | mmf.trainers.callbacks.logistics: val/hateful_memes/cross_entropy: 0.7119, val/total_loss: 0.7119, val/hateful_memes/accuracy: 0.5200, val/hateful_memes/binary_f1: 0.3182, val/hateful_memes/roc_auc: 0.5267
2021-02-14T09:05:45 | mmf.trainers.callbacks.logistics: Finished run in 05s 801ms

Expected behavior:

I expect to get an accuracy of 52.73 and an AUROC of 58.79.

Instead I get an accuracy of 52.00 and an AUROC of 52.67.

The AUROC in particular is very different.
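For intuition on why the AUROC can diverge so much while accuracy barely moves: AUROC measures how well the scores *rank* examples, not how many cross a threshold. A minimal pure-Python sketch (an illustrative toy, not MMF's metric implementation) showing two score sets with identical accuracy at a 0.5 threshold but very different AUROC:

```python
def roc_auc(labels, scores):
    """Probability a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
# Both score sets yield the same predictions (and 0.50 accuracy) at a
# 0.5 threshold, but rank the examples very differently.
model_a = [0.9, 0.4, 0.6, 0.1]
model_b = [0.6, 0.1, 0.9, 0.4]
print(roc_auc(labels, model_a))  # 0.75
print(roc_auc(labels, model_b))  # 0.25
```

So a checkpoint can match the paper's accuracy almost exactly and still be far off on AUROC.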

I have checked that I am evaluating on the dev_seen set, by changing the dataset in the YAML config file at /home/username/.local/lib/python3.8/site-packages/mmf/configs/datasets/hateful_memes/defaults.yaml

I also get a much closer, satisfactory reproduction of the results when evaluating with the pre-trained model, so something must be going wrong in the training process (see the errors I encountered above).

Environment:

Collecting environment information...
PyTorch version: 1.6.0
Is debug build: No
CUDA used to build PyTorch: None

OS: Mac OSX 11.1
GCC version: Could not collect
CMake version: version 3.19.2

Python version: 3.8
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.6.0
[pip3] torchtext==0.5.0
[pip3] torchvision==0.7.0
[conda] Could not collect
shivgodhia commented 3 years ago

Here are some results I've gotten from training multiple times

2021-02-14T05:41:38 | mmf.trainers.callbacks.logistics: val/hateful_memes/cross_entropy: 2.7552, val/total_loss: 2.7552, val/hateful_memes/accuracy: 0.5160, val/hateful_memes/binary_f1: 0.3315, val/hateful_memes/roc_auc: 0.5081

2021-02-14T05:43:42 | mmf.trainers.callbacks.logistics: val/hateful_memes/cross_entropy: 0.7072, val/total_loss: 0.7072, val/hateful_memes/accuracy: 0.5100, val/hateful_memes/binary_f1: 0.4394, val/hateful_memes/roc_auc: 0.5036

2021-02-14T13:56:23 | mmf.trainers.callbacks.logistics: val/hateful_memes/cross_entropy: 0.8421, val/total_loss: 0.8421, val/hateful_memes/accuracy: 0.4960, val/hateful_memes/binary_f1: 0.3668, val/hateful_memes/roc_auc: 0.5317

As you can see, none of these come anywhere close to the reported scores of accuracy 52.73 and AUROC 58.79.
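
For context, the run-to-run spread in these numbers can be summarized directly; a quick sketch using the three AUROC values reported above:

```python
from statistics import mean, stdev

# AUROC values from the three training runs reported above
aurocs = [0.5081, 0.5036, 0.5317]
print(f"mean={mean(aurocs):.4f}  stdev={stdev(aurocs):.4f}")
# → mean=0.5145  stdev=0.0151
```

Even allowing roughly two standard deviations of seed variance, the mean stays well below the 58.79 AUROC reported in the arXiv paper.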

apsdehal commented 3 years ago

Hi @hivestrung ,

Can you clarify which phase you are running the model on? Looking at your issue, my guess is that you are using the MMF default, which is phase 2. The final baseline numbers for phase 2 are reported in https://proceedings.neurips.cc//paper/2020/file/1b84c4cee2b8b3d823b30e2d604b1878-Paper.pdf; the arXiv version has yet to be updated. The numbers in the NeurIPS version match what you are observing.

shivgodhia commented 3 years ago

Hi @apsdehal

How do I check which phase I'm running on? If you mean the dataset: I changed the dev set from unseen to seen in the defaults.yaml inside the installed mmf package, and I can verify that changing it affects the score.

Also, using the pretrained model from the zoo, I get around 57 AUROC. So it still seems my dataset is correct, but the pre-trained model is more effective than a model trained with the standard MMF training setup.

Thanks

shivgodhia commented 3 years ago

@apsdehal Apologies, I wasn't sure whether you'd had a chance to look at my latest comment regarding the phases? Thanks so much, by the way! Here I've compiled the results from my trained model vs. the pretrained model from the MMF model zoo:

| test | code | acc | auroc |
| --- | --- | --- | --- |
| my trained model | `mmf_run config=mmf/projects/hateful_memes/configs/unimodal/image.yaml model=unimodal_image dataset=hateful_memes run_type=val checkpoint.resume_file=./save_image-grid/unimodal_image_final.pth checkpoint.resume_pretrained=False dataset_config.hateful_memes.annotations.val[0]=hateful_memes/defaults/annotations/dev_seen.jsonl dataset_config.hateful_memes.annotations.test[0]=hateful_memes/defaults/annotations/test_seen.jsonl` | 49.60 | 53.17 |
| pre-trained model from the zoo | `mmf_run config=mmf/projects/hateful_memes/configs/unimodal/image.yaml model=unimodal_image dataset=hateful_memes run_type=val checkpoint.resume_zoo=unimodal_image.hateful_memes.images checkpoint.resume_pretrained=False dataset_config.hateful_memes.annotations.val[0]=hateful_memes/defaults/annotations/dev_seen.jsonl dataset_config.hateful_memes.annotations.test[0]=hateful_memes/defaults/annotations/test_seen.jsonl` | 51.40 | 57.21 |
| arXiv paper | - | 52.73 | 58.79 |
| final baseline for phase 2 in NeurIPS paper | - | 50.67 | 52.33 |

Both my self-trained model and the zoo model were evaluated on the same dev_seen dataset, yet they give different results. The zoo model reports numbers similar to the arXiv paper, while my self-trained model reports numbers similar to your latest NeurIPS paper. I've just re-run the validations to be absolutely sure.

Any idea why this might be the case?

apsdehal commented 3 years ago

Hi @hivestrung,

I can try running the exact command on my side to see if I can replicate the result, but I would actually expect this to be within the normal range of variance. Have you tried running the command multiple times and averaging?

apsdehal commented 3 years ago

@hivestrung The difference is caused by the new train set released in phase 2: it was reannotated to fix bad examples, so you won't be able to replicate the exact results in the arXiv version. I would suggest using phase 2 and the baselines in the NeurIPS paper. We will try to open up phase 2 submissions soon and update the arXiv.