MILVLG / openvqa

A lightweight, scalable, and general framework for visual question answering research
Apache License 2.0

The GQA results are lower than the reported performance. #59

Closed: luogen1996 closed this issue 4 years ago

luogen1996 commented 4 years ago

We followed the default settings provided by the repo, but got lower performance on both the online and the offline leaderboards. The log file is below. By the way, I tested the official mcan_small model and it achieved 58.3 on the online leaderboard, which is strange. Can anyone help fix this?
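
To compare runs like this one, the overall testdev accuracy of each epoch can be pulled out of the log programmatically. A minimal Python sketch, assuming the line formats seen in this issue (`Epoch: N, Loss: ..., Lr: ...` followed by `Accuracy: X%`); `parse_epochs` and `best_epoch` are hypothetical helpers, not part of openvqa:

```python
import re

# Line formats assumed from the log pasted in this issue.
EPOCH_RE = re.compile(r"^Epoch: (\d+), Loss: ([\d.eE+-]+), Lr: ([\d.eE+-]+)")
ACC_RE = re.compile(r"^Accuracy: ([\d.]+)%")  # overall accuracy only


def parse_epochs(log_text):
    """Return (epoch, loss, lr, overall_accuracy) tuples in log order."""
    results = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = EPOCH_RE.match(line)
        if m:
            current = (int(m.group(1)), float(m.group(2)), float(m.group(3)))
            continue
        m = ACC_RE.match(line)
        if m and current is not None:
            results.append(current + (float(m.group(1)),))
            current = None  # wait for the next "Epoch:" header
    return results


def best_epoch(results):
    """Entry with the highest overall accuracy."""
    return max(results, key=lambda r: r[3])


# Excerpt of epochs 9-11 from the log in this issue.
sample = """
Epoch: 9, Loss: 0.6903910459351245, Lr: 2e-05
Binary: 75.78%
Accuracy: 57.21%
Epoch: 10, Loss: 0.5974722676920379, Lr: 2e-05
Accuracy: 57.42%
Epoch: 11, Loss: 0.5036989731823418, Lr: 4.000000000000001e-06
Accuracy: 57.36%
"""
epoch, _, _, acc = best_epoch(parse_epochs(sample))
print(epoch, acc)  # 10 57.42
```

On the full log pasted here, the best overall accuracy is 57.42% at epoch 10, slightly above the 57.36% of the final epoch. The "Accuracy / structural type:" headers are not matched because the pattern requires `Accuracy:` immediately followed by the number.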


{ BATCH_SIZE        }->64
{ BBOXFEAT_EMB_SIZE }->2048
{ CACHE_PATH        }->./results/cache
{ CKPTS_PATH        }->./ckpts
{ CKPT_EPOCH        }->0
{ CKPT_PATH         }->None
{ CKPT_VERSION      }->2134787
{ DATASET           }->gqa
{ DATA_PATH         }->{'vqa': './data/vqa', 'gqa': './data/gqa', 'clevr': './data/clevr'}
{ DATA_ROOT         }->./data
{ DEVICES           }->[0]
{ DROPOUT_R         }->0.1
{ EVAL_BATCH_SIZE   }->32
{ EVAL_EVERY_EPOCH  }->True
{ FEATS_PATH        }->{'vqa': {'train': './data/vqa/feats/train2014', 'val': './data/vqa/feats/val2014', 'test': './data/vqa/feats/test2015'}, 'gqa': {'default-frcn': './data/gqa/feats/gqa-frcn', 'default-grid': './data/gqa/feats/gqa-grid'}, 'clevr': {'train': './data/clevr/feats/train', 'val': './data/clevr/feats/val', 'test': './data/clevr/feats/test'}}
{ FEAT_SIZE         }->{'vqa': {'FRCN_FEAT_SIZE': (100, 2048), 'BBOX_FEAT_SIZE': (100, 5)}, 'gqa': {'FRCN_FEAT_SIZE': (100, 2048), 'GRID_FEAT_SIZE': (49, 2048), 'BBOX_FEAT_SIZE': (100, 5)}, 'clevr': {'GRID_FEAT_SIZE': (196, 1024)}}
{ FF_SIZE           }->2048
{ FLAT_GLIMPSES     }->1
{ FLAT_MLP_SIZE     }->512
{ FLAT_OUT_SIZE     }->1024
{ GPU               }->2
{ GRAD_ACCU_STEPS   }->1
{ GRAD_NORM_CLIP    }->-1
{ HIDDEN_SIZE       }->512
{ LAYER             }->6
{ LOG_PATH          }->./results/log
{ LOSS_FUNC         }->ce
{ LOSS_FUNC_NAME_DICT }->{'ce': 'CrossEntropyLoss', 'bce': 'BCEWithLogitsLoss', 'kld': 'KLDivLoss', 'mse': 'MSELoss'}
{ LOSS_FUNC_NONLINEAR }->{'ce': [None, 'flat'], 'bce': [None, None], 'kld': ['log_softmax', None], 'mse': [None, None]}
{ LOSS_REDUCTION    }->sum
{ LR_BASE           }->0.0001
{ LR_DECAY_LIST     }->[8, 10]
{ LR_DECAY_R        }->0.2
{ MAX_EPOCH         }->11
{ MODEL             }->mcan_small
{ MODEL_USE         }->mcan
{ MULTI_HEAD        }->8
{ NUM_WORKERS       }->8
{ N_GPU             }->1
{ OPT               }->Adam
{ OPT_PARAMS        }->{'betas': (0.9, 0.98), 'eps': 1e-09, 'weight_decay': 0, 'amsgrad': False}
{ PIN_MEM           }->True
{ PRED_PATH         }->./results/pred
{ RAW_PATH          }->{'vqa': {'train': './data/vqa/raw/v2_OpenEnded_mscoco_train2014_questions.json', 'train-anno': './data/vqa/raw/v2_mscoco_train2014_annotations.json', 'val': './data/vqa/raw/v2_OpenEnded_mscoco_val2014_questions.json', 'val-anno': './data/vqa/raw/v2_mscoco_val2014_annotations.json', 'vg': './data/vqa/raw/VG_questions.json', 'vg-anno': './data/vqa/raw/VG_annotations.json', 'test': './data/vqa/raw/v2_OpenEnded_mscoco_test2015_questions.json'}, 'gqa': {'train': './data/gqa/raw/questions1.2/train_balanced_questions.json', 'val': './data/gqa/raw/questions1.2/val_balanced_questions.json', 'testdev': './data/gqa/raw/questions1.2/testdev_balanced_questions.json', 'test': './data/gqa/raw/questions1.2/submission_all_questions.json', 'val_all': './data/gqa/raw/questions1.2/val_all_questions.json', 'testdev_all': './data/gqa/raw/questions1.2/testdev_all_questions.json', 'train_choices': './data/gqa/raw/eval/train_choices', 'val_choices': './data/gqa/raw/eval/val_choices.json'}, 'clevr': {'train': './data/clevr/raw/questions/CLEVR_train_questions.json', 'val': './data/clevr/raw/questions/CLEVR_val_questions.json', 'test': './data/clevr/raw/questions/CLEVR_test_questions.json'}}
{ RESULT_PATH       }->./results/result_test
{ RESUME            }->False
{ RUN_MODE          }->train
{ SEED              }->2134787
{ SPLIT             }->{'train': 'train+val', 'val': 'testdev', 'test': 'test'}
{ SPLITS            }->{'vqa': {'train': '', 'val': 'val', 'test': 'test'}, 'gqa': {'train': 'train+val', 'val': 'testdev', 'test': 'test'}, 'clevr': {'train': '', 'val': 'val', 'test': 'test'}}
{ SUB_BATCH_SIZE    }->64
{ TASK_LOSS_CHECK   }->{'vqa': ['bce', 'kld'], 'gqa': ['ce'], 'clevr': ['ce']}
{ TEST_SAVE_PRED    }->False
{ TRAIN_SPLIT       }->train+val
{ USE_AUX_FEAT      }->True
{ USE_BBOX_FEAT     }->True
{ USE_GLOVE         }->True
{ VERBOSE           }->True
{ VERSION           }->2134787
{ WARMUP_EPOCH      }->2
{ WORD_EMBED_SIZE   }->300
=====================================
nowTime: 2020-01-19 14:05:12
Epoch: 1, Loss: 1.7046293609006327, Lr: 6.666666666666667e-05
Elapsed time: 5132, Speed(s/batch): 0.3055514312835215

Binary: 58.18%
Open: 36.41%
Accuracy: 46.41%
Distribution: 3.07 (lower is better)
Accuracy / structural type:
  choose: 60.05% (1129 questions)
  compare: 54.33% (589 questions)
  logical: 56.63% (1803 questions)
  query: 36.41% (6805 questions)
  verify: 59.50% (2252 questions)
Accuracy / semantic type:
  attr: 50.46% (5186 questions)
  cat: 42.04% (1149 questions)
  global: 45.22% (157 questions)
  obj: 64.14% (778 questions)
  rel: 40.83% (5308 questions)
Accuracy / steps number:
  1: 60.76% (237 questions)
  2: 43.16% (6395 questions)
  3: 47.63% (4266 questions)
  4: 45.02% (793 questions)
  5: 59.37% (822 questions)
  6: 78.05% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 32.45% (151 questions)
  4: 45.56% (630 questions)
  5: 38.53% (1290 questions)
  6: 43.30% (2074 questions)
  7: 43.97% (1642 questions)
  8: 47.85% (1185 questions)
  9: 50.12% (1281 questions)
  10: 51.88% (1249 questions)
  11: 45.47% (994 questions)
  12: 51.10% (638 questions)
  13: 50.43% (462 questions)
  14: 50.72% (345 questions)
  15: 57.81% (237 questions)
  16: 49.57% (117 questions)
  17: 44.68% (94 questions)
  18: 52.63% (76 questions)
  19: 60.47% (43 questions)
  20: 53.12% (32 questions)
  21: 57.89% (19 questions)
  22: 50.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-19 15:31:07
Epoch: 2, Loss: 1.3994251725464903, Lr: 0.0001
Elapsed time: 5100, Speed(s/batch): 0.3036757551451245

Binary: 63.94%
Open: 37.90%
Accuracy: 49.85%
Distribution: 2.36 (lower is better)
Accuracy / structural type:
  choose: 65.46% (1129 questions)
  compare: 55.86% (589 questions)
  logical: 61.40% (1803 questions)
  query: 37.90% (6805 questions)
  verify: 67.32% (2252 questions)
Accuracy / semantic type:
  attr: 52.97% (5186 questions)
  cat: 40.91% (1149 questions)
  global: 46.50% (157 questions)
  obj: 79.05% (778 questions)
  rel: 44.56% (5308 questions)
Accuracy / steps number:
  1: 64.56% (237 questions)
  2: 45.54% (6395 questions)
  3: 52.93% (4266 questions)
  4: 48.17% (793 questions)
  5: 62.53% (822 questions)
  6: 65.85% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 31.13% (151 questions)
  4: 44.44% (630 questions)
  5: 38.76% (1290 questions)
  6: 45.90% (2074 questions)
  7: 47.69% (1642 questions)
  8: 53.84% (1185 questions)
  9: 55.35% (1281 questions)
  10: 55.40% (1249 questions)
  11: 51.21% (994 questions)
  12: 54.86% (638 questions)
  13: 53.68% (462 questions)
  14: 59.13% (345 questions)
  15: 56.96% (237 questions)
  16: 57.26% (117 questions)
  17: 51.06% (94 questions)
  18: 53.95% (76 questions)
  19: 69.77% (43 questions)
  20: 56.25% (32 questions)
  21: 57.89% (19 questions)
  22: 50.00% (12 questions)
  23: 0.00% (4 questions)
  24: 50.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-19 16:56:30
Epoch: 3, Loss: 1.2595117600660048, Lr: 0.0001
Elapsed time: 5073, Speed(s/batch): 0.3020486486869416

Binary: 68.66%
Open: 37.74%
Accuracy: 51.93%
Distribution: 2.90 (lower is better)
Accuracy / structural type:
  choose: 68.47% (1129 questions)
  compare: 58.23% (589 questions)
  logical: 66.89% (1803 questions)
  query: 37.74% (6805 questions)
  verify: 72.91% (2252 questions)
Accuracy / semantic type:
  attr: 56.85% (5186 questions)
  cat: 40.91% (1149 questions)
  global: 53.50% (157 questions)
  obj: 82.52% (778 questions)
  rel: 44.99% (5308 questions)
Accuracy / steps number:
  1: 67.09% (237 questions)
  2: 46.25% (6395 questions)
  3: 55.53% (4266 questions)
  4: 55.11% (793 questions)
  5: 67.76% (822 questions)
  6: 70.73% (41 questions)
  7: 95.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 34.44% (151 questions)
  4: 46.19% (630 questions)
  5: 38.99% (1290 questions)
  6: 47.73% (2074 questions)
  7: 51.34% (1642 questions)
  8: 53.76% (1185 questions)
  9: 58.24% (1281 questions)
  10: 58.29% (1249 questions)
  11: 52.21% (994 questions)
  12: 58.78% (638 questions)
  13: 54.76% (462 questions)
  14: 60.00% (345 questions)
  15: 63.71% (237 questions)
  16: 63.25% (117 questions)
  17: 52.13% (94 questions)
  18: 53.95% (76 questions)
  19: 79.07% (43 questions)
  20: 50.00% (32 questions)
  21: 52.63% (19 questions)
  22: 66.67% (12 questions)
  23: 50.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-19 18:21:26
Epoch: 4, Loss: 1.1537336046805873, Lr: 0.0001
Elapsed time: 5056, Speed(s/batch): 0.3010697898679644

Binary: 70.19%
Open: 38.28%
Accuracy: 52.93%
Distribution: 2.06 (lower is better)
Accuracy / structural type:
  choose: 69.26% (1129 questions)
  compare: 46.01% (589 questions)
  logical: 71.05% (1803 questions)
  query: 38.28% (6805 questions)
  verify: 76.29% (2252 questions)
Accuracy / semantic type:
  attr: 58.79% (5186 questions)
  cat: 40.30% (1149 questions)
  global: 56.69% (157 questions)
  obj: 81.36% (778 questions)
  rel: 45.65% (5308 questions)
Accuracy / steps number:
  1: 64.98% (237 questions)
  2: 47.58% (6395 questions)
  3: 54.78% (4266 questions)
  4: 62.80% (793 questions)
  5: 68.86% (822 questions)
  6: 87.80% (41 questions)
  7: 95.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 25.83% (151 questions)
  4: 48.25% (630 questions)
  5: 39.38% (1290 questions)
  6: 50.00% (2074 questions)
  7: 52.19% (1642 questions)
  8: 55.70% (1185 questions)
  9: 59.80% (1281 questions)
  10: 57.49% (1249 questions)
  11: 55.23% (994 questions)
  12: 55.96% (638 questions)
  13: 56.71% (462 questions)
  14: 59.42% (345 questions)
  15: 62.45% (237 questions)
  16: 66.67% (117 questions)
  17: 54.26% (94 questions)
  18: 59.21% (76 questions)
  19: 79.07% (43 questions)
  20: 53.12% (32 questions)
  21: 52.63% (19 questions)
  22: 58.33% (12 questions)
  23: 50.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-19 19:46:06
Epoch: 5, Loss: 1.0821526790137108, Lr: 0.0001
Elapsed time: 5034, Speed(s/batch): 0.2997583388487309

Binary: 72.18%
Open: 38.87%
Accuracy: 54.16%
Distribution: 2.16 (lower is better)
Accuracy / structural type:
  choose: 72.54% (1129 questions)
  compare: 60.95% (589 questions)
  logical: 70.66% (1803 questions)
  query: 38.87% (6805 questions)
  verify: 76.15% (2252 questions)
Accuracy / semantic type:
  attr: 61.16% (5186 questions)
  cat: 44.21% (1149 questions)
  global: 57.96% (157 questions)
  obj: 82.52% (778 questions)
  rel: 45.20% (5308 questions)
Accuracy / steps number:
  1: 71.31% (237 questions)
  2: 47.82% (6395 questions)
  3: 57.59% (4266 questions)
  4: 60.40% (793 questions)
  5: 72.02% (822 questions)
  6: 80.49% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 27.81% (151 questions)
  4: 47.30% (630 questions)
  5: 39.38% (1290 questions)
  6: 50.29% (2074 questions)
  7: 53.17% (1642 questions)
  8: 56.20% (1185 questions)
  9: 61.83% (1281 questions)
  10: 59.01% (1249 questions)
  11: 57.44% (994 questions)
  12: 60.97% (638 questions)
  13: 59.31% (462 questions)
  14: 61.16% (345 questions)
  15: 64.56% (237 questions)
  16: 64.10% (117 questions)
  17: 58.51% (94 questions)
  18: 64.47% (76 questions)
  19: 74.42% (43 questions)
  20: 59.38% (32 questions)
  21: 63.16% (19 questions)
  22: 66.67% (12 questions)
  23: 50.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-19 21:10:24
Epoch: 6, Loss: 1.0242240601529657, Lr: 0.0001
Elapsed time: 5283, Speed(s/batch): 0.3145479110343935

Binary: 72.37%
Open: 39.44%
Accuracy: 54.56%
Distribution: 2.30 (lower is better)
Accuracy / structural type:
  choose: 69.97% (1129 questions)
  compare: 64.86% (589 questions)
  logical: 69.94% (1803 questions)
  query: 39.44% (6805 questions)
  verify: 77.49% (2252 questions)
Accuracy / semantic type:
  attr: 60.70% (5186 questions)
  cat: 43.08% (1149 questions)
  global: 54.78% (157 questions)
  obj: 84.32% (778 questions)
  rel: 46.67% (5308 questions)
Accuracy / steps number:
  1: 70.46% (237 questions)
  2: 48.54% (6395 questions)
  3: 58.49% (4266 questions)
  4: 59.02% (793 questions)
  5: 69.34% (822 questions)
  6: 82.93% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 29.80% (151 questions)
  4: 45.56% (630 questions)
  5: 40.47% (1290 questions)
  6: 51.74% (2074 questions)
  7: 54.57% (1642 questions)
  8: 55.70% (1185 questions)
  9: 59.88% (1281 questions)
  10: 61.81% (1249 questions)
  11: 56.34% (994 questions)
  12: 61.91% (638 questions)
  13: 56.06% (462 questions)
  14: 63.77% (345 questions)
  15: 60.76% (237 questions)
  16: 63.25% (117 questions)
  17: 69.15% (94 questions)
  18: 65.79% (76 questions)
  19: 72.09% (43 questions)
  20: 56.25% (32 questions)
  21: 63.16% (19 questions)
  22: 66.67% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-19 22:38:50
Epoch: 7, Loss: 0.975194708264801, Lr: 0.0001
Elapsed time: 5540, Speed(s/batch): 0.32989268883207523

Binary: 72.91%
Open: 38.75%
Accuracy: 54.43%
Distribution: 1.97 (lower is better)
Accuracy / structural type:
  choose: 72.45% (1129 questions)
  compare: 62.48% (589 questions)
  logical: 70.99% (1803 questions)
  query: 38.75% (6805 questions)
  verify: 77.40% (2252 questions)
Accuracy / semantic type:
  attr: 60.89% (5186 questions)
  cat: 42.91% (1149 questions)
  global: 59.87% (157 questions)
  obj: 84.96% (778 questions)
  rel: 45.97% (5308 questions)
Accuracy / steps number:
  1: 71.73% (237 questions)
  2: 48.18% (6395 questions)
  3: 58.06% (4266 questions)
  4: 62.04% (793 questions)
  5: 69.46% (822 questions)
  6: 78.05% (41 questions)
  7: 95.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 30.46% (151 questions)
  4: 46.51% (630 questions)
  5: 40.47% (1290 questions)
  6: 50.72% (2074 questions)
  7: 54.93% (1642 questions)
  8: 57.13% (1185 questions)
  9: 60.73% (1281 questions)
  10: 59.89% (1249 questions)
  11: 57.34% (994 questions)
  12: 61.13% (638 questions)
  13: 51.73% (462 questions)
  14: 64.06% (345 questions)
  15: 63.29% (237 questions)
  16: 66.67% (117 questions)
  17: 58.51% (94 questions)
  18: 64.47% (76 questions)
  19: 69.77% (43 questions)
  20: 62.50% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 50.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-20 00:11:56
Epoch: 8, Loss: 0.9346485017632614, Lr: 0.0001
Elapsed time: 5532, Speed(s/batch): 0.3294190824187974

Binary: 72.63%
Open: 38.85%
Accuracy: 54.36%
Distribution: 2.27 (lower is better)
Accuracy / structural type:
  choose: 69.97% (1129 questions)
  compare: 62.48% (589 questions)
  logical: 71.77% (1803 questions)
  query: 38.85% (6805 questions)
  verify: 77.31% (2252 questions)
Accuracy / semantic type:
  attr: 60.20% (5186 questions)
  cat: 44.73% (1149 questions)
  global: 57.32% (157 questions)
  obj: 84.19% (778 questions)
  rel: 46.27% (5308 questions)
Accuracy / steps number:
  1: 69.20% (237 questions)
  2: 48.71% (6395 questions)
  3: 56.92% (4266 questions)
  4: 61.41% (793 questions)
  5: 71.53% (822 questions)
  6: 82.93% (41 questions)
  7: 90.00% (20 questions)
  8: 66.67% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 29.14% (151 questions)
  4: 47.46% (630 questions)
  5: 41.55% (1290 questions)
  6: 51.25% (2074 questions)
  7: 53.78% (1642 questions)
  8: 55.86% (1185 questions)
  9: 59.48% (1281 questions)
  10: 61.33% (1249 questions)
  11: 56.54% (994 questions)
  12: 61.91% (638 questions)
  13: 56.93% (462 questions)
  14: 58.84% (345 questions)
  15: 62.03% (237 questions)
  16: 63.25% (117 questions)
  17: 62.77% (94 questions)
  18: 65.79% (76 questions)
  19: 60.47% (43 questions)
  20: 56.25% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-20 01:44:43
Epoch: 9, Loss: 0.6903910459351245, Lr: 2e-05
Elapsed time: 5527, Speed(s/batch): 0.32908785358784626

Binary: 75.78%
Open: 41.45%
Accuracy: 57.21%
Distribution: 1.63 (lower is better)
Accuracy / structural type:
  choose: 74.58% (1129 questions)
  compare: 66.89% (589 questions)
  logical: 74.27% (1803 questions)
  query: 41.45% (6805 questions)
  verify: 79.93% (2252 questions)
Accuracy / semantic type:
  attr: 64.02% (5186 questions)
  cat: 46.30% (1149 questions)
  global: 59.87% (157 questions)
  obj: 86.50% (778 questions)
  rel: 48.55% (5308 questions)
Accuracy / steps number:
  1: 72.15% (237 questions)
  2: 51.34% (6395 questions)
  3: 60.34% (4266 questions)
  4: 63.68% (793 questions)
  5: 73.60% (822 questions)
  6: 82.93% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 29.80% (151 questions)
  4: 48.73% (630 questions)
  5: 44.65% (1290 questions)
  6: 54.44% (2074 questions)
  7: 55.54% (1642 questions)
  8: 60.00% (1185 questions)
  9: 64.09% (1281 questions)
  10: 63.33% (1249 questions)
  11: 59.66% (994 questions)
  12: 64.26% (638 questions)
  13: 59.31% (462 questions)
  14: 61.45% (345 questions)
  15: 64.14% (237 questions)
  16: 64.96% (117 questions)
  17: 62.77% (94 questions)
  18: 68.42% (76 questions)
  19: 76.74% (43 questions)
  20: 53.12% (32 questions)
  21: 73.68% (19 questions)
  22: 66.67% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-20 03:17:26
Epoch: 10, Loss: 0.5974722676920379, Lr: 2e-05
Elapsed time: 5601, Speed(s/batch): 0.3334813243748319

Binary: 75.92%
Open: 41.72%
Accuracy: 57.42%
Distribution: 1.49 (lower is better)
Accuracy / structural type:
  choose: 75.29% (1129 questions)
  compare: 66.38% (589 questions)
  logical: 74.32% (1803 questions)
  query: 41.72% (6805 questions)
  verify: 80.02% (2252 questions)
Accuracy / semantic type:
  attr: 64.02% (5186 questions)
  cat: 45.95% (1149 questions)
  global: 57.32% (157 questions)
  obj: 87.15% (778 questions)
  rel: 49.10% (5308 questions)
Accuracy / steps number:
  1: 75.11% (237 questions)
  2: 51.49% (6395 questions)
  3: 60.45% (4266 questions)
  4: 63.43% (793 questions)
  5: 74.21% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 35.76% (151 questions)
  4: 51.27% (630 questions)
  5: 44.88% (1290 questions)
  6: 54.05% (2074 questions)
  7: 56.52% (1642 questions)
  8: 58.48% (1185 questions)
  9: 63.39% (1281 questions)
  10: 63.73% (1249 questions)
  11: 58.75% (994 questions)
  12: 64.11% (638 questions)
  13: 60.39% (462 questions)
  14: 64.06% (345 questions)
  15: 64.56% (237 questions)
  16: 65.81% (117 questions)
  17: 62.77% (94 questions)
  18: 72.37% (76 questions)
  19: 76.74% (43 questions)
  20: 65.62% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-20 04:51:12
Epoch: 11, Loss: 0.5036989731823418, Lr: 4.000000000000001e-06
Elapsed time: 5452, Speed(s/batch): 0.3246474958766384

Binary: 75.91%
Open: 41.63%
Accuracy: 57.36%
Distribution: 1.54 (lower is better)
Accuracy / structural type:
  choose: 75.02% (1129 questions)
  compare: 67.91% (589 questions)
  logical: 74.32% (1803 questions)
  query: 41.63% (6805 questions)
  verify: 79.71% (2252 questions)
Accuracy / semantic type:
  attr: 64.29% (5186 questions)
  cat: 46.65% (1149 questions)
  global: 58.60% (157 questions)
  obj: 85.86% (778 questions)
  rel: 48.70% (5308 questions)
Accuracy / steps number:
  1: 75.95% (237 questions)
  2: 51.07% (6395 questions)
  3: 60.81% (4266 questions)
  4: 64.06% (793 questions)
  5: 73.97% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 32.45% (151 questions)
  4: 49.68% (630 questions)
  5: 44.42% (1290 questions)
  6: 54.39% (2074 questions)
  7: 57.06% (1642 questions)
  8: 59.16% (1185 questions)
  9: 63.08% (1281 questions)
  10: 63.49% (1249 questions)
  11: 59.36% (994 questions)
  12: 63.79% (638 questions)
  13: 58.87% (462 questions)
  14: 64.06% (345 questions)
  15: 63.71% (237 questions)
  16: 66.67% (117 questions)
  17: 63.83% (94 questions)
  18: 73.68% (76 questions)
  19: 76.74% (43 questions)
  20: 65.62% (32 questions)
  21: 63.16% (19 questions)
  22: 66.67% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)
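
The Lr column above follows a warmup plus step-decay schedule driven by LR_BASE, WARMUP_EPOCH, LR_DECAY_LIST and LR_DECAY_R. The sketch below is an assumption about how these combine (three equal warmup steps at 1/(W+1), 2/(W+1) and 3/(W+1) of LR_BASE, each lasting (W+1)/4 epochs, plus a multiplication by LR_DECAY_R at the start of each 0-indexed epoch in LR_DECAY_LIST), not openvqa's own code; `logged_lr` is a hypothetical helper. With this log's settings it reproduces the logged values: 6.67e-05 for epoch 1, 1e-4 for epochs 2 to 8, 2e-05 for epochs 9 and 10, and 4e-06 for epoch 11.

```python
def logged_lr(epoch, lr_base, warmup_epoch, decay_list, decay_r):
    """LR in effect at the end of 1-indexed `epoch` (the value printed as Lr).

    Assumed schedule (a reconstruction, not openvqa's code):
    - step decay: multiply by decay_r at the start of every 0-indexed
      epoch listed in decay_list;
    - warmup: three steps of (warmup_epoch + 1) / 4 epochs each, at
      1/(W+1), 2/(W+1) and 3/(W+1) of the (possibly decayed) base rate.
    """
    lr = lr_base
    for d in decay_list:
        if epoch - 1 >= d:  # 0-indexed epoch d has started
            lr *= decay_r
    span = (warmup_epoch + 1) * 0.25
    for k in (1, 2, 3):
        if epoch <= k * span:
            return lr * k / (warmup_epoch + 1)
    return lr


# Settings from the log above.
cfg = dict(lr_base=1e-4, warmup_epoch=2, decay_list=[8, 10], decay_r=0.2)
for e in range(1, 12):
    print(e, logged_lr(e, **cfg))
```

With WARMUP_EPOCH 2 the third warmup step already equals LR_BASE, which is why epoch 2 is logged at the full 1e-4.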

MIL-VLG commented 4 years ago

The results on GQA are not very stable (at least for the MCAN model). In our experience, changing the random seed alone can shift overall accuracy by up to ~1%. You can change the random seed and try another run.
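
Given this seed sensitivity, one way to compare against a reported number is to train with several seeds and look at the mean and spread rather than at a single run. A small standard-library sketch; `summarize_runs` and `gap_exceeds_seed_noise` are hypothetical helpers, and the 1.0-point default spread is simply the ~1% figure quoted above:

```python
import statistics


def summarize_runs(accuracies):
    """Mean, sample standard deviation and max-min spread of per-seed accuracies (%)."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate seed variance")
    return {
        "mean": statistics.mean(accuracies),
        "stdev": statistics.stdev(accuracies),
        "spread": max(accuracies) - min(accuracies),
    }


def gap_exceeds_seed_noise(observed, reported, spread=1.0):
    """True if the gap to the reported number is larger than the seed spread."""
    return abs(observed - reported) > spread


# Overall accuracies quoted in this thread (56.23 obtained vs 58.10 reported).
print(gap_exceeds_seed_noise(56.23, 58.10))  # True
```

By this rough criterion, the 1.87-point gap mentioned later in the thread is larger than the ~1% seed spread, so a single unlucky seed may not be the whole explanation. Once several seeds have been run, the mean and spread from `summarize_runs` can replace the single observed value and the default spread.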

luogen1996 commented 4 years ago

Thank you so much!

mrfarazi commented 4 years ago

@MIL-VLG and @luogen1996 I am getting an accuracy of 56.23 with all default settings, compared to the reported 58.10. I wonder what I am doing wrong. I have pasted the log below for cross-checking.
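
In the configuration pasted in this comment, BATCH_SIZE is 64 with GRAD_ACCU_STEPS 4, which gives SUB_BATCH_SIZE 16 per forward pass. The idea behind gradient accumulation, sketched here in plain Python with a toy scalar model (hypothetical helpers, not openvqa code), is that summing gradients over the 4 sub-batches before one optimizer step reproduces the gradient of the full 64-example batch when the loss uses sum reduction (LOSS_REDUCTION is sum in both configs in this thread):

```python
def grad_sum(w, batch):
    """Gradient w.r.t. w of the sum-reduced squared error for y ~ w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch)


def accumulated_grad(w, batch, accu_steps):
    """Split `batch` into `accu_steps` equal sub-batches and sum their
    gradients, as done before a single optimizer step."""
    if len(batch) % accu_steps:
        raise ValueError("batch size must be divisible by accu_steps")
    sub = len(batch) // accu_steps  # e.g. 64 // 4 = 16
    total = 0.0
    for i in range(accu_steps):
        total += grad_sum(w, batch[i * sub:(i + 1) * sub])
    return total


batch = [(float(i), 0.5 * i + 1.0) for i in range(64)]
full = grad_sum(0.3, batch)
accu = accumulated_grad(0.3, batch, accu_steps=4)
print(abs(full - accu) < 1e-6)  # True
```

With mean reduction instead, each sub-batch loss would need to be scaled by 1 / accu_steps to match the full-batch gradient.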

{ BATCH_SIZE        }->64
{ BBOXFEAT_EMB_SIZE }->2048
{ CACHE_PATH        }->./results/cache
{ CKPTS_PATH        }->./ckpts
{ CKPT_EPOCH        }->0
{ CKPT_PATH         }->None
{ CKPT_VERSION      }->4120150
{ DATASET           }->gqa
{ DATA_PATH         }->{'vqa': './data/vqa', 'clevr': './data/clevr', 'gqa': './data/gqa'}
{ DATA_ROOT         }->./data
{ DEVICES           }->[0]
{ DROPOUT_R         }->0.1
{ EVAL_BATCH_SIZE   }->8
{ EVAL_EVERY_EPOCH  }->True
{ FEATS_PATH        }->{'vqa': {'val': './data/vqa/feats/val2014', 'train': './data/vqa/feats/train2014', 'test': './data/vqa/feats/test2015'}, 'clevr': {'val': './data/clevr/feats/val', 'train': './data/clevr/feats/train', 'test': './data/clevr/feats/test'}, 'gqa': {'default-frcn': './data/gqa/feats/gqa-frcn', 'default-grid': './data/gqa/feats/gqa-grid'}}
{ FEAT_SIZE         }->{'vqa': {'BBOX_FEAT_SIZE': (100, 5), 'FRCN_FEAT_SIZE': (100, 2048)}, 'clevr': {'GRID_FEAT_SIZE': (196, 1024)}, 'gqa': {'GRID_FEAT_SIZE': (49, 2048), 'BBOX_FEAT_SIZE': (100, 5), 'FRCN_FEAT_SIZE': (100, 2048)}}
{ FF_SIZE           }->4096
{ FLAT_GLIMPSES     }->1
{ FLAT_MLP_SIZE     }->512
{ FLAT_OUT_SIZE     }->2048
{ GPU               }->7
{ GRAD_ACCU_STEPS   }->4
{ GRAD_NORM_CLIP    }->-1
{ HIDDEN_SIZE       }->1024
{ LAYER             }->6
{ LOG_PATH          }->./results/log
{ LOSS_FUNC         }->ce
{ LOSS_FUNC_NAME_DICT }->{'bce': 'BCEWithLogitsLoss', 'kld': 'KLDivLoss', 'ce': 'CrossEntropyLoss', 'mse': 'MSELoss'}
{ LOSS_FUNC_NONLINEAR }->{'bce': [None, None], 'kld': ['log_softmax', None], 'ce': [None, 'flat'], 'mse': [None, None]}
{ LOSS_REDUCTION    }->sum
{ LR_BASE           }->5e-05
{ LR_DECAY_LIST     }->[8, 10]
{ LR_DECAY_R        }->0.2
{ MAX_EPOCH         }->11
{ MODEL             }->mcan_large
{ MODEL_USE         }->mcan
{ MULTI_HEAD        }->8
{ NUM_WORKERS       }->8
{ N_GPU             }->1
{ OPT               }->Adam
{ OPT_PARAMS        }->{'betas': (0.9, 0.98), 'weight_decay': 0, 'eps': 1e-09, 'amsgrad': False}
{ PIN_MEM           }->True
{ PRED_PATH         }->./results/pred
{ RAW_PATH          }->{'vqa': {'val-anno': './data/vqa/raw/v2_mscoco_val2014_annotations.json', 'val': './data/vqa/raw/v2_OpenEnded_mscoco_val2014_questions.json', 'test': './data/vqa/raw/v2_OpenEnded_mscoco_test2015_questions.json', 'vg': './data/vqa/raw/VG_questions.json', 'train-anno': './data/vqa/raw/v2_mscoco_train2014_annotations.json', 'train': './data/vqa/raw/v2_OpenEnded_mscoco_train2014_questions.json', 'vg-anno': './data/vqa/raw/VG_annotations.json'}, 'clevr': {'val': './data/clevr/raw/questions/CLEVR_val_questions.json', 'train': './data/clevr/raw/questions/CLEVR_train_questions.json', 'test': './data/clevr/raw/questions/CLEVR_test_questions.json'}, 'gqa': {'val': './data/gqa/raw/questions1.2/val_balanced_questions.json', 'val_all': './data/gqa/raw/questions1.2/val_all_questions.json', 'test': './data/gqa/raw/questions1.2/submission_all_questions.json', 'testdev_all': './data/gqa/raw/questions1.2/testdev_all_questions.json', 'val_choices': './data/gqa/raw/eval/val_choices.json', 'train_choices': './data/gqa/raw/eval/train_choices', 'train': './data/gqa/raw/questions1.2/train_balanced_questions.json', 'testdev': './data/gqa/raw/questions1.2/testdev_balanced_questions.json'}}
{ RESULT_PATH       }->./results/result_test
{ RESUME            }->False
{ RUN_MODE          }->train
{ SEED              }->4120150
{ SPLIT             }->{'val': 'testdev', 'train': 'train+val', 'test': 'test'}
{ SPLITS            }->{'vqa': {'val': 'val', 'train': '', 'test': 'test'}, 'clevr': {'val': 'val', 'train': '', 'test': 'test'}, 'gqa': {'val': 'testdev', 'train': 'train+val', 'test': 'test'}}
{ SUB_BATCH_SIZE    }->16
{ TASK_LOSS_CHECK   }->{'vqa': ['bce', 'kld'], 'clevr': ['ce'], 'gqa': ['ce']}
{ TEST_SAVE_PRED    }->False
{ TRAIN_SPLIT       }->train+val
{ USE_AUX_FEAT      }->True
{ USE_BBOX_FEAT     }->True
{ USE_GLOVE         }->True
{ VERBOSE           }->True
{ VERSION           }->default_grid+bb+frcn
{ WARMUP_EPOCH      }->2
{ WORD_EMBED_SIZE   }->300
=====================================
nowTime: 2020-01-15 17:17:00
Epoch: 1, Loss: 1.68896448995679, Lr: 3.3333333333333335e-05
Elapsed time: 12386, Speed(s/batch): 0.7374428591009378

Binary: 56.75%
Open: 34.61%
Accuracy: 44.77%
Distribution: 4.05 (lower is better)
Accuracy / structural type:
  choose: 60.76% (1129 questions)
  compare: 57.05% (589 questions)
  logical: 57.35% (1803 questions)
  query: 34.61% (6805 questions)
  verify: 54.17% (2252 questions)
Accuracy / semantic type:
  attr: 48.98% (5186 questions)
  cat: 40.38% (1149 questions)
  global: 49.68% (157 questions)
  obj: 68.12% (778 questions)
  rel: 38.04% (5308 questions)
Accuracy / steps number:
  1: 60.34% (237 questions)
  2: 40.72% (6395 questions)
  3: 45.99% (4266 questions)
  4: 47.04% (793 questions)
  5: 60.46% (822 questions)
  6: 68.29% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 23.18% (151 questions)
  4: 41.75% (630 questions)
  5: 35.04% (1290 questions)
  6: 40.98% (2074 questions)
  7: 42.02% (1642 questions)
  8: 46.33% (1185 questions)
  9: 49.65% (1281 questions)
  10: 50.92% (1249 questions)
  11: 46.98% (994 questions)
  12: 52.04% (638 questions)
  13: 49.78% (462 questions)
  14: 47.25% (345 questions)
  15: 51.90% (237 questions)
  16: 51.28% (117 questions)
  17: 45.74% (94 questions)
  18: 53.95% (76 questions)
  19: 58.14% (43 questions)
  20: 56.25% (32 questions)
  21: 42.11% (19 questions)
  22: 50.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-15 20:44:31
Epoch: 2, Loss: 1.3852027663334634, Lr: 5e-05
Elapsed time: 12375, Speed(s/batch): 0.7367934128107642

Binary: 63.78%
Open: 37.11%
Accuracy: 49.35%
Distribution: 2.37 (lower is better)
Accuracy / structural type:
  choose: 66.96% (1129 questions)
  compare: 55.52% (589 questions)
  logical: 58.62% (1803 questions)
  query: 37.11% (6805 questions)
  verify: 68.47% (2252 questions)
Accuracy / semantic type:
  attr: 53.49% (5186 questions)
  cat: 41.17% (1149 questions)
  global: 54.78% (157 questions)
  obj: 73.01% (778 questions)
  rel: 43.44% (5308 questions)
Accuracy / steps number:
  1: 62.87% (237 questions)
  2: 45.08% (6395 questions)
  3: 52.98% (4266 questions)
  4: 47.16% (793 questions)
  5: 59.25% (822 questions)
  6: 73.17% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 29.14% (151 questions)
  4: 44.92% (630 questions)
  5: 36.90% (1290 questions)
  6: 44.79% (2074 questions)
  7: 47.56% (1642 questions)
  8: 49.96% (1185 questions)
  9: 54.72% (1281 questions)
  10: 56.37% (1249 questions)
  11: 53.52% (994 questions)
  12: 56.27% (638 questions)
  13: 53.03% (462 questions)
  14: 55.36% (345 questions)
  15: 62.87% (237 questions)
  16: 56.41% (117 questions)
  17: 50.00% (94 questions)
  18: 59.21% (76 questions)
  19: 67.44% (43 questions)
  20: 50.00% (32 questions)
  21: 52.63% (19 questions)
  22: 41.67% (12 questions)
  23: 0.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 00:11:51
Epoch: 3, Loss: 1.2009483448863143, Lr: 5e-05
Elapsed time: 12355, Speed(s/batch): 0.7356053381398849

Binary: 68.98%
Open: 36.49%
Accuracy: 51.40%
Distribution: 2.03 (lower is better)
Accuracy / structural type:
  choose: 69.71% (1129 questions)
  compare: 53.82% (589 questions)
  logical: 66.67% (1803 questions)
  query: 36.49% (6805 questions)
  verify: 74.42% (2252 questions)
Accuracy / semantic type:
  attr: 56.75% (5186 questions)
  cat: 42.30% (1149 questions)
  global: 52.23% (157 questions)
  obj: 79.05% (778 questions)
  rel: 44.07% (5308 questions)
Accuracy / steps number:
  1: 65.40% (237 questions)
  2: 46.21% (6395 questions)
  3: 54.27% (4266 questions)
  4: 56.87% (793 questions)
  5: 64.84% (822 questions)
  6: 78.05% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 31.79% (151 questions)
  4: 46.67% (630 questions)
  5: 38.60% (1290 questions)
  6: 47.97% (2074 questions)
  7: 49.57% (1642 questions)
  8: 54.85% (1185 questions)
  9: 56.75% (1281 questions)
  10: 56.53% (1249 questions)
  11: 53.62% (994 questions)
  12: 55.96% (638 questions)
  13: 55.84% (462 questions)
  14: 57.39% (345 questions)
  15: 64.98% (237 questions)
  16: 60.68% (117 questions)
  17: 51.06% (94 questions)
  18: 56.58% (76 questions)
  19: 76.74% (43 questions)
  20: 50.00% (32 questions)
  21: 52.63% (19 questions)
  22: 58.33% (12 questions)
  23: 50.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 03:38:51
Epoch: 4, Loss: 1.0470566586765542, Lr: 5e-05
Elapsed time: 12343, Speed(s/batch): 0.7348805314644543

Binary: 70.38%
Open: 38.28%
Accuracy: 53.01%
Distribution: 2.05 (lower is better)
Accuracy / structural type:
  choose: 72.63% (1129 questions)
  compare: 59.25% (589 questions)
  logical: 67.00% (1803 questions)
  query: 38.28% (6805 questions)
  verify: 74.87% (2252 questions)
Accuracy / semantic type:
  attr: 58.54% (5186 questions)
  cat: 43.52% (1149 questions)
  global: 53.50% (157 questions)
  obj: 80.33% (778 questions)
  rel: 45.65% (5308 questions)
Accuracy / steps number:
  1: 67.93% (237 questions)
  2: 48.27% (6395 questions)
  3: 55.49% (4266 questions)
  4: 58.26% (793 questions)
  5: 64.84% (822 questions)
  6: 82.93% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 28.48% (151 questions)
  4: 48.89% (630 questions)
  5: 39.69% (1290 questions)
  6: 49.28% (2074 questions)
  7: 52.50% (1642 questions)
  8: 55.27% (1185 questions)
  9: 57.30% (1281 questions)
  10: 59.81% (1249 questions)
  11: 54.73% (994 questions)
  12: 58.31% (638 questions)
  13: 57.36% (462 questions)
  14: 61.16% (345 questions)
  15: 64.14% (237 questions)
  16: 59.83% (117 questions)
  17: 54.26% (94 questions)
  18: 61.84% (76 questions)
  19: 74.42% (43 questions)
  20: 53.12% (32 questions)
  21: 63.16% (19 questions)
  22: 66.67% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 07:05:39
Epoch: 5, Loss: 0.9381465803343356, Lr: 5e-05
Elapsed time: 12374, Speed(s/batch): 0.7367752227478863

Binary: 71.40%
Open: 37.77%
Accuracy: 53.20%
Distribution: 2.29 (lower is better)
Accuracy / structural type:
  choose: 70.50% (1129 questions)
  compare: 61.29% (589 questions)
  logical: 68.66% (1803 questions)
  query: 37.77% (6805 questions)
  verify: 76.69% (2252 questions)
Accuracy / semantic type:
  attr: 60.05% (5186 questions)
  cat: 40.91% (1149 questions)
  global: 52.87% (157 questions)
  obj: 81.88% (778 questions)
  rel: 44.99% (5308 questions)
Accuracy / steps number:
  1: 67.93% (237 questions)
  2: 47.18% (6395 questions)
  3: 56.89% (4266 questions)
  4: 60.40% (793 questions)
  5: 66.79% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 27.81% (151 questions)
  4: 46.98% (630 questions)
  5: 40.54% (1290 questions)
  6: 49.08% (2074 questions)
  7: 51.83% (1642 questions)
  8: 55.36% (1185 questions)
  9: 58.86% (1281 questions)
  10: 60.45% (1249 questions)
  11: 56.74% (994 questions)
  12: 58.62% (638 questions)
  13: 58.01% (462 questions)
  14: 59.42% (345 questions)
  15: 61.18% (237 questions)
  16: 64.96% (117 questions)
  17: 47.87% (94 questions)
  18: 63.16% (76 questions)
  19: 81.40% (43 questions)
  20: 43.75% (32 questions)
  21: 52.63% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 10:32:59
Epoch: 6, Loss: 0.8608418081554507, Lr: 5e-05
Elapsed time: 12379, Speed(s/batch): 0.7370702962161075

Binary: 71.97%
Open: 38.62%
Accuracy: 53.93%
Distribution: 2.63 (lower is better)
Accuracy / structural type:
  choose: 72.10% (1129 questions)
  compare: 54.33% (589 questions)
  logical: 71.60% (1803 questions)
  query: 38.62% (6805 questions)
  verify: 76.82% (2252 questions)
Accuracy / semantic type:
  attr: 61.40% (5186 questions)
  cat: 41.69% (1149 questions)
  global: 54.78% (157 questions)
  obj: 81.23% (778 questions)
  rel: 45.25% (5308 questions)
Accuracy / steps number:
  1: 67.93% (237 questions)
  2: 47.79% (6395 questions)
  3: 57.06% (4266 questions)
  4: 62.80% (793 questions)
  5: 69.95% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 29.14% (151 questions)
  4: 49.21% (630 questions)
  5: 40.16% (1290 questions)
  6: 49.57% (2074 questions)
  7: 53.71% (1642 questions)
  8: 56.88% (1185 questions)
  9: 60.66% (1281 questions)
  10: 59.65% (1249 questions)
  11: 54.93% (994 questions)
  12: 59.87% (638 questions)
  13: 58.66% (462 questions)
  14: 59.71% (345 questions)
  15: 64.56% (237 questions)
  16: 64.10% (117 questions)
  17: 51.06% (94 questions)
  18: 65.79% (76 questions)
  19: 79.07% (43 questions)
  20: 43.75% (32 questions)
  21: 68.42% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 14:00:23
Epoch: 7, Loss: 0.7950429820463256, Lr: 5e-05
Elapsed time: 12384, Speed(s/batch): 0.7373725269828192

Binary: 72.20%
Open: 37.87%
Accuracy: 53.63%
Distribution: 2.02 (lower is better)
Accuracy / structural type:
  choose: 73.25% (1129 questions)
  compare: 61.63% (589 questions)
  logical: 70.55% (1803 questions)
  query: 37.87% (6805 questions)
  verify: 75.75% (2252 questions)
Accuracy / semantic type:
  attr: 61.11% (5186 questions)
  cat: 41.95% (1149 questions)
  global: 55.41% (157 questions)
  obj: 82.65% (778 questions)
  rel: 44.54% (5308 questions)
Accuracy / steps number:
  1: 67.51% (237 questions)
  2: 47.35% (6395 questions)
  3: 57.22% (4266 questions)
  4: 60.91% (793 questions)
  5: 69.95% (822 questions)
  6: 82.93% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 26.49% (151 questions)
  4: 47.78% (630 questions)
  5: 39.46% (1290 questions)
  6: 50.39% (2074 questions)
  7: 52.68% (1642 questions)
  8: 55.02% (1185 questions)
  9: 59.48% (1281 questions)
  10: 61.65% (1249 questions)
  11: 54.93% (994 questions)
  12: 58.62% (638 questions)
  13: 58.23% (462 questions)
  14: 60.29% (345 questions)
  15: 65.40% (237 questions)
  16: 61.54% (117 questions)
  17: 61.70% (94 questions)
  18: 64.47% (76 questions)
  19: 72.09% (43 questions)
  20: 46.88% (32 questions)
  21: 57.89% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 17:27:54
Epoch: 8, Loss: 0.7412050358980075, Lr: 5e-05
Elapsed time: 12385, Speed(s/batch): 0.7374002362296933

Binary: 72.20%
Open: 37.63%
Accuracy: 53.50%
Distribution: 2.05 (lower is better)
Accuracy / structural type:
  choose: 71.30% (1129 questions)
  compare: 60.95% (589 questions)
  logical: 71.60% (1803 questions)
  query: 37.63% (6805 questions)
  verify: 76.07% (2252 questions)
Accuracy / semantic type:
  attr: 61.47% (5186 questions)
  cat: 41.43% (1149 questions)
  global: 53.50% (157 questions)
  obj: 83.16% (778 questions)
  rel: 43.97% (5308 questions)
Accuracy / steps number:
  1: 68.35% (237 questions)
  2: 47.33% (6395 questions)
  3: 56.54% (4266 questions)
  4: 62.17% (793 questions)
  5: 70.32% (822 questions)
  6: 80.49% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 23.84% (151 questions)
  4: 46.83% (630 questions)
  5: 39.30% (1290 questions)
  6: 51.30% (2074 questions)
  7: 53.05% (1642 questions)
  8: 56.20% (1185 questions)
  9: 59.88% (1281 questions)
  10: 60.61% (1249 questions)
  11: 52.92% (994 questions)
  12: 60.19% (638 questions)
  13: 56.49% (462 questions)
  14: 60.00% (345 questions)
  15: 61.60% (237 questions)
  16: 63.25% (117 questions)
  17: 54.26% (94 questions)
  18: 61.84% (76 questions)
  19: 74.42% (43 questions)
  20: 46.88% (32 questions)
  21: 57.89% (19 questions)
  22: 66.67% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 20:55:23
Epoch: 9, Loss: 0.47916676114195417, Lr: 1e-05
Elapsed time: 12368, Speed(s/batch): 0.7364252933464496

Binary: 75.11%
Open: 39.99%
Accuracy: 56.11%
Distribution: 1.91 (lower is better)
Accuracy / structural type:
  choose: 74.93% (1129 questions)
  compare: 61.97% (589 questions)
  logical: 74.82% (1803 questions)
  query: 39.99% (6805 questions)
  verify: 78.86% (2252 questions)
Accuracy / semantic type:
  attr: 63.67% (5186 questions)
  cat: 42.82% (1149 questions)
  global: 54.14% (157 questions)
  obj: 86.25% (778 questions)
  rel: 47.23% (5308 questions)
Accuracy / steps number:
  1: 70.04% (237 questions)
  2: 49.54% (6395 questions)
  3: 59.70% (4266 questions)
  4: 65.07% (793 questions)
  5: 73.36% (822 questions)
  6: 80.49% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 32.45% (151 questions)
  4: 50.95% (630 questions)
  5: 42.25% (1290 questions)
  6: 53.81% (2074 questions)
  7: 55.60% (1642 questions)
  8: 57.81% (1185 questions)
  9: 63.23% (1281 questions)
  10: 60.85% (1249 questions)
  11: 56.64% (994 questions)
  12: 61.60% (638 questions)
  13: 58.87% (462 questions)
  14: 64.64% (345 questions)
  15: 62.87% (237 questions)
  16: 64.96% (117 questions)
  17: 62.77% (94 questions)
  18: 64.47% (76 questions)
  19: 81.40% (43 questions)
  20: 43.75% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-17 00:22:37
Epoch: 10, Loss: 0.37404145107083603, Lr: 1e-05
Elapsed time: 12371, Speed(s/batch): 0.7365669963274095

Binary: 75.06%
Open: 39.59%
Accuracy: 55.87%
Distribution: 1.95 (lower is better)
Accuracy / structural type:
  choose: 76.79% (1129 questions)
  compare: 62.65% (589 questions)
  logical: 73.93% (1803 questions)
  query: 39.59% (6805 questions)
  verify: 78.33% (2252 questions)
Accuracy / semantic type:
  attr: 64.13% (5186 questions)
  cat: 43.08% (1149 questions)
  global: 53.50% (157 questions)
  obj: 84.83% (778 questions)
  rel: 46.38% (5308 questions)
Accuracy / steps number:
  1: 67.09% (237 questions)
  2: 49.62% (6395 questions)
  3: 59.28% (4266 questions)
  4: 65.07% (793 questions)
  5: 71.90% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 32.45% (151 questions)
  4: 49.84% (630 questions)
  5: 42.40% (1290 questions)
  6: 52.75% (2074 questions)
  7: 55.97% (1642 questions)
  8: 57.47% (1185 questions)
  9: 60.81% (1281 questions)
  10: 62.77% (1249 questions)
  11: 56.54% (994 questions)
  12: 60.50% (638 questions)
  13: 59.52% (462 questions)
  14: 65.80% (345 questions)
  15: 62.87% (237 questions)
  16: 64.10% (117 questions)
  17: 64.89% (94 questions)
  18: 65.79% (76 questions)
  19: 79.07% (43 questions)
  20: 50.00% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-17 03:49:53
Epoch: 11, Loss: 0.27490622613804777, Lr: 2.0000000000000003e-06
Elapsed time: 12362, Speed(s/batch): 0.7360429190743336

Binary: 75.13%
Open: 40.21%
Accuracy: 56.23%
Distribution: 1.93 (lower is better)
Accuracy / structural type:
  choose: 77.06% (1129 questions)
  compare: 62.48% (589 questions)
  logical: 73.60% (1803 questions)
  query: 40.21% (6805 questions)
  verify: 78.69% (2252 questions)
Accuracy / semantic type:
  attr: 64.38% (5186 questions)
  cat: 43.34% (1149 questions)
  global: 53.50% (157 questions)
  obj: 84.58% (778 questions)
  rel: 46.99% (5308 questions)
Accuracy / steps number:
  1: 70.46% (237 questions)
  2: 49.84% (6395 questions)
  3: 59.94% (4266 questions)
  4: 66.20% (793 questions)
  5: 70.56% (822 questions)
  6: 80.49% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 31.13% (151 questions)
  4: 50.32% (630 questions)
  5: 43.33% (1290 questions)
  6: 52.56% (2074 questions)
  7: 56.70% (1642 questions)
  8: 57.81% (1185 questions)
  9: 62.61% (1281 questions)
  10: 62.21% (1249 questions)
  11: 56.64% (994 questions)
  12: 60.97% (638 questions)
  13: 57.58% (462 questions)
  14: 67.25% (345 questions)
  15: 64.98% (237 questions)
  16: 66.67% (117 questions)
  17: 59.57% (94 questions)
  18: 68.42% (76 questions)
  19: 76.74% (43 questions)
  20: 53.12% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)
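The overall `Accuracy` in these logs appears to be the question-weighted mean over all structural types. `Open` matches the `query` accuracy in every epoch, and `Binary` covers the other four types; the counts sum to 12,578 questions. The sketch below recomputes the epoch-11 figures from the per-type breakdown, first recovering integer correct counts from the rounded percentages. The binary/open split is inferred from the log itself, not taken from openvqa's evaluation code.

```python
# Epoch-11 "Accuracy / structural type" block: type -> (accuracy %, #questions).
STRUCTURAL = {
    "choose": (77.06, 1129),
    "compare": (62.48, 589),
    "logical": (73.60, 1803),
    "query": (40.21, 6805),
    "verify": (78.69, 2252),
}
# Assumption inferred from the log: "query" is the only open-ended type.
OPEN_TYPES = {"query"}


def weighted_accuracy(types):
    """Question-weighted accuracy (%) over the given structural types."""
    # Recover integer correct counts from the rounded percentages; this is exact
    # while the rounding error stays under half a question (n below ~10,000).
    correct = sum(round(STRUCTURAL[t][0] / 100 * STRUCTURAL[t][1]) for t in types)
    total = sum(STRUCTURAL[t][1] for t in types)
    return 100 * correct / total


binary_acc = weighted_accuracy([t for t in STRUCTURAL if t not in OPEN_TYPES])
open_acc = weighted_accuracy(OPEN_TYPES)
overall_acc = weighted_accuracy(STRUCTURAL)
total_questions = sum(n for _, n in STRUCTURAL.values())
print(f"Binary: {binary_acc:.2f}%  Open: {open_acc:.2f}%  Accuracy: {overall_acc:.2f}%")
# prints "Binary: 75.13%  Open: 40.21%  Accuracy: 56.23%"
```

This reproduces the logged epoch-11 values. The weighting is by question count, so the large `query` group (6,805 of 12,578 questions) dominates the overall number.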
luogen1996 commented 4 years ago

@MIL-VLG and @luogen1996 With all default settings I get an accuracy of 56.23%, compared to the reported 58.10%. I wonder what I am doing wrong. I have pasted my log below so you can cross-check it.
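When cross-checking logs like the one pasted below, it helps to pull out the overall accuracy for each epoch. You can then read off the best epoch and its gap to the reported 58.10% directly. The sketch below does this. It assumes the line formats shown in this thread's logs. `epoch_accuracies` is a hypothetical helper and is not part of openvqa.

```python
import re

# Line formats as they appear in the pasted logs, e.g.
#   "Epoch: 11, Loss: 0.2749..., Lr: 2.0000000000000003e-06"
#   "Accuracy: 56.23%"
# "Accuracy / structural type:" does not match because the colon must follow "Accuracy".
EPOCH_RE = re.compile(r"^Epoch:\s*(\d+),")
ACC_RE = re.compile(r"^Accuracy:\s*([\d.]+)%")


def epoch_accuracies(log_text):
    """Hypothetical helper: map each epoch number to its overall accuracy (%)."""
    results = {}
    epoch = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = EPOCH_RE.match(line)
        if m:
            epoch = int(m.group(1))
            continue
        m = ACC_RE.match(line)
        if m and epoch is not None:
            results[epoch] = float(m.group(1))
            epoch = None  # keep only the first overall accuracy per epoch block
    return results


# Trimmed excerpt of the log (epochs 9-11).
SAMPLE = """\
Epoch: 9, Loss: 0.47916676114195417, Lr: 1e-05
Binary: 75.11%
Open: 39.99%
Accuracy: 56.11%
Epoch: 10, Loss: 0.37404145107083603, Lr: 1e-05
Accuracy: 55.87%
Epoch: 11, Loss: 0.27490622613804777, Lr: 2.0000000000000003e-06
Accuracy: 56.23%
"""

REPORTED = 58.10
accs = epoch_accuracies(SAMPLE)
best_epoch = max(accs, key=accs.get)
print(f"best epoch {best_epoch}: {accs[best_epoch]:.2f}% "
      f"({REPORTED - accs[best_epoch]:.2f} points below the reported {REPORTED:.2f}%)")
```

Pass the full log file's contents instead of `SAMPLE` to check every epoch. Here the best epoch is also the last one, so early stopping would not close the gap.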

{ BATCH_SIZE        }->64
{ BBOXFEAT_EMB_SIZE }->2048
{ CACHE_PATH        }->./results/cache
{ CKPTS_PATH        }->./ckpts
{ CKPT_EPOCH        }->0
{ CKPT_PATH         }->None
{ CKPT_VERSION      }->4120150
{ DATASET           }->gqa
{ DATA_PATH         }->{'vqa': './data/vqa', 'clevr': './data/clevr', 'gqa': './data/gqa'}
{ DATA_ROOT         }->./data
{ DEVICES           }->[0]
{ DROPOUT_R         }->0.1
{ EVAL_BATCH_SIZE   }->8
{ EVAL_EVERY_EPOCH  }->True
{ FEATS_PATH        }->{'vqa': {'val': './data/vqa/feats/val2014', 'train': './data/vqa/feats/train2014', 'test': './data/vqa/feats/test2015'}, 'clevr': {'val': './data/clevr/feats/val', 'train': './data/clevr/feats/train', 'test': './data/clevr/feats/test'}, 'gqa': {'default-frcn': './data/gqa/feats/gqa-frcn', 'default-grid': './data/gqa/feats/gqa-grid'}}
{ FEAT_SIZE         }->{'vqa': {'BBOX_FEAT_SIZE': (100, 5), 'FRCN_FEAT_SIZE': (100, 2048)}, 'clevr': {'GRID_FEAT_SIZE': (196, 1024)}, 'gqa': {'GRID_FEAT_SIZE': (49, 2048), 'BBOX_FEAT_SIZE': (100, 5), 'FRCN_FEAT_SIZE': (100, 2048)}}
{ FF_SIZE           }->4096
{ FLAT_GLIMPSES     }->1
{ FLAT_MLP_SIZE     }->512
{ FLAT_OUT_SIZE     }->2048
{ GPU               }->7
{ GRAD_ACCU_STEPS   }->4
{ GRAD_NORM_CLIP    }->-1
{ HIDDEN_SIZE       }->1024
{ LAYER             }->6
{ LOG_PATH          }->./results/log
{ LOSS_FUNC         }->ce
{ LOSS_FUNC_NAME_DICT }->{'bce': 'BCEWithLogitsLoss', 'kld': 'KLDivLoss', 'ce': 'CrossEntropyLoss', 'mse': 'MSELoss'}
{ LOSS_FUNC_NONLINEAR }->{'bce': [None, None], 'kld': ['log_softmax', None], 'ce': [None, 'flat'], 'mse': [None, None]}
{ LOSS_REDUCTION    }->sum
{ LR_BASE           }->5e-05
{ LR_DECAY_LIST     }->[8, 10]
{ LR_DECAY_R        }->0.2
{ MAX_EPOCH         }->11
{ MODEL             }->mcan_large
{ MODEL_USE         }->mcan
{ MULTI_HEAD        }->8
{ NUM_WORKERS       }->8
{ N_GPU             }->1
{ OPT               }->Adam
{ OPT_PARAMS        }->{'betas': (0.9, 0.98), 'weight_decay': 0, 'eps': 1e-09, 'amsgrad': False}
{ PIN_MEM           }->True
{ PRED_PATH         }->./results/pred
{ RAW_PATH          }->{'vqa': {'val-anno': './data/vqa/raw/v2_mscoco_val2014_annotations.json', 'val': './data/vqa/raw/v2_OpenEnded_mscoco_val2014_questions.json', 'test': './data/vqa/raw/v2_OpenEnded_mscoco_test2015_questions.json', 'vg': './data/vqa/raw/VG_questions.json', 'train-anno': './data/vqa/raw/v2_mscoco_train2014_annotations.json', 'train': './data/vqa/raw/v2_OpenEnded_mscoco_train2014_questions.json', 'vg-anno': './data/vqa/raw/VG_annotations.json'}, 'clevr': {'val': './data/clevr/raw/questions/CLEVR_val_questions.json', 'train': './data/clevr/raw/questions/CLEVR_train_questions.json', 'test': './data/clevr/raw/questions/CLEVR_test_questions.json'}, 'gqa': {'val': './data/gqa/raw/questions1.2/val_balanced_questions.json', 'val_all': './data/gqa/raw/questions1.2/val_all_questions.json', 'test': './data/gqa/raw/questions1.2/submission_all_questions.json', 'testdev_all': './data/gqa/raw/questions1.2/testdev_all_questions.json', 'val_choices': './data/gqa/raw/eval/val_choices.json', 'train_choices': './data/gqa/raw/eval/train_choices', 'train': './data/gqa/raw/questions1.2/train_balanced_questions.json', 'testdev': './data/gqa/raw/questions1.2/testdev_balanced_questions.json'}}
{ RESULT_PATH       }->./results/result_test
{ RESUME            }->False
{ RUN_MODE          }->train
{ SEED              }->4120150
{ SPLIT             }->{'val': 'testdev', 'train': 'train+val', 'test': 'test'}
{ SPLITS            }->{'vqa': {'val': 'val', 'train': '', 'test': 'test'}, 'clevr': {'val': 'val', 'train': '', 'test': 'test'}, 'gqa': {'val': 'testdev', 'train': 'train+val', 'test': 'test'}}
{ SUB_BATCH_SIZE    }->16
{ TASK_LOSS_CHECK   }->{'vqa': ['bce', 'kld'], 'clevr': ['ce'], 'gqa': ['ce']}
{ TEST_SAVE_PRED    }->False
{ TRAIN_SPLIT       }->train+val
{ USE_AUX_FEAT      }->True
{ USE_BBOX_FEAT     }->True
{ USE_GLOVE         }->True
{ VERBOSE           }->True
{ VERSION           }->default_grid+bb+frcn
{ WARMUP_EPOCH      }->2
{ WORD_EMBED_SIZE   }->300
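The `Lr` column in the epoch headers below follows from `LR_BASE`, `LR_DECAY_LIST` and `LR_DECAY_R` in this config. The sketch below models that schedule. It assumes each value in `LR_DECAY_LIST` is a 0-indexed epoch at whose start the rate is multiplied by `LR_DECAY_R`. That assumption reproduces the logged 5e-05 through epoch 8, 1e-05 at epochs 9 and 10, and about 2e-06 at epoch 11. The warmup ramp over the first `WARMUP_EPOCH` epochs is not modelled. `post_warmup_lr` is a hypothetical helper, not openvqa's scheduler.

```python
# Values from the config dump above.
LR_BASE = 5e-05
LR_DECAY_LIST = [8, 10]
LR_DECAY_R = 0.2
WARMUP_EPOCH = 2
MAX_EPOCH = 11


def post_warmup_lr(epoch):
    """Learning rate logged for 1-indexed `epoch` after warmup (assumed schedule)."""
    if not WARMUP_EPOCH < epoch <= MAX_EPOCH:
        raise ValueError("only post-warmup epochs are modelled")
    lr = LR_BASE
    for decay_epoch in LR_DECAY_LIST:
        # Assumed: the decay fires at the start of 0-indexed epoch `decay_epoch`,
        # so it already affects logged (1-indexed) epoch decay_epoch + 1.
        if decay_epoch <= epoch - 1:
            lr *= LR_DECAY_R
    return lr


for e in range(WARMUP_EPOCH + 1, MAX_EPOCH + 1):
    print(f"Epoch: {e}, Lr: {post_warmup_lr(e):.0e}")
```

The sketch multiplies the rate sequentially rather than computing `LR_BASE * LR_DECAY_R ** n`. This mirrors the `2.0000000000000003e-06` float artifact seen in the log, although the two forms agree to within rounding.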
=====================================
nowTime: 2020-01-15 17:17:00
Epoch: 1, Loss: 1.68896448995679, Lr: 3.3333333333333335e-05
Elapsed time: 12386, Speed(s/batch): 0.7374428591009378

Binary: 56.75%
Open: 34.61%
Accuracy: 44.77%
Distribution: 4.05 (lower is better)
Accuracy / structural type:
  choose: 60.76% (1129 questions)
  compare: 57.05% (589 questions)
  logical: 57.35% (1803 questions)
  query: 34.61% (6805 questions)
  verify: 54.17% (2252 questions)
Accuracy / semantic type:
  attr: 48.98% (5186 questions)
  cat: 40.38% (1149 questions)
  global: 49.68% (157 questions)
  obj: 68.12% (778 questions)
  rel: 38.04% (5308 questions)
Accuracy / steps number:
  1: 60.34% (237 questions)
  2: 40.72% (6395 questions)
  3: 45.99% (4266 questions)
  4: 47.04% (793 questions)
  5: 60.46% (822 questions)
  6: 68.29% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 23.18% (151 questions)
  4: 41.75% (630 questions)
  5: 35.04% (1290 questions)
  6: 40.98% (2074 questions)
  7: 42.02% (1642 questions)
  8: 46.33% (1185 questions)
  9: 49.65% (1281 questions)
  10: 50.92% (1249 questions)
  11: 46.98% (994 questions)
  12: 52.04% (638 questions)
  13: 49.78% (462 questions)
  14: 47.25% (345 questions)
  15: 51.90% (237 questions)
  16: 51.28% (117 questions)
  17: 45.74% (94 questions)
  18: 53.95% (76 questions)
  19: 58.14% (43 questions)
  20: 56.25% (32 questions)
  21: 42.11% (19 questions)
  22: 50.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-15 20:44:31
Epoch: 2, Loss: 1.3852027663334634, Lr: 5e-05
Elapsed time: 12375, Speed(s/batch): 0.7367934128107642

Binary: 63.78%
Open: 37.11%
Accuracy: 49.35%
Distribution: 2.37 (lower is better)
Accuracy / structural type:
  choose: 66.96% (1129 questions)
  compare: 55.52% (589 questions)
  logical: 58.62% (1803 questions)
  query: 37.11% (6805 questions)
  verify: 68.47% (2252 questions)
Accuracy / semantic type:
  attr: 53.49% (5186 questions)
  cat: 41.17% (1149 questions)
  global: 54.78% (157 questions)
  obj: 73.01% (778 questions)
  rel: 43.44% (5308 questions)
Accuracy / steps number:
  1: 62.87% (237 questions)
  2: 45.08% (6395 questions)
  3: 52.98% (4266 questions)
  4: 47.16% (793 questions)
  5: 59.25% (822 questions)
  6: 73.17% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 29.14% (151 questions)
  4: 44.92% (630 questions)
  5: 36.90% (1290 questions)
  6: 44.79% (2074 questions)
  7: 47.56% (1642 questions)
  8: 49.96% (1185 questions)
  9: 54.72% (1281 questions)
  10: 56.37% (1249 questions)
  11: 53.52% (994 questions)
  12: 56.27% (638 questions)
  13: 53.03% (462 questions)
  14: 55.36% (345 questions)
  15: 62.87% (237 questions)
  16: 56.41% (117 questions)
  17: 50.00% (94 questions)
  18: 59.21% (76 questions)
  19: 67.44% (43 questions)
  20: 50.00% (32 questions)
  21: 52.63% (19 questions)
  22: 41.67% (12 questions)
  23: 0.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 00:11:51
Epoch: 3, Loss: 1.2009483448863143, Lr: 5e-05
Elapsed time: 12355, Speed(s/batch): 0.7356053381398849

Binary: 68.98%
Open: 36.49%
Accuracy: 51.40%
Distribution: 2.03 (lower is better)
Accuracy / structural type:
  choose: 69.71% (1129 questions)
  compare: 53.82% (589 questions)
  logical: 66.67% (1803 questions)
  query: 36.49% (6805 questions)
  verify: 74.42% (2252 questions)
Accuracy / semantic type:
  attr: 56.75% (5186 questions)
  cat: 42.30% (1149 questions)
  global: 52.23% (157 questions)
  obj: 79.05% (778 questions)
  rel: 44.07% (5308 questions)
Accuracy / steps number:
  1: 65.40% (237 questions)
  2: 46.21% (6395 questions)
  3: 54.27% (4266 questions)
  4: 56.87% (793 questions)
  5: 64.84% (822 questions)
  6: 78.05% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 31.79% (151 questions)
  4: 46.67% (630 questions)
  5: 38.60% (1290 questions)
  6: 47.97% (2074 questions)
  7: 49.57% (1642 questions)
  8: 54.85% (1185 questions)
  9: 56.75% (1281 questions)
  10: 56.53% (1249 questions)
  11: 53.62% (994 questions)
  12: 55.96% (638 questions)
  13: 55.84% (462 questions)
  14: 57.39% (345 questions)
  15: 64.98% (237 questions)
  16: 60.68% (117 questions)
  17: 51.06% (94 questions)
  18: 56.58% (76 questions)
  19: 76.74% (43 questions)
  20: 50.00% (32 questions)
  21: 52.63% (19 questions)
  22: 58.33% (12 questions)
  23: 50.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 03:38:51
Epoch: 4, Loss: 1.0470566586765542, Lr: 5e-05
Elapsed time: 12343, Speed(s/batch): 0.7348805314644543

Binary: 70.38%
Open: 38.28%
Accuracy: 53.01%
Distribution: 2.05 (lower is better)
Accuracy / structural type:
  choose: 72.63% (1129 questions)
  compare: 59.25% (589 questions)
  logical: 67.00% (1803 questions)
  query: 38.28% (6805 questions)
  verify: 74.87% (2252 questions)
Accuracy / semantic type:
  attr: 58.54% (5186 questions)
  cat: 43.52% (1149 questions)
  global: 53.50% (157 questions)
  obj: 80.33% (778 questions)
  rel: 45.65% (5308 questions)
Accuracy / steps number:
  1: 67.93% (237 questions)
  2: 48.27% (6395 questions)
  3: 55.49% (4266 questions)
  4: 58.26% (793 questions)
  5: 64.84% (822 questions)
  6: 82.93% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 28.48% (151 questions)
  4: 48.89% (630 questions)
  5: 39.69% (1290 questions)
  6: 49.28% (2074 questions)
  7: 52.50% (1642 questions)
  8: 55.27% (1185 questions)
  9: 57.30% (1281 questions)
  10: 59.81% (1249 questions)
  11: 54.73% (994 questions)
  12: 58.31% (638 questions)
  13: 57.36% (462 questions)
  14: 61.16% (345 questions)
  15: 64.14% (237 questions)
  16: 59.83% (117 questions)
  17: 54.26% (94 questions)
  18: 61.84% (76 questions)
  19: 74.42% (43 questions)
  20: 53.12% (32 questions)
  21: 63.16% (19 questions)
  22: 66.67% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 07:05:39
Epoch: 5, Loss: 0.9381465803343356, Lr: 5e-05
Elapsed time: 12374, Speed(s/batch): 0.7367752227478863

Binary: 71.40%
Open: 37.77%
Accuracy: 53.20%
Distribution: 2.29 (lower is better)
Accuracy / structural type:
  choose: 70.50% (1129 questions)
  compare: 61.29% (589 questions)
  logical: 68.66% (1803 questions)
  query: 37.77% (6805 questions)
  verify: 76.69% (2252 questions)
Accuracy / semantic type:
  attr: 60.05% (5186 questions)
  cat: 40.91% (1149 questions)
  global: 52.87% (157 questions)
  obj: 81.88% (778 questions)
  rel: 44.99% (5308 questions)
Accuracy / steps number:
  1: 67.93% (237 questions)
  2: 47.18% (6395 questions)
  3: 56.89% (4266 questions)
  4: 60.40% (793 questions)
  5: 66.79% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 27.81% (151 questions)
  4: 46.98% (630 questions)
  5: 40.54% (1290 questions)
  6: 49.08% (2074 questions)
  7: 51.83% (1642 questions)
  8: 55.36% (1185 questions)
  9: 58.86% (1281 questions)
  10: 60.45% (1249 questions)
  11: 56.74% (994 questions)
  12: 58.62% (638 questions)
  13: 58.01% (462 questions)
  14: 59.42% (345 questions)
  15: 61.18% (237 questions)
  16: 64.96% (117 questions)
  17: 47.87% (94 questions)
  18: 63.16% (76 questions)
  19: 81.40% (43 questions)
  20: 43.75% (32 questions)
  21: 52.63% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 10:32:59
Epoch: 6, Loss: 0.8608418081554507, Lr: 5e-05
Elapsed time: 12379, Speed(s/batch): 0.7370702962161075

Binary: 71.97%
Open: 38.62%
Accuracy: 53.93%
Distribution: 2.63 (lower is better)
Accuracy / structural type:
  choose: 72.10% (1129 questions)
  compare: 54.33% (589 questions)
  logical: 71.60% (1803 questions)
  query: 38.62% (6805 questions)
  verify: 76.82% (2252 questions)
Accuracy / semantic type:
  attr: 61.40% (5186 questions)
  cat: 41.69% (1149 questions)
  global: 54.78% (157 questions)
  obj: 81.23% (778 questions)
  rel: 45.25% (5308 questions)
Accuracy / steps number:
  1: 67.93% (237 questions)
  2: 47.79% (6395 questions)
  3: 57.06% (4266 questions)
  4: 62.80% (793 questions)
  5: 69.95% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 29.14% (151 questions)
  4: 49.21% (630 questions)
  5: 40.16% (1290 questions)
  6: 49.57% (2074 questions)
  7: 53.71% (1642 questions)
  8: 56.88% (1185 questions)
  9: 60.66% (1281 questions)
  10: 59.65% (1249 questions)
  11: 54.93% (994 questions)
  12: 59.87% (638 questions)
  13: 58.66% (462 questions)
  14: 59.71% (345 questions)
  15: 64.56% (237 questions)
  16: 64.10% (117 questions)
  17: 51.06% (94 questions)
  18: 65.79% (76 questions)
  19: 79.07% (43 questions)
  20: 43.75% (32 questions)
  21: 68.42% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 14:00:23
Epoch: 7, Loss: 0.7950429820463256, Lr: 5e-05
Elapsed time: 12384, Speed(s/batch): 0.7373725269828192

Binary: 72.20%
Open: 37.87%
Accuracy: 53.63%
Distribution: 2.02 (lower is better)
Accuracy / structural type:
  choose: 73.25% (1129 questions)
  compare: 61.63% (589 questions)
  logical: 70.55% (1803 questions)
  query: 37.87% (6805 questions)
  verify: 75.75% (2252 questions)
Accuracy / semantic type:
  attr: 61.11% (5186 questions)
  cat: 41.95% (1149 questions)
  global: 55.41% (157 questions)
  obj: 82.65% (778 questions)
  rel: 44.54% (5308 questions)
Accuracy / steps number:
  1: 67.51% (237 questions)
  2: 47.35% (6395 questions)
  3: 57.22% (4266 questions)
  4: 60.91% (793 questions)
  5: 69.95% (822 questions)
  6: 82.93% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 26.49% (151 questions)
  4: 47.78% (630 questions)
  5: 39.46% (1290 questions)
  6: 50.39% (2074 questions)
  7: 52.68% (1642 questions)
  8: 55.02% (1185 questions)
  9: 59.48% (1281 questions)
  10: 61.65% (1249 questions)
  11: 54.93% (994 questions)
  12: 58.62% (638 questions)
  13: 58.23% (462 questions)
  14: 60.29% (345 questions)
  15: 65.40% (237 questions)
  16: 61.54% (117 questions)
  17: 61.70% (94 questions)
  18: 64.47% (76 questions)
  19: 72.09% (43 questions)
  20: 46.88% (32 questions)
  21: 57.89% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 17:27:54
Epoch: 8, Loss: 0.7412050358980075, Lr: 5e-05
Elapsed time: 12385, Speed(s/batch): 0.7374002362296933

Binary: 72.20%
Open: 37.63%
Accuracy: 53.50%
Distribution: 2.05 (lower is better)
Accuracy / structural type:
  choose: 71.30% (1129 questions)
  compare: 60.95% (589 questions)
  logical: 71.60% (1803 questions)
  query: 37.63% (6805 questions)
  verify: 76.07% (2252 questions)
Accuracy / semantic type:
  attr: 61.47% (5186 questions)
  cat: 41.43% (1149 questions)
  global: 53.50% (157 questions)
  obj: 83.16% (778 questions)
  rel: 43.97% (5308 questions)
Accuracy / steps number:
  1: 68.35% (237 questions)
  2: 47.33% (6395 questions)
  3: 56.54% (4266 questions)
  4: 62.17% (793 questions)
  5: 70.32% (822 questions)
  6: 80.49% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 23.84% (151 questions)
  4: 46.83% (630 questions)
  5: 39.30% (1290 questions)
  6: 51.30% (2074 questions)
  7: 53.05% (1642 questions)
  8: 56.20% (1185 questions)
  9: 59.88% (1281 questions)
  10: 60.61% (1249 questions)
  11: 52.92% (994 questions)
  12: 60.19% (638 questions)
  13: 56.49% (462 questions)
  14: 60.00% (345 questions)
  15: 61.60% (237 questions)
  16: 63.25% (117 questions)
  17: 54.26% (94 questions)
  18: 61.84% (76 questions)
  19: 74.42% (43 questions)
  20: 46.88% (32 questions)
  21: 57.89% (19 questions)
  22: 66.67% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 20:55:23
Epoch: 9, Loss: 0.47916676114195417, Lr: 1e-05
Elapsed time: 12368, Speed(s/batch): 0.7364252933464496

Binary: 75.11%
Open: 39.99%
Accuracy: 56.11%
Distribution: 1.91 (lower is better)
Accuracy / structural type:
  choose: 74.93% (1129 questions)
  compare: 61.97% (589 questions)
  logical: 74.82% (1803 questions)
  query: 39.99% (6805 questions)
  verify: 78.86% (2252 questions)
Accuracy / semantic type:
  attr: 63.67% (5186 questions)
  cat: 42.82% (1149 questions)
  global: 54.14% (157 questions)
  obj: 86.25% (778 questions)
  rel: 47.23% (5308 questions)
Accuracy / steps number:
  1: 70.04% (237 questions)
  2: 49.54% (6395 questions)
  3: 59.70% (4266 questions)
  4: 65.07% (793 questions)
  5: 73.36% (822 questions)
  6: 80.49% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 32.45% (151 questions)
  4: 50.95% (630 questions)
  5: 42.25% (1290 questions)
  6: 53.81% (2074 questions)
  7: 55.60% (1642 questions)
  8: 57.81% (1185 questions)
  9: 63.23% (1281 questions)
  10: 60.85% (1249 questions)
  11: 56.64% (994 questions)
  12: 61.60% (638 questions)
  13: 58.87% (462 questions)
  14: 64.64% (345 questions)
  15: 62.87% (237 questions)
  16: 64.96% (117 questions)
  17: 62.77% (94 questions)
  18: 64.47% (76 questions)
  19: 81.40% (43 questions)
  20: 43.75% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-17 00:22:37
Epoch: 10, Loss: 0.37404145107083603, Lr: 1e-05
Elapsed time: 12371, Speed(s/batch): 0.7365669963274095

Binary: 75.06%
Open: 39.59%
Accuracy: 55.87%
Distribution: 1.95 (lower is better)
Accuracy / structural type:
  choose: 76.79% (1129 questions)
  compare: 62.65% (589 questions)
  logical: 73.93% (1803 questions)
  query: 39.59% (6805 questions)
  verify: 78.33% (2252 questions)
Accuracy / semantic type:
  attr: 64.13% (5186 questions)
  cat: 43.08% (1149 questions)
  global: 53.50% (157 questions)
  obj: 84.83% (778 questions)
  rel: 46.38% (5308 questions)
Accuracy / steps number:
  1: 67.09% (237 questions)
  2: 49.62% (6395 questions)
  3: 59.28% (4266 questions)
  4: 65.07% (793 questions)
  5: 71.90% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 32.45% (151 questions)
  4: 49.84% (630 questions)
  5: 42.40% (1290 questions)
  6: 52.75% (2074 questions)
  7: 55.97% (1642 questions)
  8: 57.47% (1185 questions)
  9: 60.81% (1281 questions)
  10: 62.77% (1249 questions)
  11: 56.54% (994 questions)
  12: 60.50% (638 questions)
  13: 59.52% (462 questions)
  14: 65.80% (345 questions)
  15: 62.87% (237 questions)
  16: 64.10% (117 questions)
  17: 64.89% (94 questions)
  18: 65.79% (76 questions)
  19: 79.07% (43 questions)
  20: 50.00% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-17 03:49:53
Epoch: 11, Loss: 0.27490622613804777, Lr: 2.0000000000000003e-06
Elapsed time: 12362, Speed(s/batch): 0.7360429190743336

Binary: 75.13%
Open: 40.21%
Accuracy: 56.23%
Distribution: 1.93 (lower is better)
Accuracy / structural type:
  choose: 77.06% (1129 questions)
  compare: 62.48% (589 questions)
  logical: 73.60% (1803 questions)
  query: 40.21% (6805 questions)
  verify: 78.69% (2252 questions)
Accuracy / semantic type:
  attr: 64.38% (5186 questions)
  cat: 43.34% (1149 questions)
  global: 53.50% (157 questions)
  obj: 84.58% (778 questions)
  rel: 46.99% (5308 questions)
Accuracy / steps number:
  1: 70.46% (237 questions)
  2: 49.84% (6395 questions)
  3: 59.94% (4266 questions)
  4: 66.20% (793 questions)
  5: 70.56% (822 questions)
  6: 80.49% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 31.13% (151 questions)
  4: 50.32% (630 questions)
  5: 43.33% (1290 questions)
  6: 52.56% (2074 questions)
  7: 56.70% (1642 questions)
  8: 57.81% (1185 questions)
  9: 62.61% (1281 questions)
  10: 62.21% (1249 questions)
  11: 56.64% (994 questions)
  12: 60.97% (638 questions)
  13: 57.58% (462 questions)
  14: 67.25% (345 questions)
  15: 64.98% (237 questions)
  16: 66.67% (117 questions)
  17: 59.57% (94 questions)
  18: 68.42% (76 questions)
  19: 76.74% (43 questions)
  20: 53.12% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)
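
The summary lines in these logs can be cross-checked against the per-type breakdown. The sketch below recomputes epoch 11 from its rounded percentages. It assumes that Open counts only query questions and Binary counts the other four structural types. That assumption matches the logged numbers but is inferred from them, not taken from the GQA evaluation script. Under it, the five structural types add up to 12,578 testdev questions, and the question-weighted averages reproduce Binary 75.13%, Open 40.21% and Accuracy 56.23%. The helper names are hypothetical.

```python
# Per-structural-type results for epoch 11, copied from the log above:
# type -> (accuracy in percent, number of questions)
EPOCH_11 = {
    "choose": (77.06, 1129),
    "compare": (62.48, 589),
    "logical": (73.60, 1803),
    "query": (40.21, 6805),
    "verify": (78.69, 2252),
}


def correct_count(acc_percent, n):
    """Reconstruct the number of correct answers from a rounded percentage."""
    return round(acc_percent / 100 * n)


def summarize(per_type):
    """Recompute Binary / Open / overall accuracy as question-weighted averages.

    Assumption: 'query' questions form the Open bucket and every other
    structural type is Binary (inferred from the logged numbers).
    """
    totals = {"binary": [0, 0], "open": [0, 0]}  # bucket -> [correct, total]
    for qtype, (acc, n) in per_type.items():
        bucket = "open" if qtype == "query" else "binary"
        totals[bucket][0] += correct_count(acc, n)
        totals[bucket][1] += n
    correct = sum(c for c, _ in totals.values())
    total = sum(n for _, n in totals.values())
    return {
        "Binary": round(100 * totals["binary"][0] / totals["binary"][1], 2),
        "Open": round(100 * totals["open"][0] / totals["open"][1], 2),
        "Accuracy": round(100 * correct / total, 2),
        "questions": total,
    }


print(summarize(EPOCH_11))
# {'Binary': 75.13, 'Open': 40.21, 'Accuracy': 56.23, 'questions': 12578}
```

The same check reproduces the epoch 1 summary (Binary 56.75%, Open 34.61%, Accuracy 44.77%). This suggests the overall number is dominated by the 6,805 open query questions, where all runs in this thread stay near 40%.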

I have tried mcan_large and got results similar to your report (slightly worse than mcan_small). I don't think anything is wrong with the settings; in my experiments, the random seed changes the result by at most 0.5%.

mrfarazi commented 4 years ago

Thanks for your quick reply, @luogen1996. You mentioned that you achieved 58.3 with mcan_small. If I wanted to reproduce that result, what SEED should I set?

luogen1996 commented 4 years ago

> Thanks for your quick reply, @luogen1996. You mentioned that you achieved 58.3 with mcan_small. If I wanted to reproduce that result, what SEED should I set?

Actually, I can't reach 58.3; my best result is around 57.3~57.4. However, the official pretrained model I downloaded scores 58.4 when I test it. So I would also like to know the random seed, and I'd appreciate it if @MIL-VLG could share some training details.
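
For reference, the config dump records the seed a run used as SEED. In the log below it is 4120150, the same value as CKPT_VERSION. Re-running with a fixed seed only helps if every random number generator involved is seeded. The sketch below is a generic PyTorch-style helper, not openvqa's own seeding code. The name set_seed is hypothetical, and NumPy and PyTorch are treated as optional so the snippet runs without them. Even with all seeds fixed, some CUDA kernels are nondeterministic, so small run-to-run differences can remain.

```python
import random


def set_seed(seed: int, deterministic: bool = True) -> None:
    """Seed the Python, NumPy and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)  # Python's built-in RNG
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's global RNG
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed: nothing more to seed
    torch.manual_seed(seed)  # PyTorch CPU generator
    torch.cuda.manual_seed_all(seed)  # all visible GPUs; ignored without CUDA
    if deterministic:
        # Trade speed for repeatable cuDNN algorithm selection.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


set_seed(4120150)  # the SEED value recorded in the log below
```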

Amajo0209 commented 1 year ago

@MIL-VLG and @luogen1996 I am getting an accuracy of 56.23 with all default settings, compared to the reported 58.10. I wonder what I am doing wrong. I have pasted my log below for cross-checking.

{ BATCH_SIZE        }->64
{ BBOXFEAT_EMB_SIZE }->2048
{ CACHE_PATH        }->./results/cache
{ CKPTS_PATH        }->./ckpts
{ CKPT_EPOCH        }->0
{ CKPT_PATH         }->None
{ CKPT_VERSION      }->4120150
{ DATASET           }->gqa
{ DATA_PATH         }->{'vqa': './data/vqa', 'clevr': './data/clevr', 'gqa': './data/gqa'}
{ DATA_ROOT         }->./data
{ DEVICES           }->[0]
{ DROPOUT_R         }->0.1
{ EVAL_BATCH_SIZE   }->8
{ EVAL_EVERY_EPOCH  }->True
{ FEATS_PATH        }->{'vqa': {'val': './data/vqa/feats/val2014', 'train': './data/vqa/feats/train2014', 'test': './data/vqa/feats/test2015'}, 'clevr': {'val': './data/clevr/feats/val', 'train': './data/clevr/feats/train', 'test': './data/clevr/feats/test'}, 'gqa': {'default-frcn': './data/gqa/feats/gqa-frcn', 'default-grid': './data/gqa/feats/gqa-grid'}}
{ FEAT_SIZE         }->{'vqa': {'BBOX_FEAT_SIZE': (100, 5), 'FRCN_FEAT_SIZE': (100, 2048)}, 'clevr': {'GRID_FEAT_SIZE': (196, 1024)}, 'gqa': {'GRID_FEAT_SIZE': (49, 2048), 'BBOX_FEAT_SIZE': (100, 5), 'FRCN_FEAT_SIZE': (100, 2048)}}
{ FF_SIZE           }->4096
{ FLAT_GLIMPSES     }->1
{ FLAT_MLP_SIZE     }->512
{ FLAT_OUT_SIZE     }->2048
{ GPU               }->7
{ GRAD_ACCU_STEPS   }->4
{ GRAD_NORM_CLIP    }->-1
{ HIDDEN_SIZE       }->1024
{ LAYER             }->6
{ LOG_PATH          }->./results/log
{ LOSS_FUNC         }->ce
{ LOSS_FUNC_NAME_DICT }->{'bce': 'BCEWithLogitsLoss', 'kld': 'KLDivLoss', 'ce': 'CrossEntropyLoss', 'mse': 'MSELoss'}
{ LOSS_FUNC_NONLINEAR }->{'bce': [None, None], 'kld': ['log_softmax', None], 'ce': [None, 'flat'], 'mse': [None, None]}
{ LOSS_REDUCTION    }->sum
{ LR_BASE           }->5e-05
{ LR_DECAY_LIST     }->[8, 10]
{ LR_DECAY_R        }->0.2
{ MAX_EPOCH         }->11
{ MODEL             }->mcan_large
{ MODEL_USE         }->mcan
{ MULTI_HEAD        }->8
{ NUM_WORKERS       }->8
{ N_GPU             }->1
{ OPT               }->Adam
{ OPT_PARAMS        }->{'betas': (0.9, 0.98), 'weight_decay': 0, 'eps': 1e-09, 'amsgrad': False}
{ PIN_MEM           }->True
{ PRED_PATH         }->./results/pred
{ RAW_PATH          }->{'vqa': {'val-anno': './data/vqa/raw/v2_mscoco_val2014_annotations.json', 'val': './data/vqa/raw/v2_OpenEnded_mscoco_val2014_questions.json', 'test': './data/vqa/raw/v2_OpenEnded_mscoco_test2015_questions.json', 'vg': './data/vqa/raw/VG_questions.json', 'train-anno': './data/vqa/raw/v2_mscoco_train2014_annotations.json', 'train': './data/vqa/raw/v2_OpenEnded_mscoco_train2014_questions.json', 'vg-anno': './data/vqa/raw/VG_annotations.json'}, 'clevr': {'val': './data/clevr/raw/questions/CLEVR_val_questions.json', 'train': './data/clevr/raw/questions/CLEVR_train_questions.json', 'test': './data/clevr/raw/questions/CLEVR_test_questions.json'}, 'gqa': {'val': './data/gqa/raw/questions1.2/val_balanced_questions.json', 'val_all': './data/gqa/raw/questions1.2/val_all_questions.json', 'test': './data/gqa/raw/questions1.2/submission_all_questions.json', 'testdev_all': './data/gqa/raw/questions1.2/testdev_all_questions.json', 'val_choices': './data/gqa/raw/eval/val_choices.json', 'train_choices': './data/gqa/raw/eval/train_choices', 'train': './data/gqa/raw/questions1.2/train_balanced_questions.json', 'testdev': './data/gqa/raw/questions1.2/testdev_balanced_questions.json'}}
{ RESULT_PATH       }->./results/result_test
{ RESUME            }->False
{ RUN_MODE          }->train
{ SEED              }->4120150
{ SPLIT             }->{'val': 'testdev', 'train': 'train+val', 'test': 'test'}
{ SPLITS            }->{'vqa': {'val': 'val', 'train': '', 'test': 'test'}, 'clevr': {'val': 'val', 'train': '', 'test': 'test'}, 'gqa': {'val': 'testdev', 'train': 'train+val', 'test': 'test'}}
{ SUB_BATCH_SIZE    }->16
{ TASK_LOSS_CHECK   }->{'vqa': ['bce', 'kld'], 'clevr': ['ce'], 'gqa': ['ce']}
{ TEST_SAVE_PRED    }->False
{ TRAIN_SPLIT       }->train+val
{ USE_AUX_FEAT      }->True
{ USE_BBOX_FEAT     }->True
{ USE_GLOVE         }->True
{ VERBOSE           }->True
{ VERSION           }->default_grid+bb+frcn
{ WARMUP_EPOCH      }->2
{ WORD_EMBED_SIZE   }->300
=====================================
nowTime: 2020-01-15 17:17:00
Epoch: 1, Loss: 1.68896448995679, Lr: 3.3333333333333335e-05
Elapsed time: 12386, Speed(s/batch): 0.7374428591009378

Binary: 56.75%
Open: 34.61%
Accuracy: 44.77%
Distribution: 4.05 (lower is better)
Accuracy / structural type:
  choose: 60.76% (1129 questions)
  compare: 57.05% (589 questions)
  logical: 57.35% (1803 questions)
  query: 34.61% (6805 questions)
  verify: 54.17% (2252 questions)
Accuracy / semantic type:
  attr: 48.98% (5186 questions)
  cat: 40.38% (1149 questions)
  global: 49.68% (157 questions)
  obj: 68.12% (778 questions)
  rel: 38.04% (5308 questions)
Accuracy / steps number:
  1: 60.34% (237 questions)
  2: 40.72% (6395 questions)
  3: 45.99% (4266 questions)
  4: 47.04% (793 questions)
  5: 60.46% (822 questions)
  6: 68.29% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 23.18% (151 questions)
  4: 41.75% (630 questions)
  5: 35.04% (1290 questions)
  6: 40.98% (2074 questions)
  7: 42.02% (1642 questions)
  8: 46.33% (1185 questions)
  9: 49.65% (1281 questions)
  10: 50.92% (1249 questions)
  11: 46.98% (994 questions)
  12: 52.04% (638 questions)
  13: 49.78% (462 questions)
  14: 47.25% (345 questions)
  15: 51.90% (237 questions)
  16: 51.28% (117 questions)
  17: 45.74% (94 questions)
  18: 53.95% (76 questions)
  19: 58.14% (43 questions)
  20: 56.25% (32 questions)
  21: 42.11% (19 questions)
  22: 50.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-15 20:44:31
Epoch: 2, Loss: 1.3852027663334634, Lr: 5e-05
Elapsed time: 12375, Speed(s/batch): 0.7367934128107642

Binary: 63.78%
Open: 37.11%
Accuracy: 49.35%
Distribution: 2.37 (lower is better)
Accuracy / structural type:
  choose: 66.96% (1129 questions)
  compare: 55.52% (589 questions)
  logical: 58.62% (1803 questions)
  query: 37.11% (6805 questions)
  verify: 68.47% (2252 questions)
Accuracy / semantic type:
  attr: 53.49% (5186 questions)
  cat: 41.17% (1149 questions)
  global: 54.78% (157 questions)
  obj: 73.01% (778 questions)
  rel: 43.44% (5308 questions)
Accuracy / steps number:
  1: 62.87% (237 questions)
  2: 45.08% (6395 questions)
  3: 52.98% (4266 questions)
  4: 47.16% (793 questions)
  5: 59.25% (822 questions)
  6: 73.17% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 29.14% (151 questions)
  4: 44.92% (630 questions)
  5: 36.90% (1290 questions)
  6: 44.79% (2074 questions)
  7: 47.56% (1642 questions)
  8: 49.96% (1185 questions)
  9: 54.72% (1281 questions)
  10: 56.37% (1249 questions)
  11: 53.52% (994 questions)
  12: 56.27% (638 questions)
  13: 53.03% (462 questions)
  14: 55.36% (345 questions)
  15: 62.87% (237 questions)
  16: 56.41% (117 questions)
  17: 50.00% (94 questions)
  18: 59.21% (76 questions)
  19: 67.44% (43 questions)
  20: 50.00% (32 questions)
  21: 52.63% (19 questions)
  22: 41.67% (12 questions)
  23: 0.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 00:11:51
Epoch: 3, Loss: 1.2009483448863143, Lr: 5e-05
Elapsed time: 12355, Speed(s/batch): 0.7356053381398849

Binary: 68.98%
Open: 36.49%
Accuracy: 51.40%
Distribution: 2.03 (lower is better)
Accuracy / structural type:
  choose: 69.71% (1129 questions)
  compare: 53.82% (589 questions)
  logical: 66.67% (1803 questions)
  query: 36.49% (6805 questions)
  verify: 74.42% (2252 questions)
Accuracy / semantic type:
  attr: 56.75% (5186 questions)
  cat: 42.30% (1149 questions)
  global: 52.23% (157 questions)
  obj: 79.05% (778 questions)
  rel: 44.07% (5308 questions)
Accuracy / steps number:
  1: 65.40% (237 questions)
  2: 46.21% (6395 questions)
  3: 54.27% (4266 questions)
  4: 56.87% (793 questions)
  5: 64.84% (822 questions)
  6: 78.05% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 31.79% (151 questions)
  4: 46.67% (630 questions)
  5: 38.60% (1290 questions)
  6: 47.97% (2074 questions)
  7: 49.57% (1642 questions)
  8: 54.85% (1185 questions)
  9: 56.75% (1281 questions)
  10: 56.53% (1249 questions)
  11: 53.62% (994 questions)
  12: 55.96% (638 questions)
  13: 55.84% (462 questions)
  14: 57.39% (345 questions)
  15: 64.98% (237 questions)
  16: 60.68% (117 questions)
  17: 51.06% (94 questions)
  18: 56.58% (76 questions)
  19: 76.74% (43 questions)
  20: 50.00% (32 questions)
  21: 52.63% (19 questions)
  22: 58.33% (12 questions)
  23: 50.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 03:38:51
Epoch: 4, Loss: 1.0470566586765542, Lr: 5e-05
Elapsed time: 12343, Speed(s/batch): 0.7348805314644543

Binary: 70.38%
Open: 38.28%
Accuracy: 53.01%
Distribution: 2.05 (lower is better)
Accuracy / structural type:
  choose: 72.63% (1129 questions)
  compare: 59.25% (589 questions)
  logical: 67.00% (1803 questions)
  query: 38.28% (6805 questions)
  verify: 74.87% (2252 questions)
Accuracy / semantic type:
  attr: 58.54% (5186 questions)
  cat: 43.52% (1149 questions)
  global: 53.50% (157 questions)
  obj: 80.33% (778 questions)
  rel: 45.65% (5308 questions)
Accuracy / steps number:
  1: 67.93% (237 questions)
  2: 48.27% (6395 questions)
  3: 55.49% (4266 questions)
  4: 58.26% (793 questions)
  5: 64.84% (822 questions)
  6: 82.93% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 28.48% (151 questions)
  4: 48.89% (630 questions)
  5: 39.69% (1290 questions)
  6: 49.28% (2074 questions)
  7: 52.50% (1642 questions)
  8: 55.27% (1185 questions)
  9: 57.30% (1281 questions)
  10: 59.81% (1249 questions)
  11: 54.73% (994 questions)
  12: 58.31% (638 questions)
  13: 57.36% (462 questions)
  14: 61.16% (345 questions)
  15: 64.14% (237 questions)
  16: 59.83% (117 questions)
  17: 54.26% (94 questions)
  18: 61.84% (76 questions)
  19: 74.42% (43 questions)
  20: 53.12% (32 questions)
  21: 63.16% (19 questions)
  22: 66.67% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 07:05:39
Epoch: 5, Loss: 0.9381465803343356, Lr: 5e-05
Elapsed time: 12374, Speed(s/batch): 0.7367752227478863

Binary: 71.40%
Open: 37.77%
Accuracy: 53.20%
Distribution: 2.29 (lower is better)
Accuracy / structural type:
  choose: 70.50% (1129 questions)
  compare: 61.29% (589 questions)
  logical: 68.66% (1803 questions)
  query: 37.77% (6805 questions)
  verify: 76.69% (2252 questions)
Accuracy / semantic type:
  attr: 60.05% (5186 questions)
  cat: 40.91% (1149 questions)
  global: 52.87% (157 questions)
  obj: 81.88% (778 questions)
  rel: 44.99% (5308 questions)
Accuracy / steps number:
  1: 67.93% (237 questions)
  2: 47.18% (6395 questions)
  3: 56.89% (4266 questions)
  4: 60.40% (793 questions)
  5: 66.79% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 27.81% (151 questions)
  4: 46.98% (630 questions)
  5: 40.54% (1290 questions)
  6: 49.08% (2074 questions)
  7: 51.83% (1642 questions)
  8: 55.36% (1185 questions)
  9: 58.86% (1281 questions)
  10: 60.45% (1249 questions)
  11: 56.74% (994 questions)
  12: 58.62% (638 questions)
  13: 58.01% (462 questions)
  14: 59.42% (345 questions)
  15: 61.18% (237 questions)
  16: 64.96% (117 questions)
  17: 47.87% (94 questions)
  18: 63.16% (76 questions)
  19: 81.40% (43 questions)
  20: 43.75% (32 questions)
  21: 52.63% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 10:32:59
Epoch: 6, Loss: 0.8608418081554507, Lr: 5e-05
Elapsed time: 12379, Speed(s/batch): 0.7370702962161075

Binary: 71.97%
Open: 38.62%
Accuracy: 53.93%
Distribution: 2.63 (lower is better)
Accuracy / structural type:
  choose: 72.10% (1129 questions)
  compare: 54.33% (589 questions)
  logical: 71.60% (1803 questions)
  query: 38.62% (6805 questions)
  verify: 76.82% (2252 questions)
Accuracy / semantic type:
  attr: 61.40% (5186 questions)
  cat: 41.69% (1149 questions)
  global: 54.78% (157 questions)
  obj: 81.23% (778 questions)
  rel: 45.25% (5308 questions)
Accuracy / steps number:
  1: 67.93% (237 questions)
  2: 47.79% (6395 questions)
  3: 57.06% (4266 questions)
  4: 62.80% (793 questions)
  5: 69.95% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 29.14% (151 questions)
  4: 49.21% (630 questions)
  5: 40.16% (1290 questions)
  6: 49.57% (2074 questions)
  7: 53.71% (1642 questions)
  8: 56.88% (1185 questions)
  9: 60.66% (1281 questions)
  10: 59.65% (1249 questions)
  11: 54.93% (994 questions)
  12: 59.87% (638 questions)
  13: 58.66% (462 questions)
  14: 59.71% (345 questions)
  15: 64.56% (237 questions)
  16: 64.10% (117 questions)
  17: 51.06% (94 questions)
  18: 65.79% (76 questions)
  19: 79.07% (43 questions)
  20: 43.75% (32 questions)
  21: 68.42% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 14:00:23
Epoch: 7, Loss: 0.7950429820463256, Lr: 5e-05
Elapsed time: 12384, Speed(s/batch): 0.7373725269828192

Binary: 72.20%
Open: 37.87%
Accuracy: 53.63%
Distribution: 2.02 (lower is better)
Accuracy / structural type:
  choose: 73.25% (1129 questions)
  compare: 61.63% (589 questions)
  logical: 70.55% (1803 questions)
  query: 37.87% (6805 questions)
  verify: 75.75% (2252 questions)
Accuracy / semantic type:
  attr: 61.11% (5186 questions)
  cat: 41.95% (1149 questions)
  global: 55.41% (157 questions)
  obj: 82.65% (778 questions)
  rel: 44.54% (5308 questions)
Accuracy / steps number:
  1: 67.51% (237 questions)
  2: 47.35% (6395 questions)
  3: 57.22% (4266 questions)
  4: 60.91% (793 questions)
  5: 69.95% (822 questions)
  6: 82.93% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 26.49% (151 questions)
  4: 47.78% (630 questions)
  5: 39.46% (1290 questions)
  6: 50.39% (2074 questions)
  7: 52.68% (1642 questions)
  8: 55.02% (1185 questions)
  9: 59.48% (1281 questions)
  10: 61.65% (1249 questions)
  11: 54.93% (994 questions)
  12: 58.62% (638 questions)
  13: 58.23% (462 questions)
  14: 60.29% (345 questions)
  15: 65.40% (237 questions)
  16: 61.54% (117 questions)
  17: 61.70% (94 questions)
  18: 64.47% (76 questions)
  19: 72.09% (43 questions)
  20: 46.88% (32 questions)
  21: 57.89% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 17:27:54
Epoch: 8, Loss: 0.7412050358980075, Lr: 5e-05
Elapsed time: 12385, Speed(s/batch): 0.7374002362296933

Binary: 72.20%
Open: 37.63%
Accuracy: 53.50%
Distribution: 2.05 (lower is better)
Accuracy / structural type:
  choose: 71.30% (1129 questions)
  compare: 60.95% (589 questions)
  logical: 71.60% (1803 questions)
  query: 37.63% (6805 questions)
  verify: 76.07% (2252 questions)
Accuracy / semantic type:
  attr: 61.47% (5186 questions)
  cat: 41.43% (1149 questions)
  global: 53.50% (157 questions)
  obj: 83.16% (778 questions)
  rel: 43.97% (5308 questions)
Accuracy / steps number:
  1: 68.35% (237 questions)
  2: 47.33% (6395 questions)
  3: 56.54% (4266 questions)
  4: 62.17% (793 questions)
  5: 70.32% (822 questions)
  6: 80.49% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 23.84% (151 questions)
  4: 46.83% (630 questions)
  5: 39.30% (1290 questions)
  6: 51.30% (2074 questions)
  7: 53.05% (1642 questions)
  8: 56.20% (1185 questions)
  9: 59.88% (1281 questions)
  10: 60.61% (1249 questions)
  11: 52.92% (994 questions)
  12: 60.19% (638 questions)
  13: 56.49% (462 questions)
  14: 60.00% (345 questions)
  15: 61.60% (237 questions)
  16: 63.25% (117 questions)
  17: 54.26% (94 questions)
  18: 61.84% (76 questions)
  19: 74.42% (43 questions)
  20: 46.88% (32 questions)
  21: 57.89% (19 questions)
  22: 66.67% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-16 20:55:23
Epoch: 9, Loss: 0.47916676114195417, Lr: 1e-05
Elapsed time: 12368, Speed(s/batch): 0.7364252933464496

Binary: 75.11%
Open: 39.99%
Accuracy: 56.11%
Distribution: 1.91 (lower is better)
Accuracy / structural type:
  choose: 74.93% (1129 questions)
  compare: 61.97% (589 questions)
  logical: 74.82% (1803 questions)
  query: 39.99% (6805 questions)
  verify: 78.86% (2252 questions)
Accuracy / semantic type:
  attr: 63.67% (5186 questions)
  cat: 42.82% (1149 questions)
  global: 54.14% (157 questions)
  obj: 86.25% (778 questions)
  rel: 47.23% (5308 questions)
Accuracy / steps number:
  1: 70.04% (237 questions)
  2: 49.54% (6395 questions)
  3: 59.70% (4266 questions)
  4: 65.07% (793 questions)
  5: 73.36% (822 questions)
  6: 80.49% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 32.45% (151 questions)
  4: 50.95% (630 questions)
  5: 42.25% (1290 questions)
  6: 53.81% (2074 questions)
  7: 55.60% (1642 questions)
  8: 57.81% (1185 questions)
  9: 63.23% (1281 questions)
  10: 60.85% (1249 questions)
  11: 56.64% (994 questions)
  12: 61.60% (638 questions)
  13: 58.87% (462 questions)
  14: 64.64% (345 questions)
  15: 62.87% (237 questions)
  16: 64.96% (117 questions)
  17: 62.77% (94 questions)
  18: 64.47% (76 questions)
  19: 81.40% (43 questions)
  20: 43.75% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-17 00:22:37
Epoch: 10, Loss: 0.37404145107083603, Lr: 1e-05
Elapsed time: 12371, Speed(s/batch): 0.7365669963274095

Binary: 75.06%
Open: 39.59%
Accuracy: 55.87%
Distribution: 1.95 (lower is better)
Accuracy / structural type:
  choose: 76.79% (1129 questions)
  compare: 62.65% (589 questions)
  logical: 73.93% (1803 questions)
  query: 39.59% (6805 questions)
  verify: 78.33% (2252 questions)
Accuracy / semantic type:
  attr: 64.13% (5186 questions)
  cat: 43.08% (1149 questions)
  global: 53.50% (157 questions)
  obj: 84.83% (778 questions)
  rel: 46.38% (5308 questions)
Accuracy / steps number:
  1: 67.09% (237 questions)
  2: 49.62% (6395 questions)
  3: 59.28% (4266 questions)
  4: 65.07% (793 questions)
  5: 71.90% (822 questions)
  6: 85.37% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 32.45% (151 questions)
  4: 49.84% (630 questions)
  5: 42.40% (1290 questions)
  6: 52.75% (2074 questions)
  7: 55.97% (1642 questions)
  8: 57.47% (1185 questions)
  9: 60.81% (1281 questions)
  10: 62.77% (1249 questions)
  11: 56.54% (994 questions)
  12: 60.50% (638 questions)
  13: 59.52% (462 questions)
  14: 65.80% (345 questions)
  15: 62.87% (237 questions)
  16: 64.10% (117 questions)
  17: 64.89% (94 questions)
  18: 65.79% (76 questions)
  19: 79.07% (43 questions)
  20: 50.00% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)

=====================================
nowTime: 2020-01-17 03:49:53
Epoch: 11, Loss: 0.27490622613804777, Lr: 2.0000000000000003e-06
Elapsed time: 12362, Speed(s/batch): 0.7360429190743336

Binary: 75.13%
Open: 40.21%
Accuracy: 56.23%
Distribution: 1.93 (lower is better)
Accuracy / structural type:
  choose: 77.06% (1129 questions)
  compare: 62.48% (589 questions)
  logical: 73.60% (1803 questions)
  query: 40.21% (6805 questions)
  verify: 78.69% (2252 questions)
Accuracy / semantic type:
  attr: 64.38% (5186 questions)
  cat: 43.34% (1149 questions)
  global: 53.50% (157 questions)
  obj: 84.58% (778 questions)
  rel: 46.99% (5308 questions)
Accuracy / steps number:
  1: 70.46% (237 questions)
  2: 49.84% (6395 questions)
  3: 59.94% (4266 questions)
  4: 66.20% (793 questions)
  5: 70.56% (822 questions)
  6: 80.49% (41 questions)
  7: 100.00% (20 questions)
  8: 100.00% (3 questions)
  9: 100.00% (1 questions)
Accuracy / words number:
  3: 31.13% (151 questions)
  4: 50.32% (630 questions)
  5: 43.33% (1290 questions)
  6: 52.56% (2074 questions)
  7: 56.70% (1642 questions)
  8: 57.81% (1185 questions)
  9: 62.61% (1281 questions)
  10: 62.21% (1249 questions)
  11: 56.64% (994 questions)
  12: 60.97% (638 questions)
  13: 57.58% (462 questions)
  14: 67.25% (345 questions)
  15: 64.98% (237 questions)
  16: 66.67% (117 questions)
  17: 59.57% (94 questions)
  18: 68.42% (76 questions)
  19: 76.74% (43 questions)
  20: 53.12% (32 questions)
  21: 63.16% (19 questions)
  22: 75.00% (12 questions)
  23: 25.00% (4 questions)
  24: 100.00% (2 questions)
  25: 100.00% (1 questions)
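
The Lr values printed after warmup follow from LR_BASE = 5e-05, LR_DECAY_LIST = [8, 10] and LR_DECAY_R = 0.2 in the config above. The sketch below reproduces them. It assumes, based only on these logged values and not on openvqa's source, that the printed epoch number is 1-based and that the rate is multiplied by LR_DECAY_R once for every listed epoch it has passed. The two warmup epochs (WARMUP_EPOCH = 2) are not modelled, and lr_at_epoch is a hypothetical name.

```python
def lr_at_epoch(epoch, lr_base=5e-05, decay_list=(8, 10), decay_r=0.2):
    """Post-warmup learning rate for a 1-based epoch number, as printed in the log.

    Assumption (inferred from the log): one decay step per entry of
    decay_list that the epoch number exceeds. Warmup is not modelled.
    """
    n_decays = sum(1 for d in decay_list if epoch > d)
    return lr_base * decay_r ** n_decays


# Epochs 3-8 stay at 5e-05, epochs 9-10 drop to 1e-05, epoch 11 to 2e-06.
for epoch in range(3, 12):
    print(epoch, lr_at_epoch(epoch))
```

Under this reading, epoch 9 is the first epoch at the decayed rate. That lines up with the jump in testdev accuracy from 53.50% at epoch 8 to 56.11% at epoch 9.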

I am trying to run on the same dataset, but I am getting a KeyError. Could you please help me?