Cadene / vqa.pytorch

Visual Question Answering in Pytorch
716 stars, 177 forks

training MUTAN+Att using pytorch code achieve low accuracy #8

Closed gaopeng-eugene closed 7 years ago

gaopeng-eugene commented 7 years ago

Hi, thank you so much for your code. Right now, I am trying to replicate your ICCV results with the PyTorch implementation. Here is the setting:

{'batch_size': None, 'dir_logs': None, 'epochs': None, 'evaluate': False, 'help_opt': False, 'learning_rate': None,
 'path_opt': 'options/vqa/mutan_att_trainval.yaml', 'print_freq': 10, 'resume': '', 'save_all_from': None,
 'save_model': True, 'st_dropout': None, 'st_fixed_emb': None, 'st_type': None, 'start_epoch': 0,
 'vqa_trainsplit': 'train', 'workers': 16}

Options:

{'coco': {'arch': 'fbresnet152torch', 'dir': 'data/coco', 'mode': 'att'},
 'logs': {'dir_logs': 'logs/vqa/mutan_att_trainval'},
 'model': {'arch': 'MutanAtt',
           'attention': {'R': 5, 'activation_q': 'tanh', 'activation_v': 'tanh', 'dim_hq': 310, 'dim_hv': 310,
                         'dim_mm': 510, 'dropout_hv': 0, 'dropout_mm': 0.5, 'dropout_q': 0.5, 'dropout_v': 0.5,
                         'nb_glimpses': 2},
           'classif': {'dropout': 0.5},
           'dim_q': 2400,
           'dim_v': 2048,
           'fusion': {'R': 5, 'activation_q': 'tanh', 'activation_v': 'tanh', 'dim_hq': 310, 'dim_hv': 620,
                      'dim_mm': 510, 'dropout_hq': 0, 'dropout_hv': 0, 'dropout_q': 0.5, 'dropout_v': 0.5},
           'seq2vec': {'arch': 'skipthoughts', 'dir_st': 'data/skip-thoughts', 'dropout': 0.25, 'fixed_emb': False,
                       'type': 'BayesianUniSkip'}},
 'optim': {'batch_size': 128, 'epochs': 100, 'lr': 0.0001},
 'vqa': {'dataset': 'VQA', 'dir': 'data/vqa', 'maxlength': 26, 'minwcount': 0, 'nans': 2000, 'nlp': 'mcb',
         'pad': 'right', 'samplingans': True, 'trainsplit': 'train'}}

Warning: 399/930911 words are not in dictionary, thus set UNK
Warning fusion.py: no visual embedding before fusion
Warning fusion.py: no question embedding before fusion
Warning fusion.py: no visual embedding before fusion
Warning fusion.py: no question embedding before fusion
Model has 37840812 parameters

Here is the result after 100 epochs:

Epoch: [99][1740/1760] Time 0.403 (0.412) Data 0.000 (0.007) Loss 0.8993 (0.9064) Acc@1 71.094 (73.912) Acc@5 94.531 (94.830)
Epoch: [99][1750/1760] Time 0.387 (0.412) Data 0.000 (0.007) Loss 0.8277 (0.9061) Acc@1 71.875 (73.915) Acc@5 95.312 (94.833)
Val: [900/950] Time 0.138 (0.188) Loss 3.1201 (2.8397) Acc@1 49.219 (52.236) Acc@5 75.000 (78.115)
Val: [910/950] Time 0.189 (0.187) Loss 2.4805 (2.8372) Acc@1 58.594 (52.240) Acc@5 80.469 (78.139)
Val: [920/950] Time 0.210 (0.187) Loss 2.8639 (2.8388) Acc@1 53.125 (52.226) Acc@5 77.344 (78.137)
Val: [930/950] Time 0.179 (0.187) Loss 2.1427 (2.8388) Acc@1 59.375 (52.227) Acc@5 82.031 (78.137)
Val: [940/950] Time 0.151 (0.187) Loss 3.1772 (2.8367) Acc@1 50.781 (52.263) Acc@5 72.656 (78.163)

gaopeng-eugene commented 7 years ago

Here is the command line I use.

gaopeng-eugene commented 7 years ago

python train.py --vqa_trainsplit train --path_opt options/vqa/mutan_att_train.yaml

gaopeng-eugene commented 7 years ago

To summarize the results: I am training on the train set and evaluating on the val set. MUTAN+Att gets 53; MUTAN without attention gets 50.

Cadene commented 7 years ago

You're looking at the val accuracy, not the open-ended val accuracy. The latter can be obtained using eval_res.py, which is automatically executed after each training epoch: https://github.com/Cadene/vqa.pytorch/blob/master/train.py#L287

eval_res.py writes the open-ended accuracy to a JSON file in the experiment directory (logs). The open-ended accuracy can then be viewed using plotly: https://github.com/Cadene/vqa.pytorch#monitor-training
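For reference, those per-epoch JSON files can also be parsed directly instead of going through plotly. A minimal sketch in Python, assuming a hypothetical file name and key layout (check your own logs directory for the layout eval_res.py actually produces):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout: one JSON file per epoch holding the open-ended
# accuracy. The real file names and keys depend on your logs directory.
log_dir = Path(tempfile.mkdtemp())

# Simulate two epoch result files so the sketch is self-contained.
for epoch, acc in [(1, 55.3), (2, 58.1)]:
    (log_dir / f"oe_accuracy_epoch{epoch}.json").write_text(
        json.dumps({"epoch": epoch, "overall": acc})
    )

# Collect the per-epoch results and report the best epoch.
results = [
    json.loads(p.read_text())
    for p in sorted(log_dir.glob("oe_accuracy_epoch*.json"))
]
best = max(results, key=lambda r: r["overall"])
print(best["epoch"], best["overall"])  # best epoch by open-ended accuracy
```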

gaopeng-eugene commented 7 years ago

Thank you so much for your quick reply. I will try your suggestion. Another question about your ICCV paper: you compare with other methods in the no-attention and ensemble settings. Why not compare with the single-model attention setting?

gaopeng-eugene commented 7 years ago

A small question: what is the difference between val accuracy and open-ended val accuracy? As far as I know, there are two metrics in VQA: open-ended accuracy and MC?

Cadene commented 7 years ago

Why not compare with the single-model attention setting?

It would have been a good idea, but we were really running out of time and space in the paper, so we focused on what we thought was most important.

A small question: what is the difference between val accuracy and open-ended val accuracy?

Look at equation (13) in the paper: "If the predicted answer appears at least 3 times in the ground truth answers, the accuracy for this example is considered to be 1. Intuitively, this metric takes into account the consensus between annotators."
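The quoted rule can be sketched as a tiny Python function. This is a simplified version: the official VQA evaluation script additionally normalizes answer strings (articles, punctuation, number words) and averages the score over subsets of the ten annotators:

```python
def open_ended_accuracy(predicted, ground_truth_answers):
    """Simplified VQA open-ended accuracy: min(#matches / 3, 1).

    `ground_truth_answers` is the list of (typically 10) human answers.
    If the predicted answer matches at least 3 of them, accuracy is 1.
    """
    matches = sum(a == predicted for a in ground_truth_answers)
    return min(matches / 3.0, 1.0)

# 7 annotators said "yes", 3 said "no".
answers = ["yes"] * 7 + ["no"] * 3
print(open_ended_accuracy("yes", answers))    # >= 3 matches -> 1.0
print(open_ended_accuracy("no", answers))     # exactly 3 matches -> 1.0
print(open_ended_accuracy("maybe", answers))  # 0 matches -> 0.0
```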

As far as I know, there is two measurement in VQA, open ended accuracy and MC?

VQA OpenEnded and VQA MC are two different tasks. MC stands for Multiple Choice: the candidate answers are given as inputs, and the model picks among them.
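A toy sketch of the distinction (the scores and answer strings below are made up): an open-ended model predicts over the whole answer vocabulary, while in the multiple-choice setting the prediction is restricted to the candidate answers supplied with the question:

```python
# Toy scores over the answer vocabulary, as a VQA model might produce.
scores = {"red": 0.1, "blue": 2.3, "green": 1.7, "two": 0.4}

# Open-ended: predict the argmax over the entire answer vocabulary.
open_ended_pred = max(scores, key=scores.get)

# Multiple choice: the candidate answers are inputs to the task, so the
# argmax is restricted to the choices given with the question.
choices = ["red", "green", "two"]
mc_pred = max(choices, key=lambda a: scores.get(a, float("-inf")))

print(open_ended_pred, mc_pred)  # blue green
```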

gaopeng-eugene commented 7 years ago

Thank you so much for your reply.