YueLiao / PPDM

Code for "PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection".
MIT License

Weird evaluation results #6

Closed JingweiJ closed 4 years ago

JingweiJ commented 4 years ago

Hi Yue,

Thanks for your nice work! I tried to run the evaluation code on your shared model (model_last.pth of dla34 in https://drive.google.com/drive/folders/1K0H05nSUOCq939tmvBRJjskdPSLy1U-U, which I renamed model_last.dla34.pth), but I failed to reproduce the reported numbers. Below are the command I ran and the stdout/stderr I got (file paths masked). Do you have any thoughts on what might be wrong? Some parameters seem to be missing from the .pth file; is that expected?

$ python test_hoi.py hoidet --exp_id hoidet_hico_dla --load_model ../models/model_last.dla34.pth  --gpus 0 --dataset hico --image_dir images/test2015 --test_with_eval
Fix size testing.
training chunk_sizes: [32]
The output will be saved to  [MYFOLDER]/PPDM/src/lib/../../exp/hoidet/hoidet_hico_dla
heads {'sub_offset': 2, 'wh': 2, 'hm_rel': 117, 'hm': 80, 'obj_offset': 2, 'hm_human': 1, 'reg': 2}
Namespace(K=100, arch='dla_34', batch_size=32, cat_spec_wh=False, chunk_sizes=[32], data_dir='[MYFOLDER]/PPDM/src/lib/../../data', dataset='hico', debug=0, debug_dir='[MYFOLDER]/PPDM/src/lib/../../exp/hoidet/hoidet_hico_dla/de
bug', debugger_theme='white', demo='', dense_wh=False, down_ratio=4, exp_dir='[MYFOLDER]/PPDM/src/lib/../../exp/hoidet', exp_id='hoidet_hico_dla', fix_res=True, flip=0.5, flip_test=False, gpus=[0], gpus_str='0', head_conv=256, heads={'sub_offset': 2, 'wh': 2, 'hm_rel': 1
17, 'hm': 80, 'obj_offset': 2, 'hm_human': 1, 'reg': 2}, hide_data_time=False, hm_weight=1, image_dir='images/test2015', input_h=512, input_res=512, input_w=512, keep_res=False, load_model='../models/model_last.dla34.pth', lr=0.000125, lr_step=[90, 120], master_batch_size=32, mean=array([[[0.40789655, 0.44719303, 0
.47026116]]], dtype=float32), metric='loss', mse_loss=False, nms=False, no_color_aug=False, norm_wh=False, not_cuda_benchmark=False, not_prefetch_test=False, not_rand_crop=False, not_reg_offset=False, num_classes=80, num_classes_verb=117, num_epochs=140, num_iters=-1, num_stacks=1, num_workers=4, off_weight=1, outp
ut_h=128, output_res=128, output_w=128, pad=31, print_iter=0, reg_loss='l1', reg_offset=True, resume=False, root_dir='[MYFOLDER]/PPDM/src/lib/../..', root_path='../Dataset', rotate=0, save_all=False, save_dir='[MYFOLDER]/PPDM/
src/lib/../../exp/hoidet/hoidet_hico_dla', save_predictions=False, save_video='', scale=0.4, seed=317, shift=0.1, std=array([[[0.2886383 , 0.27408165, 0.27809834]]], dtype=float32), task='hoidet', test=False, test_dir='', test_scales=[1.0], test_video=False, test_with_eval=True, trainval=False, use_cos=0, use_verb_
sub=0, val_intervals=100000, vis_thresh=0.3, wh_weight=0.1)
Creating model...
loaded ../models/model_last.dla34.pth, epoch 140
No param hm_human.0.weight.
No param hm_human.0.bias.
No param hm_human.2.weight.
No param hm_human.2.bias.
----epoch :0 -----
hoidet_hico_dla |################################| [9545/9546]|Tot: 0:06:08 |ETA: 0:00:01 |load 0.000s (0.000s) |pre 0.000s (0.000s) |tot 0.031s (0.032s) |merge 0.004s (0.005s) |net 0.019s (0.020s) |dec 0.005s (0.005s) |post 0.002s (0.002s) 
--------------------
mAP: 0.039696969697 mAP rare: 0.0204216073781  mAP nonrare: 0.0454545454545  max recall: 0.0216666666667
--------------------
best model id: 0, best map: 0.039696969697
Traceback (most recent call last):
  File "test_hoi.py", line 112, in <module>
    prefetch_test(opt)
  File "test_hoi.py", line 108, in prefetch_test
    save_json(best_output, model_path, 'best_predictions.json')
UnboundLocalError: local variable 'model_path' referenced before assignment
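The traceback above is the classic pattern where a variable is bound only on one branch and read unconditionally afterwards. Here is a hypothetical minimal reproduction and a defensive fix; the function and variable names are illustrative, not PPDM's actual code.

```python
# Hypothetical reproduction: model_path is only bound inside a branch,
# so reading it when the branch was skipped raises UnboundLocalError.
def prefetch_test(found_checkpoint):
    if found_checkpoint:
        model_path = "model_best.pth"   # only bound on this branch
    try:
        return model_path               # may be unbound here
    except UnboundLocalError:
        return None

# A defensive fix: initialize the variable before any branching.
def prefetch_test_fixed(found_checkpoint):
    model_path = None                   # always bound
    if found_checkpoint:
        model_path = "model_best.pth"
    return model_path
```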
YueLiao commented 4 years ago

Thanks for your interest in our work. First, make sure you are on the newest version, because we merged the human and object branches last Sunday. Second, the 'No param' messages come from an opt.py file we forgot to update; they do not cause any accuracy loss. Looking forward to your great Action Genome dataset!
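For intuition, the "No param ..." lines indicate a tolerant checkpoint loader: entries present in the checkpoint are copied, missing ones are warned about and left at their fresh initialization. A minimal sketch, using plain dicts to stand in for PyTorch state_dicts; names are illustrative:

```python
# Tolerant state_dict loading: copy matching entries, warn on the rest.
def load_state_dict_tolerant(model_state, checkpoint_state):
    warnings = []
    for name in model_state:
        if name in checkpoint_state:
            model_state[name] = checkpoint_state[name]
        else:
            warnings.append("No param %s." % name)
    return model_state, warnings

model = {"hm.0.weight": 0.0, "hm_human.0.weight": 0.0}
ckpt = {"hm.0.weight": 1.0}  # checkpoint predates the hm_human head
state, warns = load_state_dict_tolerant(model, ckpt)
```

With real PyTorch models, `model.load_state_dict(ckpt, strict=False)` behaves similarly, reporting missing and unexpected keys instead of raising.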

JingweiJ commented 4 years ago

Thanks for your reply! I double-checked that I've been using the latest codebase. I also evaluated the model_best.pth in dla34 and got similar mAPs, still far below the expected numbers.

Looking at the stored output best_predictions.json, I found many noisy boxes, and many of them heavily overlap. Is that because there is no NMS and no thresholding of low-confidence detections?
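Of the two filters asked about above, confidence thresholding is the simpler one. A minimal sketch, where detections are (box, score) tuples; the 0.3 cutoff mirrors the `vis_thresh=0.3` value in the Namespace dump and is used here only as an example:

```python
# Drop detections whose confidence score falls below a threshold.
def filter_by_score(detections, thresh=0.3):
    return [d for d in detections if d[1] >= thresh]

dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.05)]
kept = filter_by_score(dets)  # only the 0.9-score box survives
```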

YueLiao commented 4 years ago

The mAPs evaluated by our provided scripts are slightly lower (by about 0.5% mAP) than those from the official evaluation script.

Using NMS can certainly improve the performance, but it slows down inference. [Objects as Points](https://arxiv.org/abs/1904.07850) showed that a point-based detector does not produce many heavily overlapping bounding boxes. The heavily overlapping boxes you see may instead come from the HICO-Det annotations, where the same object/human is sometimes annotated several times.
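To make the trade-off concrete, here is a self-contained sketch of greedy NMS: it suppresses heavily overlapping boxes, at the extra cost of pairwise IoU checks. Boxes are (x1, y1, x2, y2); this is illustrative, not PPDM's pipeline (in practice `torchvision.ops.nms` would be used).

```python
# Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Greedy NMS over (box, score) pairs: keep the highest-scoring box,
# drop any later box that overlaps a kept one too much.
def nms(dets, iou_thresh=0.5):
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept

dets = [((0, 0, 10, 10), 0.9), ((0, 0, 10, 10), 0.8), ((20, 20, 30, 30), 0.7)]
kept = nms(dets)  # the duplicate of the top box is suppressed
```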

JingweiJ commented 4 years ago

Got it. Thanks for the explanation and the reference. My main concern is the discrepancy between the results I got (3.9 mAP) and the expected numbers. I'm still figuring out what is missing.

YueLiao commented 4 years ago

I have just checked the code and retrained from scratch based on the released code. I reproduced the performance and achieved the mAP we reported (19.94). Could you share more details about your setup?

JingweiJ commented 4 years ago

Some details of my env: Python 2.7, PyTorch 0.5.0, CUDA 9.0, cuDNN 7.3.1

JingweiJ commented 4 years ago

Might it be due to the PyTorch version? I'll build PyTorch 0.4.1 and try again.

YueLiao commented 4 years ago

I see. It is mainly caused by the Python version; I used Python 3.6 with PyTorch 0.4.1. I am sorry that we forgot to specify the right Python version.

JingweiJ commented 4 years ago

Ah, I see. Since there are a few `from __future__` imports, I thought I should use Python 2. I'll try Python 3 and update.
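A hedged note on those `from __future__` imports: they exist to make Python 2 behave like Python 3 and are harmless no-ops under Python 3, so their presence does not by itself mean the code targets Python 2. The division behavior is a typical way a wrong interpreter can silently corrupt metrics:

```python
# No-ops under Python 3; under Python 2 they opt in to Python 3 semantics.
from __future__ import division, print_function

# Under Python 2 *without* the division import, 1 / 2 == 0 (integer
# division), which can silently corrupt ratio-based metrics such as mAP.
half = 1 / 2          # true division under Python 3 or with the import
floor_half = 1 // 2   # explicit floor division in both versions
```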

JingweiJ commented 4 years ago

Python 3.6 works! Thanks! Closing the issue.