DirtyHarryLYL / HAKE-Action

As a part of the HAKE project, this repository includes the reproduced SOTA models and the corresponding HAKE-enhanced versions (CVPR 2020).
Apache License 2.0

Weird results: obtaining results similar to the reported ones without training... #30

Closed xxxzhi closed 4 years ago

xxxzhi commented 4 years ago

Thanks for releasing the code. However, I face a strange problem: HAKE-Action does not seem to need any training. I run the code using the command:

python tools/Train_pasta_HICO_DET.py --data 0 --init_weight 1 --train_module 2 --num_iteration 11 --model test_pastanet

Then, I test the model 'test_pastanet/HOI_iter_10.ckpt'.

Unbelievably! The result is:

Default:  0.2194696378436301
Default rare:  0.20412592656443512
Default non-rare:  0.22405282432962342
Known object:  0.2382635240705128
Known object, rare:  0.22197228395195176
Known object, non-rare:  0.24312973865138168

Have you ever tested your code like this? This result is hard to believe. I am now testing snapshot 1; I did not change the code.

Foruck commented 4 years ago

Our reported 22.12 mAP for HAKE-HICO-DET (PaStaNet in the paper) is composed of multiple components: the result of the Instance-level PaStaNet is fused with the results from TIN and our Image-level PaStaNet, and then Non-Interactive Suppression (NIS) trained on HAKE is applied.

The vanilla TIN with NIS trained on HAKE achieves 18.33 mAP, and 21.60 mAP when enhanced with our Image-level PaStaNet. If we test only the result of 'res50_faster_rcnn_iter_1190000.ckpt' with the other parameters randomly initialized (we use it as an approximation of 'test_pastanet/HOI_iter_10.ckpt' in this issue), we get a trivial 2.91 mAP with NIS trained on HAKE, while our Instance-level PaStaNet achieves 19.52 mAP under the same setting. Thus the result reported in this issue is mainly credited to TIN and our Image-level PaStaNet.

Also note that combining different high-performance models and techniques usually does not yield a linear improvement: the higher the performance of each individual model, the harder it is to improve further by combination, since their outputs overlap more. However, with more data and knowledge, our HAKE-Large (PaStaNet in the paper) manages to improve the performance considerably, which shows the importance of our HAKE-Large data.
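
For readers trying to reproduce this, here is a minimal sketch of the late-fusion plus NIS step described above. It is not the released code; the function names, score shapes, fusion weights, and NIS threshold are illustrative assumptions.

```python
# Sketch of fusing per-pair HOI scores from several models and applying NIS.
# All names and values here are assumptions, not the actual HAKE-Action code.
import numpy as np

def fuse_and_suppress(instance_scores, tin_scores, image_level_scores,
                      interactiveness, nis_threshold=0.1,
                      weights=(1.0, 1.0, 1.0)):
    """Fuse per-pair HOI scores (shape [num_pairs, 600]) and apply NIS."""
    w_inst, w_tin, w_img = weights
    fused = (w_inst * instance_scores
             + w_tin * tin_scores
             + w_img * image_level_scores)
    # NIS: suppress human-object pairs whose interactiveness score is too low.
    keep = interactiveness >= nis_threshold        # boolean mask, [num_pairs]
    return fused * keep[:, None]

# Toy usage with random scores (HICO-DET has 600 HOI categories).
pairs = 8
out = fuse_and_suppress(np.random.rand(pairs, 600),
                        np.random.rand(pairs, 600),
                        np.random.rand(pairs, 600),
                        interactiveness=np.random.rand(pairs))
print(out.shape)  # (8, 600)
```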

xxxzhi commented 4 years ago

Well,

!!! The result of 'test_pastanet/HOI_iter_1.ckpt' is:

  Default:  0.22061835057332763
  Default rare:  0.20459114486066238
  Default non-rare:  0.2254056977342536
  Known object:  0.24004048942292627
  Known object, rare:  0.2247353598730576
  Known object, non-rare:  0.24461215149626364

I think this is an approximation of the randomly initialized model in my experiment. Could it be that I made some mistake? Or have some of the things you provided (e.g. the part knowledge) already been optimized?

which is trivial, while our Instance-level PaStaNet* will achieve 19.52 mAP under the same setting.

Here, do you mean Instance-level PaStaNet* achieves 19.52 mAP with a randomly initialized model?

Btw, compared to your result, TIN might be useless, because we directly achieve around 18.2 mAP without Non-Interactive Suppression based on the TIN code. One very useful part is re-weighting, and it is also very simple; a sketch follows below.
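
For context, the re-weighting here is essentially a per-HOI-category scaling of the prediction scores. A toy sketch, with placeholder weights rather than the actual HO_weight values:

```python
# Toy illustration of per-category re-weighting; the weight values are
# placeholders, not the HO_weight factors used in TIN or HAKE-Action.
import numpy as np

NUM_HOI = 600                                   # HOI categories in HICO-DET
HO_weight = np.ones(NUM_HOI, dtype=np.float32)  # placeholder factors

def reweight(scores, weights=HO_weight):
    """Scale each HOI category's score by its class-specific factor."""
    return scores * weights[None, :]

print(reweight(np.random.rand(4, NUM_HOI)).shape)  # (4, 600)
```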

xxxzhi commented 4 years ago

I run the code with Python 3, so I changed the pickle loading like this:

Trainval_GT = pickle.load(open(cfg.DATA_DIR + '/' + 'Trainval_GT_10w.pkl', "rb"), encoding='latin1')
Trainval_N = pickle.load(open(cfg.DATA_DIR + '/' + 'Trainval_Neg_10w.pkl', "rb"), encoding='latin1')

The other parts are unchanged. I can also upload the code to GitHub. You might find the problem if your randomly initialized model performs much worse than 22.00.
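
For reference, a minimal self-contained version of that change, wrapping the latin1 fallback in a helper (DATA_DIR below stands in for cfg.DATA_DIR from the repo's config):

```python
# Sketch of the same Python 3 compatibility fix, assuming DATA_DIR points
# to the repo's data directory (cfg.DATA_DIR in the actual code).
import os
import pickle

DATA_DIR = "Data"

def load_py2_pkl(path):
    """Load a pickle written by Python 2, falling back to latin1 decoding."""
    with open(path, "rb") as f:
        try:
            return pickle.load(f)
        except UnicodeDecodeError:
            f.seek(0)
            return pickle.load(f, encoding="latin1")

Trainval_GT = load_py2_pkl(os.path.join(DATA_DIR, "Trainval_GT_10w.pkl"))
Trainval_N = load_py2_pkl(os.path.join(DATA_DIR, "Trainval_Neg_10w.pkl"))
```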

Thanks.

Foruck commented 4 years ago

  1. The reported 22.06 mAP in the comment contains the effect of 'test_pastanet/HOI_iter_1.ckpt', TIN, NIS trained on HAKE, our optimized Image-level PaStaNet* result, and our re-weighting strategy. If 'test_pastanet/HOI_iter_1.ckpt' is removed, the result is approximately 21.6 mAP. If 'test_pastanet/HOI_iter_1.ckpt' is replaced with our trained PaStaNet model, it achieves 22.66 mAP. Also note that the improvement brought by fusing models is usually not linear, and our Instance-level model suffers a lot from its overlap with our Image-level model, especially in fusion.

  2. The 19.52 mAP is achieved by our trained PaStaNet model without fusion with TIN, NIS, and Image-level PaStaNet*. The 2.91 mAP is achieved by 'res50_faster_rcnn_iter_1190000.ckpt' (all parameters that are not in the checkpoint are randomly initialized), also without fusion with TIN, NIS, and Image-level PaStaNet*. Sorry for the ambiguity.

  3. Noticing that current HOI detection models might produce irrational classification scores for rare categories, we performed a grid search on a validation set selected from HAKE to find our re-weighting factors (a sketch of such a search follows this list). They did help a lot; however, even with them, the vanilla 'res50_faster_rcnn_iter_1190000.ckpt' (without fusion with TIN, NIS, and Image-level PaStaNet*) still performs poorly, as shown in (2). Therefore, the well-trained TIN model is still important.
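
A minimal sketch of how such a grid search over re-weighting factors could look; evaluate_map, val_scores, and the candidate values are hypothetical stand-ins for the actual HAKE validation pipeline.

```python
# Sketch: per-category grid search for re-weighting factors on a validation
# set. evaluate_map is a hypothetical callback returning mAP for given scores.
import numpy as np

def grid_search_weights(val_scores, evaluate_map,
                        candidates=(0.5, 1.0, 2.0, 4.0)):
    """Greedily pick, per HOI category, the scaling factor that maximizes mAP."""
    num_classes = val_scores.shape[1]
    weights = np.ones(num_classes, dtype=np.float32)
    for c in range(num_classes):
        best_w, best_map = 1.0, -1.0
        for w in candidates:
            trial = weights.copy()
            trial[c] = w
            m = evaluate_map(val_scores * trial[None, :])
            if m > best_map:
                best_w, best_map = w, m
        weights[c] = best_w
    return weights

# Usage (with a real evaluator):
# weights = grid_search_weights(val_scores, evaluate_map)
# test_scores = test_scores * weights[None, :]   # reuse the factors at test time
```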

xxxzhi commented 4 years ago

Thanks for your reply. This information is important and helpful!

we perform grid search on selected validation set from HAKE to find our reweighting factors.

Oh, thanks. I have noticed that the HO_weight is different from TIN's.

The 19.52 mAP is achieved by our trained PaStaNet model without fusion with TIN, NIS and Image-level PaStaNet

So, do you still include Image-level PaStaNet* in the final result? Would you mind providing the full result, i.e. the Rare and Non-rare categories? The full result is 19.52; what is the Rare result?

I'm still confused about the performance of test_pastanet/HOI_iter_1.ckpt. In other words, is the training step (python tools/Train_pasta_HICO_DET.py --data 0 --init_weight 1 --train_module 2 --num_iteration 11 --model test_pastanet) unnecessary when we fuse the model with TIN, NIS, and Image-level PaStaNet? That seems plausible, since you have used Image-level PaStaNet. What is the result of the model without Image-level PaStaNet*? That would also be helpful. Your final result looks like an ensemble of multiple models.

Anyway. Thanks for your information very much.

Foruck commented 4 years ago

It achieves 17.29 mAP on the Rare set and 20.19 mAP on the Non-rare set. We are working on the journal version of PaStaNet, in which we improve the result of the model without Image-level PaStaNet. It will be made public soon.