DirtyHarryLYL / Transferable-Interactiveness-Network

Code for Transferable Interactiveness Knowledge for Human-Object Interaction Detection. (CVPR'19, TPAMI'21)
MIT License

Questions about the ablation studies #60

Closed by wanna-fly 4 years ago

wanna-fly commented 4 years ago

Hi guys, thanks for your nice code! I'm trying to check the contribution of each stream, but my results are totally different from those in your paper. Here is my method:

  1. For the human stream, I just store prediction_H generated by net.test_image_H and repeat it for all objects paired with the current human instance during the test;
  2. For the object stream, I use self.predictions["cls_prob_O"] as the prediction;
  3. Similarly, I use self.predictions["cls_prob_sp"] as the prediction for the sp stream.
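
Step 1 amounts to broadcasting one per-human score vector to every object paired with that human. A minimal sketch of that step, assuming prediction_H has shape (1, num_classes); the helper name and the toy sizes are hypothetical and not taken from the repository:

```python
import numpy as np


def repeat_human_prediction(prediction_H, num_objects):
    """Copy one human's score row once per paired object (hypothetical helper)."""
    # prediction_H is assumed to have shape (1, num_classes);
    # the result has shape (num_objects, num_classes).
    return np.repeat(prediction_H, num_objects, axis=0)


# Toy example: 4 classes, 3 objects paired with the same human.
prediction_H = np.array([[0.1, 0.7, 0.05, 0.15]])
per_pair_scores = repeat_human_prediction(prediction_H, num_objects=3)
```

Every row of per_pair_scores is identical, so under this scheme the human stream cannot tell apart the different objects paired with one human.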

I train the network jointly and adopt the above settings during the test. In the end I got the following results: AP = 37.85 for the human stream, AP = 31.63 for the object stream, and AP = 47.19 for the sp stream. I think there must be something wrong with my method, but I can't tell what. Would you mind sharing your strategy for the ablation study? How do you get the results for the different streams?

Foruck commented 4 years ago

Sorry for the late reply! Please refer to the statement in Sec. 5.4, Three Streams. The result is obtained by keeping one stream in P each time. Therefore, to reproduce our reported results, you should keep all three items in self.predictions (they belong to C and thus shouldn't be removed for the ablation study), and instead modify self.binary_discriminator to keep fc7_H and fc7_SH for human only, fc7_O and fc7_SO for object only, and fc_binary_1 for spatial only. Then retrain the whole model and perform inference with the correspondingly retrained model.
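
The feature selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the repository's actual self.binary_discriminator: the feature names come from the reply, while the helper select_p_input, the NumPy arrays standing in for network tensors, and all feature widths are hypothetical.

```python
import numpy as np

# Which features feed P in each ablation, as listed in the reply.
P_INPUTS = {
    "human": ("fc7_H", "fc7_SH"),
    "object": ("fc7_O", "fc7_SO"),
    "spatial": ("fc_binary_1",),
}


def select_p_input(features, mode):
    """Concatenate, along the channel axis, only the features kept for `mode`.

    Hypothetical helper: in the real model this choice would be made inside
    self.binary_discriminator before retraining.
    """
    return np.concatenate([features[name] for name in P_INPUTS[mode]], axis=1)


# Toy features for a batch of 2 human-object pairs; the widths are made up.
features = {
    "fc7_H": np.zeros((2, 8)),
    "fc7_SH": np.zeros((2, 6)),
    "fc7_O": np.zeros((2, 8)),
    "fc7_SO": np.zeros((2, 6)),
    "fc_binary_1": np.zeros((2, 4)),
}

human_only = select_p_input(features, "human")      # fc7_H + fc7_SH
object_only = select_p_input(features, "object")    # fc7_O + fc7_SO
spatial_only = select_p_input(features, "spatial")  # fc_binary_1 only
```

Each mode changes the input of P, so each corresponds to a separately retrained model. The three entries of self.predictions stay untouched in every case.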

wanna-fly commented 4 years ago

I got it. The ablation study is conducted on P. Thanks for your answer. It really helps.