OreoChocolate / MUREN

The official code for "Relational Context Learning for Human-Object Interaction Detection" (CVPR 2023).
http://cvlab.postech.ac.kr/research/MUREN/

Why Can't I Reproduce These Results? #5

Closed alexw994 closed 10 months ago

alexw994 commented 10 months ago

On V-COCO, I trained using the commands in the repository and only reached 64.1. When I tested with the 'eval' command, the mAP was only 65.9.

```
---------Reporting Role AP (%)------------------
hold-obj:              AP = 58.44 (#pos = 3608)
sit-instr:             AP = 59.50 (#pos = 1916)
ride-instr:            AP = 73.68 (#pos = 556)
look-obj:              AP = 48.34 (#pos = 3347)
hit-instr:             AP = 80.03 (#pos = 349)
hit-obj:               AP = 69.51 (#pos = 349)
eat-obj:               AP = 71.48 (#pos = 521)
eat-instr:             AP = 76.79 (#pos = 521)
jump-instr:            AP = 77.71 (#pos = 635)
lay-instr:             AP = 58.62 (#pos = 387)
talk_on_phone-instr:   AP = 56.47 (#pos = 285)
carry-obj:             AP = 48.91 (#pos = 472)
throw-obj:             AP = 57.31 (#pos = 244)
catch-obj:             AP = 57.66 (#pos = 246)
cut-instr:             AP = 50.66 (#pos = 269)
cut-obj:               AP = 65.60 (#pos = 269)
work_on_computer-instr: AP = 77.11 (#pos = 410)
ski-instr:             AP = 56.09 (#pos = 424)
surf-instr:            AP = 80.34 (#pos = 486)
skateboard-instr:      AP = 88.40 (#pos = 417)
drink-instr:           AP = 59.21 (#pos = 82)
kick-obj:              AP = 79.46 (#pos = 180)
point-instr:           AP = 8.20 (#pos = 31)
read-obj:              AP = 51.02 (#pos = 111)
snowboard-instr:       AP = 80.16 (#pos = 277)
Average Role [scenario_1] AP = 63.63
Average Role [scenario_1] AP = 65.94, omitting the action "point"
```

```
---------Reporting Role AP (%)------------------
hold-obj:              AP = 61.83 (#pos = 3608)
sit-instr:             AP = 62.22 (#pos = 1916)
ride-instr:            AP = 74.57 (#pos = 556)
look-obj:              AP = 53.29 (#pos = 3347)
hit-instr:             AP = 81.17 (#pos = 349)
hit-obj:               AP = 71.86 (#pos = 349)
eat-obj:               AP = 75.43 (#pos = 521)
eat-instr:             AP = 77.01 (#pos = 521)
jump-instr:            AP = 78.17 (#pos = 635)
lay-instr:             AP = 61.32 (#pos = 387)
talk_on_phone-instr:   AP = 58.56 (#pos = 285)
carry-obj:             AP = 50.48 (#pos = 472)
throw-obj:             AP = 59.77 (#pos = 244)
catch-obj:             AP = 62.53 (#pos = 246)
cut-instr:             AP = 51.62 (#pos = 269)
cut-obj:               AP = 67.81 (#pos = 269)
work_on_computer-instr: AP = 78.73 (#pos = 410)
ski-instr:             AP = 61.23 (#pos = 424)
surf-instr:            AP = 80.91 (#pos = 486)
skateboard-instr:      AP = 88.89 (#pos = 417)
drink-instr:           AP = 59.94 (#pos = 82)
kick-obj:              AP = 83.20 (#pos = 180)
point-instr:           AP = 8.24 (#pos = 31)
read-obj:              AP = 56.72 (#pos = 111)
snowboard-instr:       AP = 81.60 (#pos = 277)
Average Role [scenario_2] AP = 65.88
Average Role [scenario_2] AP = 68.29, omitting the action "point"
```

Is my understanding of the metrics incorrect? Thank you very much for the reply.
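For reference, the two averages printed in each report are simply the arithmetic mean of the 25 per-action role APs, with and without the "point" action. A minimal sketch that reproduces the scenario_1 numbers above from the listed values (the dict below just copies the eval output; it does not call the actual V-COCO evaluator):

```python
# Per-action role APs copied from the scenario_1 report above.
role_aps = {
    "hold-obj": 58.44, "sit-instr": 59.50, "ride-instr": 73.68,
    "look-obj": 48.34, "hit-instr": 80.03, "hit-obj": 69.51,
    "eat-obj": 71.48, "eat-instr": 76.79, "jump-instr": 77.71,
    "lay-instr": 58.62, "talk_on_phone-instr": 56.47, "carry-obj": 48.91,
    "throw-obj": 57.31, "catch-obj": 57.66, "cut-instr": 50.66,
    "cut-obj": 65.60, "work_on_computer-instr": 77.11, "ski-instr": 56.09,
    "surf-instr": 80.34, "skateboard-instr": 88.40, "drink-instr": 59.21,
    "kick-obj": 79.46, "point-instr": 8.20, "read-obj": 51.02,
    "snowboard-instr": 80.16,
}

# Mean over all 25 actions.
full = sum(role_aps.values()) / len(role_aps)

# Mean with the "point" action excluded.
no_point = [ap for name, ap in role_aps.items() if not name.startswith("point")]
omit = sum(no_point) / len(no_point)

print(f"Average Role AP = {full:.2f}")                    # 63.63
print(f"Average Role AP = {omit:.2f}, omitting 'point'")  # 65.94
```

The low "point" AP (8.20, with only 31 positives) drags the full mean down by more than two points, which is why the two numbers differ so much.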

OreoChocolate commented 10 months ago

Hi, thank you for your interest in our research. We have found that training the transformer for one-stage HOI detection is unstable. I recommend training the model several times with different hyper-parameters (e.g., learning rate, seed).

alexw994 commented 10 months ago

Thank you for the reply. So my steps are all correct, and, for example, these results:

Average Role [scenario_1] AP = 65.94, omitting the action "point"

and

Average Role [scenario_2] AP = 68.29, omitting the action "point"

correspond correctly to the 68.8 and 71.0 reported in the paper, right?

(screenshot: the paper's V-COCO results table)

OreoChocolate commented 10 months ago

Yes, we report results omitting the action "point", following previous works [STIP, HOTR].