Performance gap between my reproduction and your reported results #10

Hello, I reproduced the work, but I see a performance gap on object affordance compared with your reported results. I get F1@0.25 = 89.3 and F1@0.5 = 80.8, which are the affordance_recognition means in the summary at the end of the log. The model I tested is:

hs512_e40_bs16_lr0.001_sc-None_h2h-False_h2o-True_o2h-True_o2o-True_m-v2-v1-att-v3-False-True_sd-0.1-True_os-ind_dn-1-gs_pf-e0s0_c0_sp-0_ihs-False_ios-False_al-1.0_bl-False-1.0-1.0_sl-True-False-4.0-1.0_fl0-0.0_mt-False_pt-True_gc0.0_ds3_Subject1

The environment follows your environment.yml. I used a single V100 GPU with 16 GB of memory and CUDA 9.0. Here is the test.log:

Subject1
Affordance Prediction
              precision    recall  f1-score   support

     movable     0.6735    0.6770    0.6753      3093
  stationary     0.8646    0.9352    0.8985     18192
   reachable     0.5947    0.4782    0.5301      3388
    pourable     0.9146    0.7049    0.7962       471
      pourto     0.9133    0.7155    0.8024       471
 containable     0.8333    0.5070    0.6305       641
   drinkable     0.7982    0.3321    0.4691       274
    openable     0.6867    0.8271    0.7504       538
   placeable     0.8258    0.6962    0.7555      2574
   closeable     0.5977    0.8270    0.6939       185
   cleanable     0.8500    0.7556    0.8000       135
     cleaner     0.7628    0.8815    0.8179       135

    accuracy                         0.8115     30097
   macro avg     0.7763    0.6948    0.7183     30097
weighted avg     0.8062    0.8115    0.8045     30097

Affordance Recognition
              precision    recall  f1-score   support

     movable     0.7885    0.7585    0.7732      3632
  stationary     0.8911    0.9621    0.9253     20368
   reachable     0.6668    0.6044    0.6341      2447
    pourable     0.7764    0.6769    0.7232       554
      pourto     0.8615    0.6625    0.7490       554
 containable     0.8667    0.3668    0.5154       319
   drinkable     0.9944    0.4972    0.6630       360
    openable     0.8609    0.9308    0.8945      1243
   placeable     0.8097    0.4417    0.5716      2033
   closeable     0.7769    0.9302    0.8467       659
   cleanable     0.8192    0.8824    0.8496       493
     cleaner     0.8728    0.8073    0.8388       493

    accuracy                         0.8556     33155
   macro avg     0.8321    0.7101    0.7487     33155
weighted avg     0.8521    0.8556    0.8472     33155

Sub-activity Prediction
          precision    recall  f1-score   support

reaching     0.7314    0.7747    0.7524      3484
  moving     0.6911    0.7409    0.7152      3470
 pouring     0.9389    0.7834    0.8542       471
  eating     0.4752    0.6000    0.5303       335
drinking     0.9394    0.4526    0.6108       274
 opening     0.6817    0.8439    0.7542       538
 placing     0.8192    0.7292    0.7716      2578
 closing     0.5152    0.9135    0.6589       185
    null     0.9812    0.6482    0.7807       884
cleaning     0.8780    0.8000    0.8372       135

accuracy                         0.7405     12354
macro avg    0.7651    0.7286    0.7265     12354
weighted avg 0.7581    0.7405    0.7423     12354

Sub-activity Recognition
          precision    recall  f1-score   support

reaching     0.6407    0.6857    0.6624      2507
  moving     0.7167    0.7779    0.7460      4305
 pouring     0.8264    0.6444    0.7241       554
  eating     0.3146    0.2463    0.2763       272
drinking     0.9570    0.4944    0.6520       360
 opening     0.8667    0.9204    0.8927      1243
 placing     0.8200    0.5017    0.6225      2025
 closing     0.7907    0.8998    0.8417       659
    null     0.5927    0.7701    0.6699      1357
cleaning     0.7965    0.8337    0.8147       493

accuracy                         0.7172     13775
macro avg    0.7322    0.6774    0.6902     13775
weighted avg 0.7285    0.7172    0.7128     13775

F1@0.1 metric.
Affordance Prediction F1@0.1: 0.8846
Affordance Recognition F1@0.1: 0.8900
Sub-activity Prediction F1@0.1: 0.8835
Sub-activity Recognition F1@0.1: 0.8555

F1@0.25 metric.
Affordance Prediction F1@0.25: 0.8199
Affordance Recognition F1@0.25: 0.8569
Sub-activity Prediction F1@0.25: 0.8341
Sub-activity Recognition F1@0.25: 0.8155

F1@0.5 metric.
Affordance Prediction F1@0.5: 0.7040
Affordance Recognition F1@0.5: 0.7697
Sub-activity Prediction F1@0.5: 0.6673
Sub-activity Recognition F1@0.5: 0.6556
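As a sanity check on how the aggregate rows of these reports are formed, the Subject1 Sub-activity Prediction averages can be recomputed from its per-class rows. This is my own sketch, not code from the repository. The per-class values are already rounded to 4 decimals, so the last digit may differ. Macro avg is the unweighted mean over classes, and weighted avg weights each class by its support. Support-weighted recall equals accuracy. For single-label data, micro precision, recall, and F1 all equal accuracy, which is why the three micro_* rows in the cross-validation summary are identical.

```python
# Recompute the aggregate rows of a sklearn-style classification report.
# ROWS are copied from the Subject1 "Sub-activity Prediction" report above:
# (label, precision, recall, f1-score, support). Inputs are rounded to 4
# decimals, so recomputed averages may differ in the last digit.
ROWS = [
    ("reaching", 0.7314, 0.7747, 0.7524, 3484),
    ("moving", 0.6911, 0.7409, 0.7152, 3470),
    ("pouring", 0.9389, 0.7834, 0.8542, 471),
    ("eating", 0.4752, 0.6000, 0.5303, 335),
    ("drinking", 0.9394, 0.4526, 0.6108, 274),
    ("opening", 0.6817, 0.8439, 0.7542, 538),
    ("placing", 0.8192, 0.7292, 0.7716, 2578),
    ("closing", 0.5152, 0.9135, 0.6589, 185),
    ("null", 0.9812, 0.6482, 0.7807, 884),
    ("cleaning", 0.8780, 0.8000, 0.8372, 135),
]


def report_averages(rows):
    """Return (macro, weighted, total_support) for precision/recall/f1."""
    total = sum(r[4] for r in rows)
    # Macro: plain mean of the per-class scores.
    macro = tuple(sum(r[i] for r in rows) / len(rows) for i in (1, 2, 3))
    # Weighted: mean of per-class scores weighted by support.
    weighted = tuple(sum(r[i] * r[4] for r in rows) / total for i in (1, 2, 3))
    return macro, weighted, total


macro, weighted, total = report_averages(ROWS)
print("macro avg    P=%.4f R=%.4f F1=%.4f" % macro)
print("weighted avg P=%.4f R=%.4f F1=%.4f" % weighted)
# Support-weighted recall = correctly labelled samples / all samples = accuracy.
print("accuracy     %.4f (support %d)" % (weighted[1], total))
```

The recomputed values agree with the logged macro avg (0.7651 / 0.7286 / 0.7265), weighted F1 (0.7423), and accuracy (0.7405) to within rounding.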

Subject3
Affordance Prediction
              precision    recall  f1-score   support

     movable     0.7465    0.7839    0.7647      3276
  stationary     0.9194    0.9446    0.9318     21249
   reachable     0.8011    0.7047    0.7498      3126
    pourable     0.9551    0.9075    0.9307       843
      pourto     0.9356    0.3962    0.5567       843
 containable     0.8481    0.9685    0.9043       444
   drinkable     0.3759    0.8175    0.5150       378
    openable     0.8345    0.6595    0.7367       558
   placeable     0.9143    0.8579    0.8852      2787
   closeable     0.8483    0.5857    0.6930       210
   cleanable     0.8889    0.8767    0.8828       146
     cleaner     0.9085    0.9521    0.9298       146

    accuracy                         0.8772     34006
   macro avg     0.8313    0.7879    0.7900     34006
weighted avg     0.8838    0.8772    0.8760     34006

Affordance Recognition
              precision    recall  f1-score   support

     movable     0.6887    0.8634    0.7662      4700
  stationary     0.9619    0.9198    0.9404     23798
   reachable     0.7993    0.7437    0.7705      2431
    pourable     0.8864    0.9739    0.9281       729
      pourto     0.8793    0.8395    0.8589       729
 containable     0.8399    0.8371    0.8385       307
   drinkable     0.8719    0.8020    0.8355       399
    openable     0.8733    0.9286    0.9001       995
   placeable     0.8233    0.8010    0.8120      1995
   closeable     0.8850    0.9542    0.9183       742
   cleanable     0.9815    0.8514    0.9118       249
     cleaner     0.9912    0.9076    0.9476       249

    accuracy                         0.8928     37323
   macro avg     0.8735    0.8685    0.8690     37323
weighted avg     0.9009    0.8928    0.8951     37323

Sub-activity Prediction
          precision    recall  f1-score   support

reaching     0.8132    0.8464    0.8295      3126
  moving     0.6897    0.8276    0.7524      4409
 pouring     0.9628    0.8909    0.9254       843
  eating     0.6458    0.0945    0.1649       656
drinking     0.4497    0.6746    0.5397       378
 opening     0.8095    0.6703    0.7333       558
 placing     0.8505    0.8665    0.8584      2750
 closing     0.6286    0.5238    0.5714       210
    null     0.8037    0.5429    0.6480      1516
cleaning     0.8794    0.8493    0.8641       146

accuracy                         0.7660     14592
macro avg    0.7533    0.6787    0.6887     14592
weighted avg 0.7715    0.7660    0.7539     14592

Sub-activity Recognition
          precision    recall  f1-score   support

reaching     0.7807    0.7602    0.7703      2431
  moving     0.8212    0.7819    0.8011      5956
 pouring     0.8483    0.9739    0.9068       729
  eating     0.2222    0.0093    0.0179       645
drinking     0.7367    0.7995    0.7668       399
 opening     0.8847    0.9407    0.9118       995
 placing     0.8302    0.8345    0.8324      1934
 closing     0.8617    0.9569    0.9068       742
    null     0.5972    0.7937    0.6816      2109
cleaning     0.9822    0.8876    0.9325       249

accuracy                         0.7842     16189
macro avg    0.7565    0.7738    0.7528     16189
weighted avg 0.7705    0.7842    0.7710     16189

F1@0.1 metric.
Affordance Prediction F1@0.1: 0.9128
Affordance Recognition F1@0.1: 0.9354
Sub-activity Prediction F1@0.1: 0.8804
Sub-activity Recognition F1@0.1: 0.9092

F1@0.25 metric.
Affordance Prediction F1@0.25: 0.8899
Affordance Recognition F1@0.25: 0.9208
Sub-activity Prediction F1@0.25: 0.8597
Sub-activity Recognition F1@0.25: 0.8890

F1@0.5 metric.
Affordance Prediction F1@0.5: 0.8312
Affordance Recognition F1@0.5: 0.8841
Sub-activity Prediction F1@0.5: 0.7866
Sub-activity Recognition F1@0.5: 0.8206

Subject4
Affordance Prediction
              precision    recall  f1-score   support

     movable     0.8134    0.6816    0.7417      3050
  stationary     0.9113    0.9349    0.9230     13127
   reachable     0.8339    0.5166    0.6380      2625
    pourable     0.7676    0.9585    0.8525       627
      pourto     0.7104    0.6220    0.6633       627
 containable     0.8692    0.8014    0.8339       423
   drinkable     0.3216    0.8179    0.4617       302
    openable     0.6095    0.7682    0.6797       453
   placeable     0.7865    0.9087    0.8432      2169
   closeable     0.7978    0.4641    0.5868       153
   cleanable     0.4955    0.9565    0.6528       115
     cleaner     0.4070    0.9130    0.5630       115

    accuracy                         0.8362     23786
   macro avg     0.6936    0.7786    0.7033     23786
weighted avg     0.8506    0.8362    0.8350     23786

Affordance Recognition
              precision    recall  f1-score   support

     movable     0.6501    0.9442    0.7700      3422
  stationary     0.9406    0.9095    0.9248     15072
   reachable     0.7360    0.7144    0.7250      1936
    pourable     1.0000    0.2750    0.4314       600
      pourto     1.0000    0.3150    0.4791       600
 containable     0.8551    0.6344    0.7284       279
   drinkable     0.8182    0.9643    0.8852       336
    openable     0.8484    0.7241    0.7813      1167
   placeable     0.8178    0.7802    0.7986      1415
   closeable     0.7331    0.7694    0.7508       464
   cleanable     0.8713    0.9670    0.9167       546
     cleaner     0.9025    0.9322    0.9171       546

    accuracy                         0.8536     26383
   macro avg     0.8477    0.7441    0.7590     26383
weighted avg     0.8716    0.8536    0.8496     26383

Sub-activity Prediction
          precision    recall  f1-score   support

reaching     0.7978    0.7444    0.7702      2656
  moving     0.8213    0.6771    0.7423      3475
 pouring     0.8074    0.6954    0.7472       627
  eating     0.9667    0.2843    0.4394       306
drinking     0.3149    0.7914    0.4505       302
 opening     0.6464    0.7506    0.6946       453
 placing     0.7059    0.8735    0.7808      2135
 closing     0.7161    0.7255    0.7208       153
    null     0.9415    0.8098    0.8707       736
cleaning     0.3630    0.8522    0.5091       115

accuracy                         0.7394     10958
macro avg    0.7081    0.7204    0.6726     10958
weighted avg 0.7770    0.7394    0.7442     10958

Sub-activity Recognition
          precision    recall  f1-score   support

reaching     0.7662    0.7005    0.7318      1993
  moving     0.7147    0.8961    0.7952      3986
 pouring     1.0000    0.4350    0.6063       600
  eating     0.7778    0.1167    0.2029       240
drinking     0.7489    0.9762    0.8475       336
 opening     0.8531    0.7763    0.8129      1167
 placing     0.7803    0.8637    0.8199      1357
 closing     0.6294    0.8858    0.7359       464
    null     0.8453    0.5094    0.6357      1598
cleaning     0.9348    0.9451    0.9399       546

accuracy                         0.7654     12287
macro avg    0.8050    0.7105    0.7128     12287
weighted avg 0.7831    0.7654    0.7534     12287

F1@0.1 metric.
Affordance Prediction F1@0.1: 0.9090
Affordance Recognition F1@0.1: 0.9235
Sub-activity Prediction F1@0.1: 0.9073
Sub-activity Recognition F1@0.1: 0.9047

F1@0.25 metric.
Affordance Prediction F1@0.25: 0.8903
Affordance Recognition F1@0.25: 0.9059
Sub-activity Prediction F1@0.25: 0.8650
Sub-activity Recognition F1@0.25: 0.8612

F1@0.5 metric.
Affordance Prediction F1@0.5: 0.8058
Affordance Recognition F1@0.5: 0.8087
Sub-activity Prediction F1@0.5: 0.7861
Sub-activity Recognition F1@0.5: 0.7745

Subject5
Affordance Prediction
              precision    recall  f1-score   support

     movable     0.7911    0.6984    0.7419      4619
  stationary     0.8613    0.9602    0.9080     20669
   reachable     0.6766    0.4622    0.5492      4355
    pourable     0.6583    0.6475    0.6529       488
      pourto     0.7162    0.6516    0.6824       488
 containable     0.7094    0.5907    0.6447       562
   drinkable     0.7473    0.7381    0.7427       653
    openable     0.4080    0.4411    0.4239       603
   placeable     0.8326    0.8085    0.8203      2872
   closeable     0.7902    0.6750    0.7281       240
   cleanable     1.0000    0.5186    0.6830       295
     cleaner     0.8647    0.6068    0.7131       295

    accuracy                         0.8195     36139
   macro avg     0.7546    0.6499    0.6909     36139
weighted avg     0.8118    0.8195    0.8103     36139

Affordance Recognition
              precision    recall  f1-score   support

     movable     0.7656    0.7940    0.7795      4674
  stationary     0.9070    0.9770    0.9407     24299
   reachable     0.6040    0.6011    0.6025      2913
    pourable     1.0000    0.6756    0.8064      1048
      pourto     1.0000    0.6069    0.7553      1048
 containable     0.7533    0.3831    0.5079       295
   drinkable     0.7089    0.9592    0.8153       612
    openable     0.7873    0.4744    0.5921      1389
   placeable     0.7851    0.7434    0.7636      1769
   closeable     0.9711    0.8799    0.9233       916
   cleanable     0.9833    0.6155    0.7571       671
     cleaner     0.9967    0.4501    0.6201       671

    accuracy                         0.8619     40305
   macro avg     0.8552    0.6800    0.7387     40305
weighted avg     0.8642    0.8619    0.8556     40305

Sub-activity Prediction
          precision    recall  f1-score   support

reaching     0.6955    0.6916    0.6935      4355
  moving     0.6397    0.7656    0.6970      5141
 pouring     0.6549    0.6844    0.6693       488
  eating     0.7113    0.3473    0.4667       596
drinking     0.7861    0.6753    0.7265       653
 opening     0.3838    0.3615    0.3723       603
 placing     0.8304    0.8130    0.8216      2872
 closing     0.8902    0.6417    0.7458       240
    null     0.9167    0.5464    0.6847       765
cleaning     1.0000    0.5186    0.6830       295

accuracy                         0.7001     16008
macro avg    0.7508    0.6046    0.6561     16008
weighted avg 0.7122    0.7001    0.6979     16008

Sub-activity Recognition
          precision    recall  f1-score   support

reaching     0.6254    0.6739    0.6487      2913
  moving     0.7512    0.7784    0.7646      5772
 pouring     1.0000    0.6784    0.8084      1048
  eating     0.2943    0.6512    0.4054       324
drinking     0.6817    0.9624    0.7981       612
 opening     0.8333    0.5004    0.6253      1389
 placing     0.7467    0.7880    0.7668      1769
 closing     0.9372    0.8799    0.9077       916
    null     0.8021    0.8223    0.8121      2617
cleaning     0.9600    0.5365    0.6883       671

accuracy                         0.7417     18031
macro avg    0.7632    0.7271    0.7225     18031
weighted avg 0.7653    0.7417    0.7439     18031

F1@0.1 metric.
Affordance Prediction F1@0.1: 0.9074
Affordance Recognition F1@0.1: 0.9239
Sub-activity Prediction F1@0.1: 0.8622
Sub-activity Recognition F1@0.1: 0.8837

F1@0.25 metric.
Affordance Prediction F1@0.25: 0.8549
Affordance Recognition F1@0.25: 0.8884
Sub-activity Prediction F1@0.25: 0.8189
Sub-activity Recognition F1@0.25: 0.8347

F1@0.5 metric.
Affordance Prediction F1@0.5: 0.7298
Affordance Recognition F1@0.5: 0.7683
Sub-activity Prediction F1@0.5: 0.6479
Sub-activity Recognition F1@0.5: 0.6963

Summary Performance for Cross-validation.

affordance_prediction-micro_precision
Values: [0.8115, 0.8772, 0.8362, 0.8195]
Mean: 0.8361    Std: 0.0253

affordance_prediction-micro_recall
Values: [0.8115, 0.8772, 0.8362, 0.8195]
Mean: 0.8361    Std: 0.0253

affordance_prediction-micro_f1
Values: [0.8115, 0.8772, 0.8362, 0.8195]
Mean: 0.8361    Std: 0.0253

affordance_prediction-macro_precision
Values: [0.7763, 0.8313, 0.6936, 0.7546]
Mean: 0.7640    Std: 0.0493

affordance_prediction-macro_recall
Values: [0.6948, 0.7879, 0.7786, 0.6499]
Mean: 0.7278    Std: 0.0578

affordance_prediction-macro_f1
Values: [0.7183, 0.79, 0.7033, 0.6909]
Mean: 0.7256    Std: 0.0384

affordance_recognition-micro_precision
Values: [0.8556, 0.8928, 0.8536, 0.8619]
Mean: 0.8660    Std: 0.0158

affordance_recognition-micro_recall
Values: [0.8556, 0.8928, 0.8536, 0.8619]
Mean: 0.8660    Std: 0.0158

affordance_recognition-micro_f1
Values: [0.8556, 0.8928, 0.8536, 0.8619]
Mean: 0.8660    Std: 0.0158

affordance_recognition-macro_precision
Values: [0.8321, 0.8735, 0.8477, 0.8552]
Mean: 0.8521    Std: 0.0149

affordance_recognition-macro_recall
Values: [0.7101, 0.8685, 0.7441, 0.68]
Mean: 0.7507    Std: 0.0717

affordance_recognition-macro_f1
Values: [0.7487, 0.869, 0.759, 0.7387]
Mean: 0.7788    Std: 0.0525

sub-activity_prediction-micro_precision
Values: [0.7405, 0.766, 0.7394, 0.7001]
Mean: 0.7365    Std: 0.0235

sub-activity_prediction-micro_recall
Values: [0.7405, 0.766, 0.7394, 0.7001]
Mean: 0.7365    Std: 0.0235

sub-activity_prediction-micro_f1
Values: [0.7405, 0.766, 0.7394, 0.7001]
Mean: 0.7365    Std: 0.0235

sub-activity_prediction-macro_precision
Values: [0.7651, 0.7533, 0.7081, 0.7508]
Mean: 0.7443    Std: 0.0216

sub-activity_prediction-macro_recall
Values: [0.7286, 0.6787, 0.7204, 0.6046]
Mean: 0.6831    Std: 0.0491

sub-activity_prediction-macro_f1
Values: [0.7265, 0.6887, 0.6726, 0.6561]
Mean: 0.6860    Std: 0.0261

sub-activity_recognition-micro_precision
Values: [0.7172, 0.7842, 0.7654, 0.7417]
Mean: 0.7521    Std: 0.0252

sub-activity_recognition-micro_recall
Values: [0.7172, 0.7842, 0.7654, 0.7417]
Mean: 0.7521    Std: 0.0252

sub-activity_recognition-micro_f1
Values: [0.7172, 0.7842, 0.7654, 0.7417]
Mean: 0.7521    Std: 0.0252

sub-activity_recognition-macro_precision
Values: [0.7322, 0.7565, 0.805, 0.7632]
Mean: 0.7642    Std: 0.0262

sub-activity_recognition-macro_recall
Values: [0.6774, 0.7738, 0.7105, 0.7271]
Mean: 0.7222    Std: 0.0347

sub-activity_recognition-macro_f1
Values: [0.6902, 0.7528, 0.7128, 0.7225]
Mean: 0.7196    Std: 0.0225
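When comparing these Mean/Std values with other reported numbers, it helps to know which standard deviation the log uses. In my own check, not taken from the repository code, the logged Std matches the population standard deviation over the four subjects (ddof=0, which is also NumPy's `np.std` default). The sample standard deviation (ddof=1) would give about 0.029 instead of 0.0253 for affordance_prediction-micro_*. The sketch below uses two rows copied from the summary above.

```python
import math


def fold_mean_std(values):
    """Mean and population standard deviation (ddof=0) across CV folds."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, std


def sample_std(values):
    """Sample standard deviation (ddof=1), shown only for comparison."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# Per-subject values (Subject1, 3, 4, 5) copied from the summary above.
micro = [0.8115, 0.8772, 0.8362, 0.8195]     # affordance_prediction-micro_*
macro_f1 = [0.7487, 0.869, 0.759, 0.7387]    # affordance_recognition-macro_f1

for name, vals in [("aff_pred micro", micro), ("aff_rec macro_f1", macro_f1)]:
    mean, std = fold_mean_std(vals)
    print("%-16s Mean: %.4f  Std(ddof=0): %.4f  Std(ddof=1): %.4f"
          % (name, mean, std, sample_std(vals)))
```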

Summary F1@k results.

affordance_prediction
Overlap: 0.1
Values: [0.8846, 0.9128, 0.909, 0.9074]
Mean: 0.9034    Std: 0.0110

Overlap: 0.25
Values: [0.8199, 0.8899, 0.8903, 0.8549]
Mean: 0.8638    Std: 0.0291

Overlap: 0.5
Values: [0.704, 0.8312, 0.8058, 0.7298]
Mean: 0.7677    Std: 0.0524

affordance_recognition
Overlap: 0.1
Values: [0.89, 0.9354, 0.9235, 0.9239]
Mean: 0.9182    Std: 0.0170

Overlap: 0.25
Values: [0.8569, 0.9208, 0.9059, 0.8884]
Mean: 0.8930    Std: 0.0238

Overlap: 0.5
Values: [0.7697, 0.8841, 0.8087, 0.7683]
Mean: 0.8077    Std: 0.0470

sub-activity_prediction
Overlap: 0.1
Values: [0.8835, 0.8804, 0.9073, 0.8622]
Mean: 0.8833    Std: 0.0161

Overlap: 0.25
Values: [0.8341, 0.8597, 0.865, 0.8189]
Mean: 0.8444    Std: 0.0188

Overlap: 0.5
Values: [0.6673, 0.7866, 0.7861, 0.6479]
Mean: 0.7220    Std: 0.0647

sub-activity_recognition
Overlap: 0.1
Values: [0.8555, 0.9092, 0.9047, 0.8837]
Mean: 0.8883    Std: 0.0212

Overlap: 0.25
Values: [0.8155, 0.889, 0.8612, 0.8347]
Mean: 0.8501    Std: 0.0277

Overlap: 0.5
Values: [0.6556, 0.8206, 0.7745, 0.6963]
Mean: 0.7367    Std: 0.0646
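The gap is in F1@0.25 and F1@0.5, so here is a sketch of the segmental F1@k metric as it is commonly defined in action segmentation, for example in the MS-TCN evaluation code. I have not verified that this repository computes it in exactly this way, so treat that as an assumption. Frame-wise labels are collapsed into segments. A predicted segment counts as a true positive if its IoU with a not-yet-matched ground-truth segment of the same label is at least k. Every other predicted segment is a false positive, and unmatched ground-truth segments are false negatives. In the MS-TCN code the counts are summed over all videos before F1 is computed. This sketch scores a single sequence for brevity, and the toy sequences are hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[str, int, int]  # (label, start, end), end exclusive


def to_segments(frames: Sequence[str]) -> List[Segment]:
    """Collapse a frame-wise label sequence into contiguous segments."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(frames) + 1):
        if i == len(frames) or frames[i] != frames[start]:
            segments.append((frames[start], start, i))
            start = i
    return segments


def f1_at_k(pred: Sequence[str], truth: Sequence[str], overlap: float) -> float:
    """Segmental F1 for one sequence at IoU threshold `overlap`."""
    p_segs, t_segs = to_segments(pred), to_segments(truth)
    matched = [False] * len(t_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        # Find the same-label ground-truth segment with the highest IoU.
        best_iou, best_j = 0.0, -1
        for j, (t_label, ts, te) in enumerate(t_segs):
            if t_label != label:
                continue
            inter = max(0, min(pe, te) - max(ps, ts))
            union = max(pe, te) - min(ps, ts)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= overlap and not matched[best_j]:
            tp += 1
            matched[best_j] = True
        else:
            fp += 1
    fn = matched.count(False)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


pred = list("aaabbbcc")   # hypothetical frame-wise predictions
truth = list("aaaabbcc")  # hypothetical ground truth
for k in (0.1, 0.25, 0.5, 0.75):
    print(k, round(f1_at_k(pred, truth, k), 4))  # 1.0 for k <= 0.5, 0.6667 at 0.75
```

In this toy example a one-frame boundary shift leaves F1 unchanged at low thresholds but costs a full segment once k exceeds that segment's IoU (2/3). This is why small boundary errors show up mainly in F1@0.5 rather than F1@0.1.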
