Jwoo5 / ecg-qa

Official repository for distributing ECG-QA dataset
Creative Commons Attribution 4.0 International

Unable to recreate the results for UpperBound Experiments #3

Closed parthagrawal02 closed 1 week ago

parthagrawal02 commented 1 month ago

Hey, thanks for the code and the QA dataset.

I am trying to reproduce the upper-bound experiment results from the paper using the PTB-XL dataset. I followed every step in the README, but my results are worse than the reported ones. Is there anything I am missing? For the SE-WRN model, I am getting a macro-averaged AUC of 0.824.

Thanks

Jwoo5 commented 1 month ago

Hi,

Could you let me know how you calculated the macro-AUROC? Did you get AUROCs for each of 83 attributes and macro-average them?

parthagrawal02 commented 1 month ago

It logs the macro AUROC, doesn't it? I checked the code, and it averages across all 83 attributes.

criterion:
  _name: multi_head_binary_cross_entropy
  report_auc: true
  log_per_class: true
  per_log_keys: [attribute_id]

Example: "test_auroc": "0.820812",

[2024-07-21 20:48:16,450][test][INFO] - {"epoch": 22, "test_loss": "1.091", "test_nsignals": "125.077", "test_accuracy": "0.77429", "test_cls_73_accuracy": "0.83333", "test_cls_11_accuracy": "0.66538", "test_cls_61_accuracy": "0.83153", "test_cls_52_accuracy": "0.75223", "test_cls_53_accuracy": "0.78659", "test_cls_30_accuracy": "0.88333", "test_cls_81_accuracy": "0.9", "test_cls_24_accuracy": "0.98333", "test_cls_37_accuracy": "0.88084", "test_cls_63_accuracy": "0.9", "test_cls_29_accuracy": "0.83333", "test_cls_20_accuracy": "0.86667", "test_cls_67_accuracy": "0.96667", "test_cls_51_accuracy": "0.81667", "test_cls_69_accuracy": "0.69744", "test_cls_55_accuracy": "0.74854", "test_cls_3_accuracy": "0.78333", "test_cls_14_accuracy": "0.94444", "test_cls_21_accuracy": "0.96667", "test_cls_16_accuracy": "0.79615", "test_cls_35_accuracy": "0.80392", "test_cls_6_accuracy": "0.98333", "test_cls_46_accuracy": "0.72222", "test_cls_74_accuracy": "0.8", "test_cls_13_accuracy": "0.91667", "test_cls_76_accuracy": "0.78333", "test_cls_25_accuracy": "0.9", "test_cls_23_accuracy": "0.70142", "test_cls_45_accuracy": "0.71698", "test_cls_18_accuracy": "0.88333", "test_cls_26_accuracy": "0.81667", "test_cls_68_accuracy": "0.76667", "test_cls_38_accuracy": "0.70846", "test_cls_49_accuracy": "0.71833", "test_cls_12_accuracy": "0.56667", "test_cls_22_accuracy": "0.78", "test_cls_2_accuracy": "0.81667", "test_cls_19_accuracy": "0.83333", "test_cls_31_accuracy": "0.89474", "test_cls_36_accuracy": "0.88333", "test_cls_57_accuracy": "1", "test_cls_27_accuracy": "0.82456", "test_cls_58_accuracy": "0.88333", "test_cls_10_accuracy": "0.94", "test_cls_79_accuracy": "0.78333", "test_cls_42_accuracy": "0.9", "test_cls_48_accuracy": "0.76429", "test_cls_78_accuracy": "0.76667", "test_cls_4_accuracy": "0.73333", "test_cls_65_accuracy": "0.88333", "test_cls_33_accuracy": "0.82759", "test_cls_70_accuracy": "0.71533", "test_cls_56_accuracy": "0.71667", "test_attribute_id_73_accuracy": "0.83333", 
"test_attribute_id_11_accuracy": "0.66538", "test_attribute_id_61_accuracy": "0.83153", "test_attribute_id_52_accuracy": "0.75223", "test_attribute_id_53_accuracy": "0.78659", "test_attribute_id_30_accuracy": "0.88333", "test_attribute_id_81_accuracy": "0.9", "test_attribute_id_24_accuracy": "0.98333", "test_attribute_id_37_accuracy": "0.88084", "test_attribute_id_63_accuracy": "0.9", "test_attribute_id_29_accuracy": "0.83333", "test_attribute_id_20_accuracy": "0.86667", "test_attribute_id_67_accuracy": "0.96667", "test_attribute_id_51_accuracy": "0.81667", "test_attribute_id_69_accuracy": "0.69744", "test_attribute_id_55_accuracy": "0.74854", "test_attribute_id_3_accuracy": "0.78333", "test_attribute_id_14_accuracy": "0.94444", "test_attribute_id_21_accuracy": "0.96667", "test_attribute_id_16_accuracy": "0.79615", "test_attribute_id_35_accuracy": "0.80392", "test_attribute_id_6_accuracy": "0.98333", "test_attribute_id_46_accuracy": "0.72222", "test_attribute_id_74_accuracy": "0.8", "test_attribute_id_13_accuracy": "0.91667", "test_attribute_id_76_accuracy": "0.78333", "test_attribute_id_25_accuracy": "0.9", "test_attribute_id_23_accuracy": "0.70142", "test_attribute_id_45_accuracy": "0.71698", "test_attribute_id_18_accuracy": "0.88333", "test_attribute_id_26_accuracy": "0.81667", "test_attribute_id_68_accuracy": "0.76667", "test_attribute_id_38_accuracy": "0.70846", "test_attribute_id_49_accuracy": "0.71833", "test_attribute_id_12_accuracy": "0.56667", "test_attribute_id_22_accuracy": "0.78", "test_attribute_id_2_accuracy": "0.81667", "test_attribute_id_19_accuracy": "0.83333", "test_attribute_id_31_accuracy": "0.89474", "test_attribute_id_36_accuracy": "0.88333", "test_attribute_id_57_accuracy": "1", "test_attribute_id_27_accuracy": "0.82456", "test_attribute_id_58_accuracy": "0.88333", "test_attribute_id_10_accuracy": "0.94", "test_attribute_id_79_accuracy": "0.78333", "test_attribute_id_42_accuracy": "0.9", "test_attribute_id_48_accuracy": "0.76429", 
"test_attribute_id_78_accuracy": "0.76667", "test_attribute_id_4_accuracy": "0.73333", "test_attribute_id_65_accuracy": "0.88333", "test_attribute_id_33_accuracy": "0.82759", "test_attribute_id_70_accuracy": "0.71533", "test_attribute_id_56_accuracy": "0.71667", "test_cls_8_accuracy": "0.63333", "test_cls_7_accuracy": "0.83333", "test_cls_17_accuracy": "0.81667", "test_cls_64_accuracy": "0.7", "test_cls_54_accuracy": "0.83333", "test_cls_43_accuracy": "0.78333", "test_cls_80_accuracy": "0.78333", "test_cls_1_accuracy": "0.8", "test_cls_34_accuracy": "0.81667", "test_cls_15_accuracy": "0.88333", "test_cls_47_accuracy": "0.66667", "test_cls_40_accuracy": "0.7", "test_cls_66_accuracy": "0.73333", "test_cls_32_accuracy": "0.76667", "test_cls_59_accuracy": "0.81633", "test_attribute_id_8_accuracy": "0.63333", "test_attribute_id_7_accuracy": "0.83333", "test_attribute_id_17_accuracy": "0.81667", "test_attribute_id_64_accuracy": "0.7", "test_attribute_id_54_accuracy": "0.83333", "test_attribute_id_43_accuracy": "0.78333", "test_attribute_id_80_accuracy": "0.78333", "test_attribute_id_1_accuracy": "0.8", "test_attribute_id_34_accuracy": "0.81667", "test_attribute_id_15_accuracy": "0.88333", "test_attribute_id_47_accuracy": "0.66667", "test_attribute_id_40_accuracy": "0.7", "test_attribute_id_66_accuracy": "0.73333", "test_attribute_id_32_accuracy": "0.76667", "test_attribute_id_59_accuracy": "0.81633", "test_cls_62_accuracy": "0.82143", "test_cls_39_accuracy": "0.91667", "test_cls_77_accuracy": "0.71667", "test_cls_5_accuracy": "0.85", "test_cls_82_accuracy": "0.91667", "test_cls_28_accuracy": "0.77966", "test_cls_50_accuracy": "0.73333", "test_cls_0_accuracy": "0.8", "test_cls_60_accuracy": "0.83333", "test_cls_9_accuracy": "0.75", "test_attribute_id_62_accuracy": "0.82143", "test_attribute_id_39_accuracy": "0.91667", "test_attribute_id_77_accuracy": "0.71667", "test_attribute_id_5_accuracy": "0.85", "test_attribute_id_82_accuracy": "0.91667", 
"test_attribute_id_28_accuracy": "0.77966", "test_attribute_id_50_accuracy": "0.73333", "test_attribute_id_0_accuracy": "0.8", "test_attribute_id_60_accuracy": "0.83333", "test_attribute_id_9_accuracy": "0.75", "test_cls_75_accuracy": "1", "test_cls_71_accuracy": "0.8", "test_attribute_id_75_accuracy": "1", "test_attribute_id_71_accuracy": "0.8", "test_cls_72_accuracy": "0.88889", "test_attribute_id_72_accuracy": "0.88889", "test_cls_41_accuracy": "0.86667", "test_cls_44_accuracy": "0.78431", "test_attribute_id_41_accuracy": "0.86667", "test_attribute_id_44_accuracy": "0.78431", "test_num_updates": "21428", "test_best_accuracy": "0.78025", ### "test_auroc": "0.820812", "test_auprc": "0.683473", "test_cls_2_auroc": "0.8825", "test_cls_2_auprc": "0.778447", "test_cls_3_auroc": "0.93", "test_cls_3_auprc": "0.854287", "test_cls_4_auroc": "0.87", "test_cls_4_auprc": "0.781069", "test_cls_6_auroc": "1", "test_cls_6_auprc": "1", "test_cls_10_auroc": "0.9475", "test_cls_10_auprc": "0.825548", "test_cls_11_auroc": "0.656934", "test_cls_11_auprc": "0.497199", "test_cls_12_auroc": "0.61375", "test_cls_12_auprc": "0.390274", "test_cls_13_auroc": "0.9525", "test_cls_13_auprc": "0.938802", "test_cls_14_auroc": "0.966667", "test_cls_14_auprc": "0.881944", "test_cls_16_auroc": "0.865584", "test_cls_16_auprc": "0.761077", "test_cls_18_auroc": "0.98625", "test_cls_18_auprc": "0.97462", "test_cls_19_auroc": "0.92375", "test_cls_19_auprc": "0.860601", "test_cls_20_auroc": "0.91375", "test_cls_20_auprc": "0.860014", "test_cls_21_auroc": "0.99125", "test_cls_21_auprc": "0.98031", "test_cls_22_auroc": "0.8725", "test_cls_22_auprc": "0.663597", "test_cls_23_auroc": "0.674035", "test_cls_23_auprc": "0.529741", "test_cls_24_auroc": "1", "test_cls_24_auprc": "1", "test_cls_25_auroc": "1", "test_cls_25_auprc": "1", "test_cls_26_auroc": "0.8825", "test_cls_26_auprc": "0.832486", "test_cls_27_auroc": "0.975", "test_cls_27_auprc": "0.950102", "test_cls_29_auroc": "0.71405", "test_cls_29_auprc": 
"0.305185", "test_cls_30_auroc": "0.9175", "test_cls_30_auprc": "0.89171", "test_cls_31_auroc": "0.977941", "test_cls_31_auprc": "0.954191", "test_cls_33_auroc": "0.77913", "test_cls_33_auprc": "0.496385", "test_cls_35_auroc": "0.979545", "test_cls_35_auprc": "0.937247", "test_cls_36_auroc": "0.98", "test_cls_36_auprc": "0.966176", "test_cls_37_auroc": "0.943097", "test_cls_37_auprc": "0.869614", "test_cls_38_auroc": "0.836832", "test_cls_38_auprc": "0.632153", "test_cls_42_auroc": "0.9975", "test_cls_42_auprc": "0.995455", "test_cls_45_auroc": "0.810007", "test_cls_45_auprc": "0.614562", "test_cls_46_auroc": "0.764132", "test_cls_46_auprc": "0.634483", "test_cls_48_auroc": "0.855199", "test_cls_48_auprc": "0.729519", "test_cls_49_auroc": "0.808573", "test_cls_49_auprc": "0.634935", "test_cls_51_auroc": "0.93", "test_cls_51_auprc": "0.885417", "test_cls_52_auroc": "0.645195", "test_cls_52_auprc": "0.314701", "test_cls_53_auroc": "0.858363", "test_cls_53_auprc": "0.723025", "test_cls_55_auroc": "0.811172", "test_cls_55_auprc": "0.628535", "test_cls_56_auroc": "0.86", "test_cls_56_auprc": "0.789953", "test_cls_57_auroc": "1", "test_cls_57_auprc": "1", "test_cls_58_auroc": "0.9425", "test_cls_58_auprc": "0.910785", "test_cls_61_auroc": "0.900636", "test_cls_61_auprc": "0.768744", "test_cls_63_auroc": "0.93625", "test_cls_63_auprc": "0.922303", "test_cls_65_auroc": "0.9375", "test_cls_65_auprc": "0.832466", "test_cls_67_auroc": "0.98375", "test_cls_67_auprc": "0.971739", "test_cls_68_auroc": "0.84", "test_cls_68_auprc": "0.634047", "test_cls_69_auroc": "0.708284", "test_cls_69_auprc": "0.536556", "test_cls_70_auroc": "0.911779", "test_cls_70_auprc": "0.773581", "test_cls_73_auroc": "0.60896", "test_cls_73_auprc": "0.214599", "test_cls_74_auroc": "0.785", "test_cls_74_auprc": "0.763056", "test_cls_76_auroc": "0.862222", "test_cls_76_auprc": "0.782989", "test_cls_78_auroc": "0.845", "test_cls_78_auprc": "0.745796", "test_cls_79_auroc": "0.825", "test_cls_79_auprc": 
"0.721839", "test_cls_81_auroc": "0.905", "test_cls_81_auprc": "0.917668", "test_attribute_id_2_auroc": "0.8825", "test_attribute_id_2_auprc": "0.778447", "test_attribute_id_3_auroc": "0.93", "test_attribute_id_3_auprc": "0.854287", "test_attribute_id_4_auroc": "0.87", "test_attribute_id_4_auprc": "0.781069", "test_attribute_id_6_auroc": "1", "test_attribute_id_6_auprc": "1", "test_attribute_id_10_auroc": "0.9475", "test_attribute_id_10_auprc": "0.825548", "test_attribute_id_11_auroc": "0.656934", "test_attribute_id_11_auprc": "0.497199", "test_attribute_id_12_auroc": "0.61375", "test_attribute_id_12_auprc": "0.390274", "test_attribute_id_13_auroc": "0.9525", "test_attribute_id_13_auprc": "0.938802", "test_attribute_id_14_auroc": "0.966667", "test_attribute_id_14_auprc": "0.881944", "test_attribute_id_16_auroc": "0.865584", "test_attribute_id_16_auprc": "0.761077", "test_attribute_id_18_auroc": "0.98625", "test_attribute_id_18_auprc": "0.97462", "test_attribute_id_19_auroc": "0.92375", "test_attribute_id_19_auprc": "0.860601", "test_attribute_id_20_auroc": "0.91375", "test_attribute_id_20_auprc": "0.860014", "test_attribute_id_21_auroc": "0.99125", "test_attribute_id_21_auprc": "0.98031", "test_attribute_id_22_auroc": "0.8725", "test_attribute_id_22_auprc": "0.663597", "test_attribute_id_23_auroc": "0.674035", "test_attribute_id_23_auprc": "0.529741", "test_attribute_id_24_auroc": "1", "test_attribute_id_24_auprc": "1", "test_attribute_id_25_auroc": "1", "test_attribute_id_25_auprc": "1", "test_attribute_id_26_auroc": "0.8825", "test_attribute_id_26_auprc": "0.832486", "test_attribute_id_27_auroc": "0.975", "test_attribute_id_27_auprc": "0.950102", "test_attribute_id_29_auroc": "0.71405", "test_attribute_id_29_auprc": "0.305185", "test_attribute_id_30_auroc": "0.9175", "test_attribute_id_30_auprc": "0.89171", "test_attribute_id_31_auroc": "0.977941", "test_attribute_id_31_auprc": "0.954191", "test_attribute_id_33_auroc": "0.77913", "test_attribute_id_33_auprc": 
"0.496385", "test_attribute_id_35_auroc": "0.979545", "test_attribute_id_35_auprc": "0.937247", "test_attribute_id_36_auroc": "0.98", "test_attribute_id_36_auprc": "0.966176", "test_attribute_id_37_auroc": "0.943097", "test_attribute_id_37_auprc": "0.869614", "test_attribute_id_38_auroc": "0.836832", "test_attribute_id_38_auprc": "0.632153", "test_attribute_id_42_auroc": "0.9975", "test_attribute_id_42_auprc": "0.995455", "test_attribute_id_45_auroc": "0.810007", "test_attribute_id_45_auprc": "0.614562", "test_attribute_id_46_auroc": "0.764132", "test_attribute_id_46_auprc": "0.634483", "test_attribute_id_48_auroc": "0.855199", "test_attribute_id_48_auprc": "0.729519", "test_attribute_id_49_auroc": "0.808573", "test_attribute_id_49_auprc": "0.634935", "test_attribute_id_51_auroc": "0.93", "test_attribute_id_51_auprc": "0.885417", "test_attribute_id_52_auroc": "0.645195", "test_attribute_id_52_auprc": "0.314701", "test_attribute_id_53_auroc": "0.858363", "test_attribute_id_53_auprc": "0.723025", "test_attribute_id_55_auroc": "0.811172", "test_attribute_id_55_auprc": "0.628535", "test_attribute_id_56_auroc": "0.86", "test_attribute_id_56_auprc": "0.789953", "test_attribute_id_57_auroc": "1", "test_attribute_id_57_auprc": "1", "test_attribute_id_58_auroc": "0.9425", "test_attribute_id_58_auprc": "0.910785", "test_attribute_id_61_auroc": "0.900636", "test_attribute_id_61_auprc": "0.768744", "test_attribute_id_63_auroc": "0.93625", "test_attribute_id_63_auprc": "0.922303", "test_attribute_id_65_auroc": "0.9375", "test_attribute_id_65_auprc": "0.832466", "test_attribute_id_67_auroc": "0.98375", "test_attribute_id_67_auprc": "0.971739", "test_attribute_id_68_auroc": "0.84", "test_attribute_id_68_auprc": "0.634047", "test_attribute_id_69_auroc": "0.708284", "test_attribute_id_69_auprc": "0.536556", "test_attribute_id_70_auroc": "0.911779", "test_attribute_id_70_auprc": "0.773581", "test_attribute_id_73_auroc": "0.60896", "test_attribute_id_73_auprc": "0.214599", 
"test_attribute_id_74_auroc": "0.785", "test_attribute_id_74_auprc": "0.763056", "test_attribute_id_76_auroc": "0.862222", "test_attribute_id_76_auprc": "0.782989", "test_attribute_id_78_auroc": "0.845", "test_attribute_id_78_auprc": "0.745796", "test_attribute_id_79_auroc": "0.825", "test_attribute_id_79_auprc": "0.721839", "test_attribute_id_81_auroc": "0.905", "test_attribute_id_81_auprc": "0.917668", "test_cls_1_auroc": "0.81875", "test_cls_1_auprc": "0.763658", "test_cls_7_auroc": "0.88125", "test_cls_7_auprc": "0.752982", "test_cls_8_auroc": "0.6775", "test_cls_8_auprc": "0.426755", "test_cls_15_auroc": "0.9775", "test_cls_15_auprc": "0.949399", "test_cls_17_auroc": "0.855", "test_cls_17_auprc": "0.833072", "test_cls_32_auroc": "0.87625", "test_cls_32_auprc": "0.804281", "test_cls_34_auroc": "0.89", "test_cls_34_auprc": "0.854619", "test_cls_40_auroc": "0.75375", "test_cls_40_auprc": "0.657261", "test_cls_43_auroc": "0.955", "test_cls_43_auprc": "0.92042", "test_cls_47_auroc": "0.72375", "test_cls_47_auprc": "0.570795", "test_cls_54_auroc": "0.7875", "test_cls_54_auprc": "0.65625", "test_cls_59_auroc": "0.625", "test_cls_59_auprc": "0.36227", "test_cls_64_auroc": "0.65125", "test_cls_64_auprc": "0.577074", "test_cls_66_auroc": "0.7775", "test_cls_66_auprc": "0.632891", "test_cls_80_auroc": "0.745", "test_cls_80_auprc": "0.639197", "test_attribute_id_1_auroc": "0.81875", "test_attribute_id_1_auprc": "0.763658", "test_attribute_id_7_auroc": "0.88125", "test_attribute_id_7_auprc": "0.752982", "test_attribute_id_8_auroc": "0.6775", "test_attribute_id_8_auprc": "0.426755", "test_attribute_id_15_auroc": "0.9775", "test_attribute_id_15_auprc": "0.949399", "test_attribute_id_17_auroc": "0.855", "test_attribute_id_17_auprc": "0.833072", "test_attribute_id_32_auroc": "0.87625", "test_attribute_id_32_auprc": "0.804281", "test_attribute_id_34_auroc": "0.89", "test_attribute_id_34_auprc": "0.854619", "test_attribute_id_40_auroc": "0.75375", "test_attribute_id_40_auprc": 
"0.657261", "test_attribute_id_43_auroc": "0.955", "test_attribute_id_43_auprc": "0.92042", "test_attribute_id_47_auroc": "0.72375", "test_attribute_id_47_auprc": "0.570795", "test_attribute_id_54_auroc": "0.7875", "test_attribute_id_54_auprc": "0.65625", "test_attribute_id_59_auroc": "0.625", "test_attribute_id_59_auprc": "0.36227", "test_attribute_id_64_auroc": "0.65125", "test_attribute_id_64_auprc": "0.577074", "test_attribute_id_66_auroc": "0.7775", "test_attribute_id_66_auprc": "0.632891", "test_attribute_id_80_auroc": "0.745", "test_attribute_id_80_auprc": "0.639197", "test_cls_0_auroc": "0.7525", "test_cls_0_auprc": "0.698853", "test_cls_5_auroc": "0.9", "test_cls_5_auprc": "0.858755", "test_cls_9_auroc": "0.8225", "test_cls_9_auprc": "0.75473", "test_cls_28_auroc": "0.855263", "test_cls_28_auprc": "0.753322", "test_cls_39_auroc": "0.9725", "test_cls_39_auprc": "0.932268", "test_cls_50_auroc": "0.71125", "test_cls_50_auprc": "0.582936", "test_cls_60_auroc": "0.92625", "test_cls_60_auprc": "0.875121", "test_cls_62_auroc": "0.973437", "test_cls_62_auprc": "0.935572", "test_cls_77_auroc": "0.69625", "test_cls_77_auprc": "0.593529", "test_cls_82_auroc": "0.94625", "test_cls_82_auprc": "0.834483", "test_attribute_id_0_auroc": "0.7525", "test_attribute_id_0_auprc": "0.698853", "test_attribute_id_5_auroc": "0.9", "test_attribute_id_5_auprc": "0.858755", "test_attribute_id_9_auroc": "0.8225", "test_attribute_id_9_auprc": "0.75473", "test_attribute_id_28_auroc": "0.855263", "test_attribute_id_28_auprc": "0.753322", "test_attribute_id_39_auroc": "0.9725", "test_attribute_id_39_auprc": "0.932268", "test_attribute_id_50_auroc": "0.71125", "test_attribute_id_50_auprc": "0.582936", "test_attribute_id_60_auroc": "0.92625", "test_attribute_id_60_auprc": "0.875121", "test_attribute_id_62_auroc": "0.973437", "test_attribute_id_62_auprc": "0.935572", "test_attribute_id_77_auroc": "0.69625", "test_attribute_id_77_auprc": "0.593529", "test_attribute_id_82_auroc": "0.94625", 
"test_attribute_id_82_auprc": "0.834483", "test_cls_71_auroc": "0.80375", "test_cls_71_auprc": "0.755221", "test_cls_75_auroc": "1", "test_cls_75_auprc": "1", "test_attribute_id_71_auroc": "0.80375", "test_attribute_id_71_auprc": "0.755221", "test_attribute_id_75_auroc": "1", "test_attribute_id_75_auprc": "1", "test_cls_72_auroc": "0.977778", "test_cls_72_auprc": "0.916667", "test_attribute_id_72_auroc": "0.977778", "test_attribute_id_72_auprc": "0.916667", "test_cls_41_auroc": "0.93375", "test_cls_41_auprc": "0.835404", "test_cls_44_auroc": "0.863636", "test_cls_44_auprc": "0.662103", "test_attribute_id_41_auroc": "0.93375", "test_attribute_id_41_auprc": "0.835404", "test_attribute_id_44_auroc": "0.863636", "test_attribute_id_44_auprc": "0.662103"}

Jwoo5 commented 1 month ago

This criterion currently calculates the MICRO AUROC by default, not MACRO. Since the multi_head_binary_cross_entropy criterion does not support a macro-averaging option at the moment, you may need to get the AUROC for each attribute_id and average them manually.
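The manual averaging could be done along these lines. This is only a minimal sketch, not code from the repository: it assumes the test record has been loaded as a dict whose per-attribute keys follow the `test_attribute_id_<N>_auroc` pattern seen in the log above, and the helper name `macro_average` is hypothetical.

```python
import json
import re
from statistics import mean


def macro_average(log_record, metric="auroc", key="attribute_id"):
    """Average the per-attribute values of `metric` found in one logged record.

    Hypothetical helper: it only looks at keys shaped like
    test_<key>_<N>_<metric>, so the aggregate "test_auroc" entry and the
    duplicated test_cls_<N>_<metric> entries are ignored.
    """
    pattern = re.compile(rf"^test_{key}_(\d+)_{metric}$")
    values = [float(v) for k, v in log_record.items() if pattern.match(k)]
    if not values:
        raise ValueError(f"no per-{key} {metric} entries found")
    return mean(values)


# A tiny record built from a few values copied from the log above.
record = json.loads(
    '{"test_auroc": "0.820812", '
    '"test_attribute_id_2_auroc": "0.8825", '
    '"test_attribute_id_3_auroc": "0.93", '
    '"test_attribute_id_11_auroc": "0.656934", '
    '"test_cls_2_auroc": "0.8825"}'
)
print(round(macro_average(record), 6))
```

Passing the full logged record instead of this three-attribute sample would average over every attribute present in that record; passing `metric="auprc"` would do the same for the per-attribute AUPRC values.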

parthagrawal02 commented 1 month ago

Oh, got it. Thanks very much.

I am also unable to reproduce the results for the LLM modelling experiment. I believe these are the ones reported in Table 5.

wandb: test_sampled/question_type2_0_em_accuracy 0.64297
wandb: test_sampled/question_type2_1_em_accuracy 0.31034
wandb: test_sampled/question_type2_2_em_accuracy 0.2467
wandb: test_sampled/question_type2_3_em_accuracy 0.5461
wandb: test_sampled/question_type2_5_em_accuracy 0.04217
wandb: test_sampled/question_type2_6_em_accuracy 0.55181
wandb: test_sampled/question_type2_8_em_accuracy 0.00911

Might changing the thresholds from 0.5 to the Youden index here help? I ask because the errors were mostly misclassifications by the classifier model. Or could anything else have gone wrong here? I didn't make any changes to the code other than the checkpoint.
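For reference, the Youden index picks the threshold that maximizes J = TPR - FPR (equivalently, sensitivity + specificity - 1). The following is a minimal sketch of that selection for a single attribute, not the repository's code: the function name `youden_threshold` and the toy labels and scores are made up for illustration, and in practice the threshold would normally be chosen on validation data rather than on the test set.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J) maximizing J = TPR - FPR.

    A sample is predicted positive when its score is >= threshold.
    Hypothetical helper for illustration; ties keep the lowest threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_t, best_j = None, float("-inf")
    # Every distinct score is a candidate threshold.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (made up): most positive scores sit below 0.5, so a fixed 0.5
# cut-off would miss them even though the ranking is mostly correct.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.30, 0.40, 0.45, 0.60]
threshold, j = youden_threshold(labels, scores)
print(threshold, j)
```

This illustrates why a good AUROC can coexist with poor accuracy at a fixed 0.5 threshold: AUROC depends only on how the scores are ranked, while the threshold determines the actual yes/no decisions.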

Jwoo5 commented 1 month ago

Can you confirm that you trained the SE-WRN model without changing any default hyperparameters (e.g., number of epochs, learning rate, ...)?

parthagrawal02 commented 1 month ago

Yes, I didn't change any parameters. Since the AUCs are good, changing the threshold might help. I will try it.

parthagrawal02 commented 1 month ago

Sorry, I am not able to reproduce the scores, and I'm not even coming close. Where do you think the problem might be?

Jwoo5 commented 1 month ago

Did you sample 10% of the test set and use it for LLM evaluation? Could you let me know which OpenAI model you are using now?

parthagrawal02 commented 1 month ago

Yes, I followed each and every step in the README. However, the OpenAI model I am using is GPT-4o-mini, which is supposed to be better than gpt-3.5.

Jwoo5 commented 1 month ago

If I share the SE-WRN weights file that was used for the experiments in the original paper, could you run the LLM experiments with it? For now, I have no credits to run the experiments on my end. If you are okay with it, please let me know your email address.

parthagrawal02 commented 1 month ago

Thanks, that would be great. parthagrawal02@gmail.com

Jwoo5 commented 1 month ago

I've just sent an email to you. Please check it.

parthagrawal02 commented 1 month ago

Received, thanks. I will update with the results.

Jwoo5 commented 2 weeks ago

Hi, any updates on this issue? If not, I will close this issue.