Jingkang50 / OpenOOD

Benchmarking Generalized Out-of-Distribution Detection

Potential Issues with AUPR-Out & AUPR-In Calculation in auc_and_fpr_recall Function #211

Closed pzSuen closed 8 months ago

pzSuen commented 8 months ago

Hello Jingkang,

Firstly, thank you for creating and maintaining this valuable project. I've come across a potential issue in the auc_and_fpr_recall function regarding the calculation of AUPR-out and AUPR-in that I'd like to bring to your attention for clarification or correction.

Issue Description: In the auc_and_fpr_recall function in https://github.com/Jingkang50/OpenOOD/blob/main/openood/evaluators/metrics.py, the AUPR-Out calculation appears to treat in-distribution (ID) samples as the positive class, while the AUPR-In calculation treats out-of-distribution (OOD) samples as the positive class. Based on my understanding of the domain (see the reference papers below) and the description in the code comments, AUPR-Out should be calculated with OOD samples as the positive class, and AUPR-In with ID samples as the positive class.

Relevant Code Snippet (auc_and_fpr_recall function):

import numpy as np
from sklearn import metrics


def auc_and_fpr_recall(conf, label, tpr_th):
    # following convention in ML we treat OOD as positive
    ood_indicator = np.zeros_like(label)
    ood_indicator[label == -1] = 1

    # in the postprocessor we assume ID samples will have larger
    # "conf" values than OOD samples
    # therefore here we need to negate the "conf" values
    fpr_list, tpr_list, thresholds = metrics.roc_curve(ood_indicator, -conf)
    fpr = fpr_list[np.argmax(tpr_list >= tpr_th)]

    precision_in, recall_in, thresholds_in \
        = metrics.precision_recall_curve(ood_indicator, -conf)

    precision_out, recall_out, thresholds_out \
        = metrics.precision_recall_curve(1 - ood_indicator, conf)

    auroc = metrics.auc(fpr_list, tpr_list)
    aupr_in = metrics.auc(recall_in, precision_in)
    aupr_out = metrics.auc(recall_out, precision_out)

    return auroc, aupr_in, aupr_out, fpr
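
To illustrate the mismatch, here is a minimal self-contained sketch on synthetic scores (this is not OpenOOD code, just an illustration; names and data are made up). With the current assignments, the value returned as aupr_in is the area under the precision-recall curve with OOD as the positive class, i.e. what is conventionally reported as AUPR-Out, and vice versa:

import numpy as np
from sklearn import metrics

rng = np.random.default_rng(0)
# synthetic setup: ID samples get high conf, OOD samples (label -1) get low conf
conf = np.concatenate([rng.normal(2.0, 1.0, 1000),    # ID
                       rng.normal(-2.0, 1.0, 1000)])  # OOD
label = np.concatenate([np.zeros(1000), -np.ones(1000)])
ood_indicator = (label == -1).astype(int)

# what the current code returns as "aupr_in": OOD is the positive class,
# which is conventionally called AUPR-Out
p, r, _ = metrics.precision_recall_curve(ood_indicator, -conf)
print("returned as aupr_in :", metrics.auc(r, p))

# what the current code returns as "aupr_out": ID is the positive class,
# which is conventionally called AUPR-In
p, r, _ = metrics.precision_recall_curve(1 - ood_indicator, conf)
print("returned as aupr_out:", metrics.auc(r, p))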

Reference papers:

[1] [Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy](https://openaccess.thecvf.com/content_ICCV_2019/papers/Yu_Unsupervised_Out-of-Distribution_Detection_by_Maximum_Classifier_Discrepancy_ICCV_2019_paper.pdf)

[2] [Detecting Out-of-Distribution Inputs in Deep Neural Networks Using an Early-Layer Output](https://arxiv.org/pdf/1910.10307.pdf)

[3] [Semantically Coherent Out-of-Distribution Detection](https://arxiv.org/pdf/2108.11941.pdf)


I suggest reviewing this part of the code to confirm whether this implementation reflects a specific design decision or is indeed an error.

Thank you for your time and effort in this matter. I look forward to your response.

Best regards, pzSuen

zjysteven commented 8 months ago

Hi @pzSuen, thank you for pointing this out. You are right: following the common definition, the computations for aupr_in and aupr_out should be swapped. The reason for the mismatch is that we originally treated OOD samples as the negative class, in which case the AUPR code was correct. Later, as indicated in the code comments, we switched to treating OOD as positive, following the ML convention and the seminal MSP work, but forgot to update the AUPR code accordingly, which led to the mismatch you see now.
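
Concretely, the corrected computation should look roughly like the sketch below (an illustration only, not necessarily the exact diff; the actual change lands in #212, and the helper name here is made up):

from sklearn import metrics


def aupr_in_out(conf, ood_indicator):
    # AUPR-In: ID is the positive class; higher conf means more ID-like
    precision_in, recall_in, _ = metrics.precision_recall_curve(1 - ood_indicator, conf)
    aupr_in = metrics.auc(recall_in, precision_in)

    # AUPR-Out: OOD is the positive class; higher -conf means more OOD-like
    precision_out, recall_out, _ = metrics.precision_recall_curve(ood_indicator, -conf)
    aupr_out = metrics.auc(recall_out, precision_out)

    return aupr_in, aupr_out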

zjysteven commented 8 months ago

Fixed in #212. Feel free to reopen the issue if you find any other problems, and thank you again for raising this.