Jingkang50 / OpenOOD

Benchmarking Generalized Out-of-Distribution Detection
MIT License
858 stars 108 forks

FPR scores of all baselines are updated? #177

Closed BierOne closed 1 year ago

BierOne commented 1 year ago

Hi Jingkang,

Thanks for this outstanding work! I have been actively working on implementing my method with OpenOOD v1.5. However, I noticed a notable difference in the FPR scores (available in this link) compared to the version provided on Aug. 7th. I am uncertain whether there is something crucial that I might have missed in this repository.

Could you clarify whether there have been any significant updates? Thank you so much.

zjysteven commented 1 year ago

Hi @BierOne, thank you for your continued attention!

Yes, the update we made concerns whether OOD samples are treated as positive or negative. We found that earlier we were treating OOD samples as negative. This is not technically wrong, but under that convention the metric would typically be called TNR@TPR95 (e.g., see the Mahalanobis paper) rather than FPR@TPR95. Moreover, FPR@TPR95 under that convention means: when 95% of ID samples are correctly classified, how many OOD samples are misclassified? It measures performance at a threshold that emphasizes the correctness of ID samples (95%) rather than that of OOD samples, which seems a little weird to me.

We have now updated the code to follow the convention in general machine learning and many OOD works (including MSP, Energy, etc.), where OOD samples (anomalies) are treated as positive. In this case, FPR@TPR95 aligns better with the ultimate goal than the previous case (in my opinion): when 95% of OOD samples are correctly identified, how many ID samples are misclassified?
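To make the updated convention concrete, here is a minimal sketch of FPR@TPR95 with OOD as positive. This is an illustrative implementation, not OpenOOD's actual code; the function name and the assumption that higher scores mean "more OOD-like" are mine:

```python
import numpy as np

def fpr_at_95_tpr(scores_id, scores_ood):
    """FPR@TPR95 with OOD samples treated as positives.

    Illustrative sketch (not OpenOOD's implementation). Assumes a
    higher score means "more OOD-like". The threshold is chosen so
    that 95% of OOD (positive) samples score at or above it; the
    returned value is the fraction of ID (negative) samples that
    also reach the threshold, i.e., false positives.
    """
    scores_id = np.asarray(scores_id, dtype=float)
    scores_ood = np.asarray(scores_ood, dtype=float)
    # 5th percentile of OOD scores: 95% of OOD samples lie at or above it
    threshold = np.percentile(scores_ood, 5)
    # ID samples scoring at or above the threshold are flagged as OOD
    return float(np.mean(scores_id >= threshold))
```

Under the earlier convention (OOD as negative), the analogous quantity would instead fix the threshold at the 95th-TPR point of the ID scores, which is why the two versions of the benchmark report different numbers while AUROC is unchanged.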

I hope that this clears up the confusion. Also note that this update doesn't affect the main metric AUROC.

BierOne commented 1 year ago

Got it! Thank you so much!