How to integrate OOD metrics

google-research / robustness_metrics

Apache License 2.0


How to integrate OOD metrics #3

Closed: batzner closed this issue 3 years ago

batzner commented 3 years ago

I would like to extend the code to also evaluate the performance on images that are completely out-of-distribution. For that, the user would need to be able to specify both an in-distribution dataset (e.g., CIFAR-10) and an out-of-distribution dataset (e.g., SVHN).

Currently, the code is designed for pairs of one metric and one dataset (accuracy@imagenet, brier@imagenet). What is the best approach to extending it so that the user can specify two datasets for a metric, such as aucroc@cifar10&svhn? Did you already consider that scenario? Where would be a good point for me to start?

josipd commented 3 years ago

Hello! We have the concept of a report that should cover this scenario:

https://github.com/google-research/robustness_metrics/blob/b4c4d4d19e3ef0f2dde5aa44292f82eddeb86906/robustness_metrics/reports/base.py#L68

You can specify that you need measurements on both datasets, and then combine them. Would this work?

batzner commented 3 years ago

Hi @josipd, thank you for the quick answer and the idea of combining datasets in a report. In the case of the AUC-ROC, I think that the AUC metric instance would need to be called with predictions from both the in-distribution and the OOD dataset through:

https://github.com/google-research/robustness_metrics/blob/b4c4d4d19e3ef0f2dde5aa44292f82eddeb86906/robustness_metrics/metrics/base.py#L48

If I understand it correctly, the metric would otherwise not be able to return a single float value when its result method is called:

https://github.com/google-research/robustness_metrics/blob/b4c4d4d19e3ef0f2dde5aa44292f82eddeb86906/robustness_metrics/metrics/base.py#L57

What would be a good approach for combining a metric with two datasets? Or do you think there is an alternative way?
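To illustrate why AUC-ROC needs the scores from both datasets at once, here is a minimal sketch in plain Python. It does not use the robustness_metrics API: the function name `auroc` and the example scores are hypothetical, and the pairwise formulation is just one simple way to compute the value.

```python
def auroc(in_dist_scores, ood_scores):
    """Fraction of (in-distribution, OOD) pairs ranked correctly.

    A pair is ranked correctly when the in-distribution example has the
    higher confidence score; ties count as one half.
    """
    if not in_dist_scores or not ood_scores:
        raise ValueError("AUROC needs at least one score from each dataset.")
    wins = 0.0
    for s_in in in_dist_scores:
        for s_ood in ood_scores:
            if s_in > s_ood:
                wins += 1.0
            elif s_in == s_ood:
                wins += 0.5
    return wins / (len(in_dist_scores) * len(ood_scores))


# Hypothetical confidence scores, e.g. the maximum softmax probability.
cifar10_scores = [0.9, 0.8, 0.7]   # in-distribution
svhn_scores = [0.75, 0.4, 0.3]     # out-of-distribution
print(auroc(cifar10_scores, svhn_scores))
```

Because every in-distribution score is compared against every OOD score, a metric instance that only ever sees one of the two datasets has nothing to compare against, which is the difficulty described above.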

josipd commented 3 years ago

I see! The easiest approach would be to create a union dataset that loads and concatenates the two wrapped datasets. For this, you can use tf.data.Dataset.concatenate: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#concatenate
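A minimal sketch of such a union dataset, assuming TensorFlow 2.x in eager mode. The helper `with_ood_flag`, the `is_ood` field, and the dummy tensors standing in for CIFAR-10 and SVHN are all hypothetical; this is not how robustness_metrics wraps its datasets.

```python
import tensorflow as tf


def with_ood_flag(dataset, is_ood):
    """Attach a binary flag: 0 for in-distribution, 1 for OOD (hypothetical field)."""
    flag = tf.constant(is_ood, dtype=tf.int32)
    return dataset.map(lambda image: {"image": image, "is_ood": flag})


# Dummy stand-ins for the real datasets; both use 32x32 RGB images.
in_dist = tf.data.Dataset.from_tensor_slices(tf.zeros([4, 32, 32, 3]))
ood = tf.data.Dataset.from_tensor_slices(tf.ones([2, 32, 32, 3]))

# concatenate requires both datasets to have the same element structure and types.
union = with_ood_flag(in_dist, 0).concatenate(with_ood_flag(ood, 1))

flags = [int(example["is_ood"]) for example in union.as_numpy_iterator()]
print(flags)
```

With the flag attached, a single metric instance could receive predictions from both sources and use the flag as the binary target for AUC-ROC. Any other fields the real datasets carry (class labels, for example) would need matching names and dtypes before concatenation.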

batzner commented 3 years ago

That makes sense. Thanks for the suggestion, I will do it this way!