google-deepmind / distribution_shift_framework

This repository contains the code of the distribution shift framework presented in A Fine-Grained Analysis on Distribution Shift (Wiles et al., 2022).
Apache License 2.0

Why assume all the attributes are uniformly distributed on the test set? #1

Status: Closed (closed by nalzok 2 years ago)

nalzok commented 2 years ago

Hi, I was wondering why you assume that all the attributes are uniformly distributed in the test set. Specifically, in Section 2.2 you wrote:

Test distribution. We assume that the attributes are distributed uniformly. This is desirable, as all attributes are represented and a-priori independent.

Although this is desirable from a theoretical point of view, it does not seem very realistic. For example, types of equipment may not be distributed equally across hospitals, the proportion of patients with a tumor is not necessarily 50%, and if "sex" and "pregnancy" are both attributes, we are forced to consider pregnant men.

oawiles commented 2 years ago

Hi nalzok,

Thank you for your question. In our paper we wanted to investigate how well models perform on different subgroups, and by assuming the attributes are uniformly distributed, all subgroups contribute equally to the overall performance. However, as you say, for many applications this is a simplification of what happens in practice (e.g. certain subgroups may be harder to collect data for). Our framework can be used to simulate those evaluation settings by manipulating the validation/test distribution, and further analysis of those results could determine the quality of the model across different subgroups. We did not explore this, but it would be an interesting study. I hope that is helpful.
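One way to manipulate the test distribution, sketched here under assumptions, is to resample test indices so that subgroups appear in chosen proportions. This is a minimal illustration, not the framework's own code: the function name `resample_to_target`, the per-example subgroup-label format, and sampling with replacement within each subgroup are all hypothetical choices.

```python
import random
from collections import defaultdict


def resample_to_target(groups, target_probs, n_samples, seed=0):
    """Return test-set indices whose subgroup mix follows `target_probs`.

    groups: one hashable subgroup label per test example, e.g. a
        (hospital, has_tumour) tuple (hypothetical format).
    target_probs: dict mapping subgroup label -> desired probability.
    n_samples: approximate size of the resampled test set; per-subgroup
        counts are rounded, so the total can differ slightly.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    indices = []
    for group, prob in target_probs.items():
        count = round(prob * n_samples)
        if count == 0:
            continue
        if not by_group[group]:
            # A subgroup with no test examples cannot be represented.
            raise ValueError(f"no test examples for subgroup {group!r}")
        # Sample with replacement so that small subgroups can be upweighted.
        indices.extend(rng.choices(by_group[group], k=count))
    rng.shuffle(indices)
    return indices


# Usage: subgroup "a" is rare in the original test set; resample it to a
# uniform mix over subgroups (each subgroup 50%).
groups = ["a"] * 10 + ["b"] * 90
uniform = {g: 1 / 2 for g in set(groups)}
idx = resample_to_target(groups, uniform, n_samples=100)
```

Passing a non-uniform `target_probs` instead simulates a deployment setting where some subgroups are rarer than others.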

Olivia

nalzok commented 2 years ago

Thanks for the response, Olivia. Is there any chance you could share the test accuracy for each subgroup, rather than the averaged overall accuracy? With that data we could study arbitrary joint distributions of the attributes, e.g. the following setting:

[Screenshot (2022-05-16) illustrating the proposed setting; image not included]
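Given per-subgroup accuracies, the expected accuracy under any joint attribute distribution p is the weighted sum over subgroups g of p(g) * acc(g); the uniform test distribution is the special case where every subgroup gets the same weight. The sketch below assumes that each subgroup's accuracy stays fixed when only the subgroup proportions change. The function name and all numbers are made up for illustration and are not results from the paper.

```python
def accuracy_under_distribution(subgroup_acc, group_probs, tol=1e-9):
    """Expected accuracy when subgroups occur with probabilities `group_probs`.

    subgroup_acc: dict mapping subgroup label -> accuracy on that subgroup.
    group_probs: dict mapping subgroup label -> probability (must sum to 1).
    Assumes accuracy within each subgroup does not change when only the
    subgroup proportions change.
    """
    total = sum(group_probs.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total}, not 1")
    return sum(p * subgroup_acc[g] for g, p in group_probs.items() if p > 0)


# Illustrative (made-up) accuracies for (hospital, has_tumour) subgroups.
acc = {("A", 0): 0.9, ("A", 1): 0.7, ("B", 0): 0.8, ("B", 1): 0.6}

# Uniform over subgroups, as in the paper's test distribution.
uniform = {g: 1 / len(acc) for g in acc}
# A skewed setting where tumour cases make up 20% of the test set.
skewed = {("A", 0): 0.4, ("A", 1): 0.1, ("B", 0): 0.4, ("B", 1): 0.1}

uniform_acc = accuracy_under_distribution(acc, uniform)
skewed_acc = accuracy_under_distribution(acc, skewed)
```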
oawiles commented 2 years ago

Hi, unfortunately we don't have this. However, the evaluation code dumps all the results at the end of training (along with features and labels) into results.pkl, so you can easily compute this for models that you train.
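Computing per-subgroup accuracy from dumped results might look like the sketch below. The thread does not describe the layout of results.pkl, so this assumes you have already loaded it (e.g. with Python's `pickle.load`) and extracted per-example predicted labels, true labels, and subgroup labels; the function name and input format are hypothetical.

```python
from collections import defaultdict


def per_subgroup_accuracy(predictions, labels, groups):
    """Accuracy for each subgroup.

    predictions, labels: per-example predicted and true class labels.
    groups: per-example subgroup label, e.g. a tuple of attribute values.
    """
    if not len(predictions) == len(labels) == len(groups):
        raise ValueError("inputs must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    return {g: correct[g] / total[g] for g in total}


# Usage with toy data (not real results).
preds = [1, 0, 1, 1, 0]
labels = [1, 1, 1, 1, 0]
groups = ["a", "a", "b", "b", "b"]
accs = per_subgroup_accuracy(preds, labels, groups)
```

Grouping by a tuple of attribute values gives one accuracy per attribute combination, which is the per-subgroup breakdown asked about above.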

nalzok commented 2 years ago

Great. Thank you!