broadinstitute / ml4h

p-values for bootstrapped performance comparison #376

Open ndiamant opened 4 years ago

ndiamant commented 4 years ago

What: plots._protected_subplots makes box plots of model performance per protected class. It should also report a p-value for the null hypothesis that performance is the same across classes.

Why: What we ultimately want to know is whether performance differs across classes. The box plots give a rough idea, but without a p-value it is unclear what conclusion to draw from them.

How: Decide which p-value to calculate, then add a helper function in plots.py that calculates it. Performance is currently evaluated in plots._bootstrap_performance and plots._performance_by_index.

Acceptance Criteria: plots._protected_subplots calculates and displays a p-value for the hypothesis that performance is the same across classes.
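
One possible shape for such a helper is a permutation test, sketched below with the standard library only. It is not the existing ml4h code: the names accuracy, _metric_spread, and permutation_p_value are hypothetical, and it assumes per-sample labels, predictions, and protected-class labels are available as plain sequences. The test statistic (largest minus smallest per-class score) is likewise just one reasonable choice.

```python
import random
from typing import Callable, Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def _metric_spread(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[str],
    metric: Callable[[Sequence[int], Sequence[int]], float],
) -> float:
    """Largest minus smallest per-class metric value (hypothetical statistic)."""
    by_group: Dict[str, List[int]] = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)
    scores = [
        metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        for idx in by_group.values()
    ]
    return max(scores) - min(scores)


def permutation_p_value(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[str],
    metric: Callable[[Sequence[int], Sequence[int]], float] = accuracy,
    n_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """P-value for the null hypothesis that the metric is the same in every class.

    Class labels are shuffled across samples, which keeps class sizes fixed
    while breaking any link between class membership and performance.
    """
    rng = random.Random(seed)
    observed = _metric_spread(y_true, y_pred, groups, metric)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if _metric_spread(y_true, y_pred, shuffled, metric) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

The returned value could then be written into the subplot title next to each box plot. Shuffling class labels rather than comparing bootstrap replicates directly keeps the p-value from depending on how many bootstrap rounds were run.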

lucidtronix commented 4 years ago

Should we use a Mann-Whitney U test or a chi-squared test here?
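
For reference, the two tests answer slightly different questions: Mann-Whitney compares the distributions of a continuous per-sample score between two classes, while chi-squared compares counts, such as correct versus incorrect predictions per class. A minimal sketch of the chi-squared option for exactly two classes follows, assuming performance is measured as accuracy. The function name chi_squared_two_classes is hypothetical, and no continuity correction is applied.

```python
import math
from typing import Tuple


def chi_squared_two_classes(
    correct_a: int, total_a: int, correct_b: int, total_b: int
) -> Tuple[float, float]:
    """Pearson chi-squared test on a 2x2 table of correct/incorrect counts.

    Returns (statistic, p_value) for the null hypothesis that both
    classes have the same accuracy. No continuity correction is applied.
    """
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("expected count of zero; test is undefined")
            statistic += (table[i][j] - expected) ** 2 / expected

    # With one degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

With more than two protected classes the degrees of freedom change and the erfc shortcut no longer applies, so a statistics library would be the more practical route there.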