p-values for bootstrapped performance comparison

What: plots._protected_subplots makes box plots of model performance per protected class. It should also report a p-value for the null hypothesis that performance is the same across classes.

Why: What we ultimately want to know is whether performance differs across classes. The box plots give an impression of that, but without a p-value it is unclear what conclusion to draw from them.

How: Decide which p-value to calculate, then add a helper function in plots.py to calculate it. Performance is currently evaluated in plots._bootstrap_performance and plots._performance_by_index.
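One candidate, sketched here only as a starting point for discussion, is a permutation test: shuffle the protected-class labels, recompute the per-class metric each time, and count how often the spread between the best and worst class is at least as large as the observed spread. The choice of test is still open, so this is not a proposal for the final statistic. The names permutation_p_value, mean_absolute_error, and all parameters below are hypothetical and do not exist in plots.py. The sketch works on raw arrays and does not assume anything about the signatures of plots._bootstrap_performance or plots._performance_by_index.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Example metric (hypothetical); any callable(y_true, y_pred) -> float works."""
    return float(np.mean(np.abs(y_true - y_pred)))


def permutation_p_value(y_true, y_pred, groups, metric, n_permutations=1000, seed=0):
    """Permutation p-value for H0: the metric is the same in every group.

    Test statistic: max minus min of the per-group metric. Under H0 the group
    labels are exchangeable, so shuffling them gives the null distribution.
    Hypothetical helper, not part of the existing code base.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    labels = np.unique(groups)

    def spread(group_labels):
        scores = [
            metric(y_true[group_labels == label], y_pred[group_labels == label])
            for label in labels
        ]
        return max(scores) - min(scores)

    observed = spread(groups)
    # Count shuffles whose spread is at least as extreme as the observed one.
    n_extreme = sum(
        spread(rng.permutation(groups)) >= observed for _ in range(n_permutations)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (n_extreme + 1) / (n_permutations + 1)
```

A call would look like permutation_p_value(y_true, y_pred, groups, mean_absolute_error), where groups holds one protected-class label per sample. The max-minus-min statistic handles more than two classes with a single p-value. A pairwise comparison would need a multiple-testing correction on top.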

Acceptance Criteria: plots._protected_subplots calculates and displays p-values for the null hypothesis that performance is the same across classes.

broadinstitute / ml4h #376