stevehadd closed this issue 3 years ago.
scikit-learn bootstrapping as part of cross-validation: https://ogrisel.github.io/scikit-learn.org/sklearn-tutorial/modules/generated/sklearn.cross_validation.Bootstrap.html
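The `sklearn.cross_validation` module in that link no longer exists in current scikit-learn releases. The code below is a hedged sketch of a related technique using a helper that does exist today. It uses `sklearn.utils.resample` to draw bootstrap training sets, then scores each fit on the out-of-bag rows (rows that were never drawn). The synthetic `make_classification` data, the `LogisticRegression` model and the helper name `bootstrap_oob_scores` are illustrative assumptions, not the project's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.utils import resample


def bootstrap_oob_scores(model, X, y, n_iter=20, random_state=0):
    """Hypothetical helper: fit on bootstrap samples, score on out-of-bag rows."""
    rng = np.random.RandomState(random_state)
    n = len(y)
    indices = np.arange(n)
    scores = []
    for _ in range(n_iter):
        # Draw n row indices with replacement: one bootstrap training set.
        train_idx = resample(indices, replace=True, n_samples=n, random_state=rng)
        # Rows never drawn form the out-of-bag evaluation set.
        oob_idx = np.setdiff1d(indices, train_idx)
        if len(oob_idx) == 0:
            continue
        model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[oob_idx], model.predict(X[oob_idx])))
    return np.array(scores)


# Synthetic stand-in data (assumption), not the project's features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
scores = bootstrap_oob_scores(LogisticRegression(max_iter=1000), X, y)
print(f"OOB accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The spread of the out-of-bag scores gives a rough sense of how sensitive the model is to the particular training sample.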
Alternatively, the pandas `sample` method (with `replace=True`) could be used for bootstrapping: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html
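The next example is a minimal sketch of bootstrapping with `DataFrame.sample`, assuming a simple numeric column. Setting `frac=1.0` with `replace=True` draws as many rows as the frame has, with replacement, and repeating the draw gives a bootstrap distribution for a statistic (here, the mean). The toy frame, the column name `value` and the helper `bootstrap_means` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real feature column.
df = pd.DataFrame({"value": np.arange(100, dtype=float)})


def bootstrap_means(frame, column, n_boot=500, seed=0):
    """Return the mean of `column` over n_boot bootstrap resamples of `frame`."""
    means = []
    for i in range(n_boot):
        # frac=1.0 with replace=True: len(frame) rows drawn with replacement.
        sample = frame.sample(frac=1.0, replace=True, random_state=seed + i)
        means.append(sample[column].mean())
    return pd.Series(means)


means = bootstrap_means(df, "value")
# Percentile interval of the bootstrap distribution of the mean.
low, high = means.quantile([0.025, 0.975])
print(f"95% bootstrap interval for the mean: [{low:.2f}, {high:.2f}]")
```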
We could use this to even out class support (the number of samples in each class) across a variety of features.
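Evening out class support could look roughly like the sketch below. It upsamples every category of a chosen column, with replacement, to the size of the largest category. The column names and the helper `balance_classes` are hypothetical, and upsampling is only one option. Downsampling the larger classes with `replace=False` would be the alternative if duplicated rows are a concern.

```python
import pandas as pd


def balance_classes(df, label_col, random_state=0):
    """Upsample each category of label_col (with replacement) to the largest count."""
    target = df[label_col].value_counts().max()
    parts = [
        group.sample(n=target, replace=True, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    return pd.concat(parts).reset_index(drop=True)


# Hypothetical imbalanced data: 7 rows of class "a", 3 rows of class "b".
df = pd.DataFrame({"x": range(10), "label": ["a"] * 7 + ["b"] * 3})
balanced = balance_classes(df, "label")
print(balanced["label"].value_counts())
```

The same helper can be applied to any categorical feature column, not just the target label.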
Some more ways to look at associations between features, including categorical ones: https://datascience.stackexchange.com/questions/893/how-to-get-correlation-between-two-categorical-variable-and-a-categorical-variab
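One commonly used measure for the categorical-vs-categorical case is Cramér's V. The sketch below builds it from a contingency table with `pandas.crosstab` and `scipy.stats.chi2_contingency`. The helper name `cramers_v` and the toy inputs are assumptions for illustration. Note that this uncorrected form is known to be biased upward for small samples.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Cramér's V between two categorical Series: 0 = no association, 1 = perfect."""
    table = pd.crosstab(x, y)
    # correction=False disables Yates' continuity correction for 2x2 tables.
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


# Toy inputs: one perfectly associated pair and one independent pair.
a = pd.Series(["a", "b"] * 50)
b = a.map({"a": "u", "b": "v"})
c = pd.Series(["a", "a", "b", "b"] * 25)
d = pd.Series(["u", "v", "u", "v"] * 25)
print(cramers_v(a, b), cramers_v(c, d))
```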
Various ways to do feature selection more systematically using sklearn or pandas:
Permutation feature importance algorithm implementation: https://scikit-learn.org/stable/modules/permutation_importance.html
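A minimal sketch of calling `sklearn.inspection.permutation_importance` on a held-out split follows. The synthetic data and the `RandomForestClassifier` are assumptions chosen only to make the example self-contained. Each feature is shuffled `n_repeats` times, and the resulting drop in test score is that feature's importance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 3 informative features out of 6.
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature n_repeats times on the test set and record the score drop.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

Computing importances on held-out data, rather than the training set, avoids rewarding features the model has merely overfit to.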
This has been implemented in various notebooks and updated for the batch code in PR #111.
Based on discussion with Francesco, here are some further analyses we could add to the paper to make a stronger case for the methodological choices we have made: