MetOffice / XBTs_classification

Project for the classification of eXpendable Bathy Thermographs
BSD 3-Clause "New" or "Revised" License

Additional analysis for paper #103

Closed by stevehadd 3 years ago

stevehadd commented 3 years ago

Following a discussion with Francesco, here are some additional analyses we could add to the paper to strengthen the case for our methodological choices:

stevehadd commented 3 years ago

scikit-learn bootstrapping as part of cross-validation: https://ogrisel.github.io/scikit-learn.org/sklearn-tutorial/modules/generated/sklearn.cross_validation.Bootstrap.html

Alternatively, the pandas DataFrame.sample method could be used for bootstrapping: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html

We could use this to even out class support for a variety of features:
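A minimal sketch of the pandas route, assuming we want the same number of resampled rows for every value of a categorical feature. The column names (`platform`, `max_depth`) and the helper `balanced_bootstrap` are hypothetical and only for illustration. Note that the linked `Bootstrap` class lives in the old `sklearn.cross_validation` module, which is no longer part of current scikit-learn releases, so this sketch relies on `DataFrame.sample` only.

```python
import pandas as pd


def balanced_bootstrap(df, label_col, n_per_class, seed=0):
    """Draw n_per_class rows with replacement from each value of label_col.

    Hypothetical helper: the result has equal support for every class.
    """
    parts = []
    for i, (_, group) in enumerate(df.groupby(label_col)):
        # replace=True makes this a bootstrap draw, so small classes can be
        # upsampled beyond their original size.
        parts.append(
            group.sample(n=n_per_class, replace=True, random_state=seed + i)
        )
    return pd.concat(parts, ignore_index=True)


# Toy data with unbalanced support: 8 rows of "A", 2 rows of "B".
df = pd.DataFrame({"platform": ["A"] * 8 + ["B"] * 2, "max_depth": range(10)})
resampled = balanced_bootstrap(df, "platform", n_per_class=5)
counts = resampled["platform"].value_counts().to_dict()
```

Repeating the draw with different seeds and refitting the classifier each time would give a spread of scores rather than a single number.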

stevehadd commented 3 years ago

Some more ways to look at links between features: https://datascience.stackexchange.com/questions/893/how-to-get-correlation-between-two-categorical-variable-and-a-categorical-variab
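One commonly used measure of association between two categorical variables is Cramér's V, built on the chi-squared statistic of their contingency table. The sketch below is an assumption about what could be tried here, not necessarily the approach taken in the notebooks; the helper name `cramers_v` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Cramér's V between two categorical sequences (0 = none, 1 = perfect)."""
    table = pd.crosstab(pd.Series(x), pd.Series(y))
    # correction=False disables the Yates continuity correction so that a
    # perfectly associated 2x2 table gives exactly V = 1.
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    if min_dim == 0:
        return float("nan")  # undefined when one variable has a single level
    return float(np.sqrt(chi2 / (n * min_dim)))


# Perfectly linked features versus independent features.
v_linked = cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"])
v_independent = cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"])
```

Computing this for every pair of categorical features gives a matrix that can be read much like a correlation matrix.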

stevehadd commented 3 years ago

Various ways to do feature selection more systematically using sklearn or pandas:
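As one illustration of the sklearn route, a univariate filter such as `SelectKBest` scores each feature against the target and keeps the top k. This is a sketch on synthetic data; the choice of `mutual_info_classif` as the scoring function is an assumption, not something settled in this issue.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
n_samples = 200

# Synthetic data: column 0 tracks the label closely, columns 1-3 are noise.
y = rng.integers(0, 2, size=n_samples)
informative = y + rng.normal(scale=0.1, size=n_samples)
noise = rng.normal(size=(n_samples, 3))
X = np.column_stack([informative, noise])

# Fix random_state so the mutual information estimate is reproducible.
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func=score_func, k=1).fit(X, y)

selected = selector.get_support(indices=True)  # indices of the kept columns
scores = selector.scores_                      # one score per input feature
```

Running the classifier on the top-k features for a range of k would show how quickly performance saturates as features are added.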

stevehadd commented 3 years ago

Permutation feature importance algorithm implementation: https://scikit-learn.org/stable/modules/permutation_importance.html
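A minimal sketch of how `sklearn.inspection.permutation_importance` is called, using synthetic data and a random forest as stand-ins (both are assumptions; the real classifier and features come from the project code). Each feature is shuffled in turn on held-out data, and the drop in score is recorded.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 400

# Synthetic data: only column 0 determines the label.
X = rng.normal(size=(n_samples, 3))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Shuffle each feature n_repeats times on the held-out set and measure the
# mean decrease in the estimator's default score (accuracy here).
result = permutation_importance(
    clf, X_test, y_test, n_repeats=10, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first
```

Evaluating on the test split rather than the training split avoids crediting features the model has merely overfitted to.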

stevehadd commented 3 years ago

Docs for more metrics:

stevehadd commented 3 years ago

This has been implemented in various notebooks and updated for the batch code in PR #111.