developmentseed / geospatial-ds-cholera-lab

A repo dedicated to developing a geospatial data science prototype (see issue: https://github.com/developmentseed/labs/issues/292)

Kb/imputed smote imbalance #28

Open kathrynberger opened 11 months ago

kathrynberger commented 11 months ago

This PR covers two notebooks for model exploration:

  1. explore-missing-feature-obs.ipynb explores the extent of missing data across all variables. The second half of the notebook tries methods for imputing the missing data. Imputation was later judged not to be the best approach, but the work is kept here for discussion.
  2. model-exploration.ipynb takes the full dataset through the model exploration process: (1) correlation of the environmental parameters, (2) handling the imbalanced dataset, and (3) model comparison, using evaluation metrics to determine which model is most appropriate.
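
For reviewers unfamiliar with the imbalance step, here is a minimal, dependency-free sketch of SMOTE-style oversampling. The branch name suggests SMOTE was used, but this is not the notebook's code: the function `smote_oversample`, the toy rows, and the class counts are all hypothetical, and the notebook may well rely on a library implementation instead. The idea is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of equal-length numeric feature vectors (minority class only).
    Hypothetical helper for illustration; not taken from the notebooks.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # excluding the base point itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Assumed toy data: 2 features, 3 minority rows against 8 majority rows.
minority_rows = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
majority_count = 8
new_rows = smote_oversample(
    minority_rows, majority_count - len(minority_rows), k=2
)
balanced_minority = minority_rows + new_rows
print(len(balanced_minority))  # 8
```

Because every synthetic row lies on a segment between two real minority rows, the new points stay inside the region the minority class already occupies. Oversampling of this kind should be applied to the training split only, so that evaluation metrics are computed on untouched data.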