developmentseed / geospatial-ds-cholera-lab

A repo dedicated to developing a geospatial data science prototype (see issue: https://github.com/developmentseed/labs/issues/292)

Wrap up and next steps #30

Open kathrynberger opened 11 months ago

kathrynberger commented 11 months ago

As of the end of September 2023, the full PoC, including the following methodology (see checklist below), has been completed. ✅

The hypothesis, "Environmental factors alone won’t unravel this very complex relationship, but they can help identify spatio-temporal patterns that could assist in allocating resources and support," has been tested, and there is reason to support it. That being said, the results of the classifier model could be improved: its currently high accuracy score reflects the majority class only.
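To illustrate why a high accuracy score can reflect the majority class only, here is a minimal sketch. The labels are made up for illustration (95 non-outbreak events, 5 outbreak events) and are not the project's data, and the helper names `confusion_counts` and `summarize` are hypothetical. A model that always predicts "no outbreak" scores 95% accuracy while detecting no outbreaks at all.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Compute accuracy alongside metrics that are sensitive to the minority class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # outbreaks correctly found
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # non-outbreaks correctly found
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical, illustrative labels: 0 = non-outbreak, 1 = outbreak.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

metrics = summarize(y_true, y_pred)
print(metrics)
```

Reporting recall or balanced accuracy next to plain accuracy makes this failure mode visible: here accuracy is 0.95, but outbreak recall is 0.0 and balanced accuracy is 0.5.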

Further work on the treatment of the imbalanced dataset is needed. SMOTE, ADASYN and Tomek links have been applied with a variety of sampling strategies, with varying degrees of success. A sampling strategy of 0.1 (a 1:10 ratio of outbreak to non-outbreak events), as suggested by similar work in the literature, has not proven as successful. A 50:50 ratio improves model performance, but is not reflective of real-world scenarios.
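As a rough sketch of what the sampling strategies mean in practice, the example below assumes the strategy value is the desired ratio of minority (outbreak) to majority (non-outbreak) samples after resampling. The `smote_like` function is a simplified, SMOTE-style interpolation written for illustration only. It is not the implementation used in this repo, and the sample points and counts are hypothetical.

```python
import math
import random


def synthetic_count(n_majority, n_minority, ratio):
    """Synthetic minority rows needed so that minority / majority reaches `ratio`."""
    target_minority = int(ratio * n_majority)
    return max(0, target_minority - n_minority)


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new points by interpolating between a minority point
    and one of its k nearest minority neighbours (SMOTE-style sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical class sizes and minority feature vectors (not project data).
n_majority = 200
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 1.6)]

n_new_1_to_10 = synthetic_count(n_majority, len(minority), ratio=0.1)
n_new_50_50 = synthetic_count(n_majority, len(minority), ratio=1.0)
print(n_new_1_to_10, n_new_50_50)

synthetic = smote_like(minority, n_new_1_to_10)
print(len(minority) + len(synthetic), "minority rows vs", n_majority, "majority rows")
```

With these made-up counts, the 1:10 strategy adds 15 synthetic outbreak rows, while the 50:50 strategy would add 195, so most of the minority class would be synthetic. That is one way to see why a 50:50 ratio can improve scores while drifting away from real-world class proportions.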

That being said, there is some fine-tuning and further exploration I would recommend: