e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Develop model to label replacement modes when not filled out by user #783

Open. zackAemmer opened this issue 2 years ago

zackAemmer commented 2 years ago

TODO:

Model Storage: https://github.com/e-mission/e-mission-server/pull/874

Model Structure: https://github.com/e-mission/e-mission-server/pull/852

zackAemmer commented 2 years ago

So far I have tested Random Forest (RF), Multinomial/Mixed Logit (MNL/MXL), and Gradient Boosted Decision Tree (GBDT) models for replacement labeling on all labeled data: https://github.com/zackAemmer/em-public-dashboard/blob/trb-analysis/viz_scripts/biogeme_test.ipynb

Performance is about the same for the MNL/MXL and the GBDT: ~75% accuracy across all classes and a ~71% F1 score when weighted by class support. Because the scikit-learn models (GBDT/RF) are simpler to implement, it makes the most sense to put those in the pipeline, while keeping things modular enough to plug in other models such as the MNL/MXL in the future.
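For reference, here is a minimal sketch of how the two metrics quoted above are computed: overall accuracy, and F1 averaged over classes with each class weighted by its support (its count in the true labels). The mode labels and predictions are made up for illustration and are not taken from the notebook; the function names are hypothetical too. In scikit-learn, the same quantities should come from `accuracy_score` and `f1_score(..., average="weighted")`.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of trips whose predicted replacement mode matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating it as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Convention: a class with no true or predicted instances scores 0.
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_class_f1(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


# Hypothetical replacement-mode labels, for illustration only.
y_true = ["car", "car", "car", "car", "walk", "walk", "bike", "no_trip"]
y_pred = ["car", "car", "car", "walk", "walk", "car", "bike", "car"]

print(f"accuracy:    {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

Weighting by support means frequent replacement modes dominate the score, so a model can post a reasonable weighted F1 while still doing poorly on rare classes (here, `no_trip` is never predicted correctly).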

This notebook also contains many "replacement" visualizations that were used in the TRB paper. These may be useful for the dashboard when it is configured in "program" mode (not yet implemented; see https://github.com/e-mission/e-mission-docs/issues/781).

zackAemmer commented 1 year ago

The testing notebook is now here: https://github.com/e-mission/em-public-dashboard/pull/76

And the implementation is here: https://github.com/e-mission/e-mission-server/pull/890

Both are works in progress, with most of the ongoing work on the implementation.