Open zackAemmer opened 2 years ago
So far I have tested Random Forest (RF), Multinomial/Mixed Logit (MNL/MXL), and Gradient Boosted Decision Tree (GBDT) models for replacement labeling on all labeled data: https://github.com/zackAemmer/em-public-dashboard/blob/trb-analysis/viz_scripts/biogeme_test.ipynb
Performance is about the same for the MNL/MXL and the Gradient Boosted Decision Tree models: ~75% accuracy across all classes and ~71% F1 score when weighted by class support. Given how simple the scikit-learn models (GBDT/RF) are to implement, it makes the most sense to implement those in the pipeline, while keeping things modular enough to plug in other models, such as the MNL/MXL, in the future.
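For reference, here is a minimal sketch of how the two metrics quoted above are computed. It is written in plain Python so it runs without the notebook; the labels and predictions are made-up toy data, not results from the notebook, and the helper names (`accuracy`, `per_class_f1`, `weighted_f1`) are hypothetical. The support-weighted average is the same idea as scikit-learn's `f1_score(..., average="weighted")`.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label that occurs in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, tp + fn)  # support = number of true samples
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    scores = per_class_f1(y_true, y_pred)
    total_support = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total_support


# Toy, imbalanced example (hypothetical labels, NOT data from the notebook).
y_true = ["car"] * 6 + ["bike"] * 3 + ["walk"] * 1
y_pred = ["car"] * 5 + ["bike"] + ["bike", "bike", "car"] + ["car"]

acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} weighted_f1={wf1:.3f}")
```

In this toy case the rare class is never predicted correctly, which pulls the weighted F1 below the accuracy. A gap in the same direction shows up in the numbers above, although the cause there may differ.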
This notebook also contains many "replacement" visualizations that were used in the TRB paper. These may be useful for the dashboard once it can be configured in "program" (not yet implemented: https://github.com/e-mission/e-mission-docs/issues/781).
The testing notebook is now here: https://github.com/e-mission/em-public-dashboard/pull/76
And the implementation is here: https://github.com/e-mission/e-mission-server/pull/890
Both are works in progress, with most of the remaining work on the implementation.
TODO:
Model Storage: https://github.com/e-mission/e-mission-server/pull/874
Model Structure: https://github.com/e-mission/e-mission-server/pull/852
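As a rough illustration of what "modular enough to plug in other models" could look like, here is a minimal sketch of a registry keyed by model name. It is not the structure used in the linked PRs; every name in it (`ReplacementModel`, `MODEL_REGISTRY`, `register_model`, `build_model`, `MajorityClassModel`) is hypothetical. The only assumption about a real model is that it exposes `fit` and `predict`, which the scikit-learn classifiers do.

```python
from collections import Counter
from typing import Callable, Dict, List, Protocol, Sequence


class ReplacementModel(Protocol):
    """Hypothetical interface: anything with fit/predict can be plugged in."""

    def fit(self, X: Sequence, y: Sequence): ...

    def predict(self, X: Sequence) -> List: ...


# Maps a model name to a zero-argument factory that builds a fresh model.
MODEL_REGISTRY: Dict[str, Callable[[], ReplacementModel]] = {}


def register_model(name: str):
    """Decorator that records a model factory under the given name."""
    def decorator(factory):
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


def build_model(name: str) -> ReplacementModel:
    """Look up a registered factory by name and build a new model."""
    if name not in MODEL_REGISTRY:
        raise KeyError(f"unknown model {name!r}; known: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name]()


@register_model("majority")
class MajorityClassModel:
    """Trivial stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


# A scikit-learn estimator could be registered the same way, for example by
# registering a factory that returns a RandomForestClassifier instance.

model = build_model("majority")
model.fit([[0], [1], [2]], ["car", "car", "bike"])
predictions = model.predict([[3], [4]])
print(predictions)
```

With this layout the pipeline only depends on the model name and the `fit`/`predict` pair, so adding an MNL/MXL wrapper later would mean registering one more factory.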