Develop model to label replacement modes when not filled out by user

zackAemmer commented 2 years ago

TODO:

Test models in notebook to come up with some reasonably accurate solutions.
Add a new model class similar to "TripModel" which expects aggregate data (instead of individual user ids).
Implement the aggregate model in the data processing pipeline, saving the model parameters periodically.
Use the model to periodically fill in replacement labels on trips where the user did not provide one.
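The TODO steps above could be sketched roughly as follows. This is only an illustration, not the actual e-mission-server code: `AggregateReplacementModel`, `MajorityClassifier`, the trip dictionary keys (`features`, `replaced_mode`, `inferred`), and the JSON parameter format are all hypothetical names chosen for the example. The majority-class estimator is a placeholder for whichever model comes out of the notebook testing; anything exposing the same `fit`/`predict` interface could be plugged in.

```python
import json
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in estimator with a scikit-learn-style fit/predict."""

    def __init__(self, label=None):
        self.label = label

    def fit(self, X, y):
        # Remember the most common replacement mode in the training labels.
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]

    def to_params(self):
        return {"label": self.label}

    @classmethod
    def from_params(cls, params):
        return cls(label=params["label"])


class AggregateReplacementModel:
    """Hypothetical model trained on trips pooled across all users."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, trips):
        # Train only on trips that carry a user-provided replacement label.
        labeled = [t for t in trips if t["replaced_mode"] is not None]
        self.estimator.fit(
            [t["features"] for t in labeled],
            [t["replaced_mode"] for t in labeled],
        )
        return self

    def fill_missing(self, trips):
        # Return copies of the trips; unlabeled ones get a predicted label
        # and an "inferred" flag so they can be told apart from user labels.
        filled = []
        for trip in trips:
            trip = dict(trip)
            if trip["replaced_mode"] is None:
                trip["replaced_mode"] = self.estimator.predict([trip["features"]])[0]
                trip["inferred"] = True
            filled.append(trip)
        return filled

    def to_json(self):
        # Serialize the fitted parameters so the pipeline can store them.
        return json.dumps(self.estimator.to_params())

    @classmethod
    def from_json(cls, blob, estimator_cls=MajorityClassifier):
        return cls(estimator_cls.from_params(json.loads(blob)))


# Made-up trips: features are placeholders, not real pipeline fields.
trips = [
    {"features": [1.2, 0], "replaced_mode": "drive"},
    {"features": [0.4, 1], "replaced_mode": "walk"},
    {"features": [3.0, 0], "replaced_mode": "drive"},
    {"features": [2.1, 0], "replaced_mode": None},
]
model = AggregateReplacementModel(MajorityClassifier()).fit(trips)
filled = model.fill_missing(trips)
print(filled[-1]["replaced_mode"])  # drive
```

Keeping the estimator behind a small wrapper like this is one way to store parameters periodically and swap in a different model later without touching the fill-in logic.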

Model Storage: https://github.com/e-mission/e-mission-server/pull/874

Model Structure: https://github.com/e-mission/e-mission-server/pull/852

zackAemmer commented 2 years ago

So far I have tested Random Forest, Multinomial/Mixed Logit, and Gradient Boosted Decision Tree models for replacement labeling on all labeled data: https://github.com/zackAemmer/em-public-dashboard/blob/trb-analysis/viz_scripts/biogeme_test.ipynb

Performance is about the same for the MNL/MXL and the Gradient Boosted Decision Tree: ~75% accuracy across all classes and a ~71% F1 score when weighted by class support. Given how simple the scikit-learn models (GBDT/RF) are to implement, it makes the most sense to put those in the pipeline, while keeping things modular enough to plug in other models such as the MNL/MXL in the future.
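For reference, the two metrics quoted above can be computed as in the sketch below. It is written in plain Python so that the definitions are explicit; the labels are made up for illustration and are not taken from the notebook. In scikit-learn the same numbers should come from `accuracy_score` and `f1_score(..., average="weighted")`.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    # Fraction of trips whose predicted label matches the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    # Per-class F1, averaged with weights equal to each class's support
    # (the number of true instances of that class).
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Made-up labels for illustration only.
y_true = ["drive", "drive", "walk", "bike"]
y_pred = ["drive", "walk", "walk", "drive"]
print(accuracy(y_true, y_pred))     # 0.5
print(weighted_f1(y_true, y_pred))
```

Weighting by support means that frequent replacement modes dominate the score, so a model can post a reasonable weighted F1 while still doing poorly on rare classes.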

This notebook also contains many "replacement" visualizations that were used in the TRB paper. These may be useful for the dashboard when it is configured in "program" (not yet implemented; see https://github.com/e-mission/e-mission-docs/issues/781).

zackAemmer commented 1 year ago

The testing notebook is now here: https://github.com/e-mission/em-public-dashboard/pull/76

And the implementation is here: https://github.com/e-mission/e-mission-server/pull/890

Both are works in progress, with most of the current effort going into the implementation.
