aGuttman opened this issue 1 year ago
Ok. I feel like I might be missing something.

Looking at the pipeline, trying to understand where I need to make a change to get the new model to run as part of it (call chain sketched below):

- `scheduler.py` calls `intake_stage.run_intake_pipeline()`
- `run_intake_pipeline()` includes a step that runs `analysis.classification.inference.labels.pipeline.infer_labels()`
- `pipeline.infer_labels()` selects an algorithm, `eacili.predict_cluster_confidence_discounting`, from `inferrers.py`
- `inferrers.py` reads a config (through `emission.analysis.modelling.trip_model.config`) to pick a model and get labels from that model with `eamur.predict_labels_with_n()` (`emission/analysis/modelling/trip_model/run_model.py`)
- `run_model` uses a trip model from an enum in `model_type`, where Zach has added `GRADIENT_BOOSTED_DECISION_TREE`
- `gradient_boosted_decision_tree` seems to me to match the abstract interface of `trip_model`
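As a reading aid, here is roughly how I picture that call chain in stripped-down Python. Everything below is a simplified placeholder (including the config read and the dummy prediction), not the real e-mission-server code:

```python
# Simplified placeholder version of the call chain above; not the real code.

TRIP_MODEL_CONFIG = {"model_type": "GREEDY_SIMILARITY_BINNING"}  # stand-in for the config read

def run_intake_pipeline(user_id, trips):
    # intake_stage.run_intake_pipeline(): runs the stages in order;
    # only the label inference stage is sketched here
    return infer_labels(user_id, trips)

def infer_labels(user_id, trips):
    # pipeline.infer_labels(): selects an algorithm from inferrers.py
    # and applies it to each trip
    algorithm = predict_cluster_confidence_discounting
    return {trip["id"]: algorithm(trip) for trip in trips}

def predict_cluster_confidence_discounting(trip):
    # inferrers.py: reads the trip_model config to pick a model,
    # then delegates to run_model's prediction function
    model_type = TRIP_MODEL_CONFIG["model_type"]
    return predict_labels_with_n(trip, model_type)

def predict_labels_with_n(trip, model_type):
    # run_model.py: would load the stored model of this type and predict;
    # here it just returns a dummy prediction
    return [{"labels": {"mode_confirm": "bike"}, "p": 0.9, "model_type": model_type}]

if __name__ == "__main__":
    demo_trips = [{"id": "trip-1"}, {"id": "trip-2"}]
    print(run_intake_pipeline("user-1", demo_trips))
```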
If the model Zach wrote works, using it in the pipeline is just a matter of selecting it in the config (something like the sketch at the end of this comment) so that `inferrers.py` can use it when it reads through `config.py`. Now that I think I understand how the pipeline uses the models, it looks like it was set up to handle new models being added; the infrastructure is already there. I feel like I might be missing the point of this task because it looks like I have almost nothing to do.

Am I understanding what I'm being asked to do correctly? I feel like the only other thing left is to add tests similar to the other trip model's, but Zach has written tests as well.
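Concretely, I imagine the config change would be something like this; the key names and values are my guesses at what the trip_model config looks like, not a verified copy of it:

```python
# Hypothetical trip_model config selecting the GBDT model; key names are guesses.
import json

config_text = """
{
    "model_type": "GRADIENT_BOOSTED_DECISION_TREE",
    "model_storage": "document_database",
    "minimum_trips": 14
}
"""

config = json.loads(config_text)

# inferrers.py would then resolve the configured name against the model_type
# enum (roughly ModelType[config["model_type"]]) and the GBDT model would be
# picked up without further pipeline changes.
print(config["model_type"])
```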
`pipeline.infer_labels()` generates inferred labels - the ones that are based on prior labels from users. Zack is writing a separate model, one to infer the replaced mode for trips that don't have any labels. So you will need a step similar to `pipeline.infer_labels()`, but one that runs a different model and generates and stores a different label.
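A rough sketch of what such a parallel step could look like (every name here is an illustrative stand-in, not existing code):

```python
# Illustrative sketch of a replaced-mode inference step, parallel to
# pipeline.infer_labels(); every name here is a stand-in, not existing code.

def infer_replaced_mode(user_id, trips, model):
    """For trips without user labels, predict the replaced mode with the
    trained model and store it under a separate label type."""
    results = []
    for trip in trips:
        if trip.get("user_labels"):
            continue                      # this model only targets unlabeled trips
        prediction = model.predict(trip)  # e.g. {"replaced_mode": "drove_alone", "p": 0.7}
        results.append({
            "trip_id": trip["id"],
            "label_type": "replaced_mode_prediction",  # distinct from inferred_labels
            "prediction": prediction,
        })
    return results                        # the real step would write these to the DB

class StubModel:
    def predict(self, trip):
        return {"replaced_mode": "bike", "p": 0.8}

print(infer_replaced_mode("user-1",
                          [{"id": "t1"}, {"id": "t2", "user_labels": {"mode_confirm": "walk"}}],
                          StubModel()))
```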
Also, you have outlined where the model is applied, i.e. where we use the model to predict values for incoming trips. You also need to build and save the replaced-mode model so that it can be applied in the future.
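And a sketch of the build side, under the assumption that the model is trained on labeled trips and then persisted so the step above can load it later; the real version would go through run_model's build/load machinery rather than these placeholders:

```python
# Placeholder build-and-save step for the replaced-mode model; the real
# version would reuse the trip_model build and storage machinery.
import pickle

def build_replaced_mode_model(labeled_trips, trainer):
    # train on trips that DO have labels so the model can later fill in
    # replaced modes for trips that don't
    return trainer.fit(labeled_trips)

def save_replaced_mode_model(model, path="replaced_mode_model.pkl"):
    # stand-in for writing the model to model storage (e.g. the database)
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_replaced_mode_model(path="replaced_mode_model.pkl"):
    with open(path, "rb") as f:
        return pickle.load(f)
```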
At a higher level, what does the intake pipeline do and what do the steps represent?
More time spent looking over the wiring of how label prediction works on inferred labels, and how the model is built, saved, and loaded.

Created a new file, equivalent to `pipeline.py`, for the replaced-mode model. Wired it up to use the trip model infrastructure with the GBDT model. Built functions through the inference chain to use the GBDT model, with updated algorithm id tags as needed. Built storage retrieval functions for the model based on the existing functions. Updated them to reflect the use of one model across users rather than one per user (see the sketch below).

To be done:
- Update/save model functions
- Testing
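To sketch the "one model across users" idea with stand-in storage functions (not the actual storage code): instead of keying saved models on (user id, model type), the shared replaced-mode model is keyed on the model type alone.

```python
# Stand-in storage keyed by model type only (one shared model across users),
# rather than by (user_id, model_type) as the per-user trip models are.
_MODEL_STORE = {}  # placeholder for the model collection in the database

def save_shared_model(model_type, model):
    _MODEL_STORE[model_type] = model       # no user_id in the key

def load_shared_model(model_type):
    return _MODEL_STORE.get(model_type)    # same model returned for every user
```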
Add the replaced-mode model into the analysis pipeline.
- model: https://github.com/e-mission/e-mission-server/pull/890
- incremental interface for pipeline: https://github.com/e-mission/e-mission-server/pull/852
- pipeline description (chapter 5): https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-180.pdf