e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Replace mode pipeline integration #841

Open aGuttman opened 1 year ago

aGuttman commented 1 year ago

Add the replace mode model to the analysis pipeline.

model: https://github.com/e-mission/e-mission-server/pull/890
incremental interface for pipeline: https://github.com/e-mission/e-mission-server/pull/852
pipeline description (chapter 5): https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-180.pdf

aGuttman commented 1 year ago

Ok. I feel like I might be missing something.

Looking at the pipeline trying to understand where I need to make a change to get the new model to run as part of it:

If the model Zach wrote works, using it in the pipeline is just a matter of selecting it in the config so that inferrers.py can use it when it reads through config.py. Now that I think I understand how the pipeline uses the models, it looks like it was set up to handle new models being added. The infrastructure is already there. I feel like I might be missing the point of this task because it looks like I have almost nothing to do.

Am I understanding what I'm being asked to do correctly? I feel like the only other thing to do is add tests similar to those for the other trip model, but Zach has written tests as well.
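
For illustration, the config-driven selection described above can be sketched roughly as follows. This is a minimal sketch, not the actual contents of inferrers.py or config.py: the class names, the `model_type` key, and `MODEL_REGISTRY` are all hypothetical stand-ins.

```python
from typing import Dict, List, Type


# Hypothetical stand-ins; these are NOT the e-mission model classes.
class PriorLabelModel:
    """Placeholder for a model that predicts from a user's prior labels."""

    def predict(self, trip: dict) -> List[dict]:
        return [{"labels": {"mode_confirm": "walk"}, "p": 1.0}]


class ReplacedModeModel:
    """Placeholder for a model that predicts the replaced mode."""

    def predict(self, trip: dict) -> List[dict]:
        return [{"labels": {"replaced_mode": "drive"}, "p": 1.0}]


# Hypothetical registry: maps a name used in the config to a model class.
MODEL_REGISTRY: Dict[str, Type] = {
    "prior_label": PriorLabelModel,
    "replaced_mode": ReplacedModeModel,
}


def model_from_config(config: dict):
    """Instantiate whichever model the config names (assumed key: model_type)."""
    name = config["model_type"]
    if name not in MODEL_REGISTRY:
        raise ValueError(
            f"unknown model_type {name!r}; expected one of {sorted(MODEL_REGISTRY)}"
        )
    return MODEL_REGISTRY[name]()


# Switching models is then a one-line config change.
config = {"model_type": "replaced_mode"}
model = model_from_config(config)
prediction = model.predict({"id": 1})
```

With this kind of setup, adding a new model means registering one more class; the calling code does not change.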

shankari commented 1 year ago

pipeline.infer_labels() generates inferred labels, the ones that are based on prior labels from users. Zack is writing a separate model, one to infer the replaced mode for trips that don't have any labels. So you will need a step similar to pipeline.infer_labels(), but one that runs a different model and generates and stores a different label.

Also, you have outlined where the model is applied, where we use the model to predict values for incoming trips. You also need to build and save the replaced mode model so that it can be applied in the future.
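
As a rough illustration of the two halves (build and save the model, then load and apply it to incoming trips), here is a minimal sketch. It is not the e-mission implementation: a toy frequency lookup stands in for the real replaced mode model, JSON on disk stands in for the real model storage, and every function name and trip field below is hypothetical.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, List, Optional


def build_model(labeled_trips: List[dict]) -> Dict[str, str]:
    """Toy 'training': remember the most common replaced mode per travel mode."""
    counts: Dict[str, Counter] = {}
    for trip in labeled_trips:
        counts.setdefault(trip["mode"], Counter())[trip["replaced_mode"]] += 1
    return {mode: c.most_common(1)[0][0] for mode, c in counts.items()}


def save_model(model: Dict[str, str], path: Path) -> None:
    path.write_text(json.dumps(model))


def load_model(path: Path) -> Dict[str, str]:
    return json.loads(path.read_text())


def infer_replaced_mode(model: Dict[str, str], trip: dict) -> Optional[dict]:
    """Produce a separate label entry for one trip, or None if no prediction."""
    predicted = model.get(trip["mode"])
    if predicted is None:
        return None
    return {"trip_id": trip["id"], "replaced_mode": predicted}


# Made-up example data, for illustration only.
labeled = [
    {"mode": "e-bike", "replaced_mode": "drive"},
    {"mode": "e-bike", "replaced_mode": "drive"},
    {"mode": "e-bike", "replaced_mode": "walk"},
    {"mode": "bus", "replaced_mode": "drive"},
]
incoming = [{"id": 1, "mode": "e-bike"}, {"id": 2, "mode": "train"}]

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "replaced_mode_model.json"
    # Build stage: train on labeled trips and persist the result.
    save_model(build_model(labeled), model_path)
    # Apply stage (possibly a later pipeline run): load and predict.
    model = load_model(model_path)
    inferred = [infer_replaced_mode(model, trip) for trip in incoming]
```

The point of the split is that the apply step only ever reads a saved model, so the build step can run on a different schedule from the inference step.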

At a higher level, what does the intake pipeline do and what do the steps represent?

aGuttman commented 1 year ago

Spent more time looking over the wiring of how label prediction works for inferred labels, and how the model is built, saved, and loaded.

Created a new file, equivalent to pipeline.py, for the replace model. Wired it up to use the trip model infrastructure with the GBDT model. Built functions through the inference chain to use the GBDT model, with updated algorithm id tags as needed. Built storage retrieval functions for the model based on existing functions. Updated to reflect the use of one model across users rather than one per user.

To be done:
- Update/save model functions
- Testing
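
To make the "one model across users rather than one per user" change concrete, here is a minimal in-memory sketch of storage that supports both cases. It is an assumption-laden illustration, not the e-mission storage code: `ModelStore`, the `GLOBAL_USER` sentinel, and the key layout are all hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical sentinel meaning "this model is shared by all users".
GLOBAL_USER = None


class ModelStore:
    """Keeps the latest model per (user_id, model_type) key."""

    def __init__(self) -> None:
        self._models: Dict[Tuple[Optional[str], str], Tuple[int, Any]] = {}

    def save(self, model_type: str, model: Any, timestamp: int,
             user_id: Optional[str] = GLOBAL_USER) -> None:
        key = (user_id, model_type)
        current = self._models.get(key)
        # Overwrite only if this model is at least as new as the stored one.
        if current is None or timestamp >= current[0]:
            self._models[key] = (timestamp, model)

    def load(self, model_type: str,
             user_id: Optional[str] = GLOBAL_USER) -> Optional[Any]:
        entry = self._models.get((user_id, model_type))
        return None if entry is None else entry[1]


store = ModelStore()
# Per-user model: stored under that user's id.
store.save("label", {"v": "alice-model"}, timestamp=1, user_id="alice")
# Shared model: stored once under the global key, then updated.
store.save("replaced_mode", {"v": "shared-v1"}, timestamp=1)
store.save("replaced_mode", {"v": "shared-v2"}, timestamp=2)

# Inference for any user looks up the shared model with no user id.
shared = store.load("replaced_mode")
```

Keeping the user id as part of the key lets per-user and shared models coexist in the same storage path, which is one way to reuse existing retrieval functions with small changes.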