@dfsnow I think steps 1 and 2 should only take a day or two, as long as we're only setting or updating a metadata parameter that isn't used anywhere in the pipeline itself. I have less of a sense of how long step 3 will take but I assume it should be on the scale of minutes once we know how we want to tag each model.
Model runs can have different purposes, such as testing a new feature, finalizing an idea, etc. We don't currently have a way to identify the run "type" for a given model. As such, it can be difficult to know which model runs are actually worthy of consideration given the volume of runs we produce. @ccao-jardine proposes the following model typology:
- `junk` - there is good reason to not use it in production
- `test` - broadly, any model that could be considered for production
- `rejected` - a test model that we've looked at and rejected
- `candidate` - a test model that we've evaluated and determined should be a top candidate
- `baseline` - a "standard" run used to compare to other runs
- `final` - the model that year, selected by Valuations/the Assessor

We should update our pipeline to use this typology. This would involve three distinct tasks:
1. Add a `run_type` parameter in the manual `workflow_dispatch`. This can follow the same pattern as the feature flags, such as `cv_enable`, etc. The result would be each model run receiving one type from the types above (see the sketch after this list).
2. Add a way to update the `run_type` stored in a finished run's `metadata` parquet file. This can follow a UX pattern similar to the `delete-model-run` workflow.
3. Tag each of our existing model runs with one of the types above.
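For task 1, here's a minimal sketch of what the new input could look like, assuming the model workflow already exposes `workflow_dispatch` inputs for flags like `cv_enable`. The input name, default, and description below are illustrative, not final:

```yaml
on:
  workflow_dispatch:
    inputs:
      # Hypothetical input; the exact name and default would be
      # settled during implementation.
      run_type:
        description: "Run type, using the typology above"
        type: choice
        required: true
        default: test
        options:
          - junk
          - test
          - rejected
          - candidate
          - baseline
          - final
```

A `choice` input keeps the set of valid types closed, so every run gets exactly one tag from the typology with no free-text drift.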
The model tags can be stored in the repurposed `run_type` field in the `model.metadata` table. This was previously used to differentiate "full" model runs (where everything ran) from "limited" runs (which only ran prediction on the test set, to enable runs on CI runners). The `run_type` was an input to some conditional logic that determined what exactly ran, but that logic is no longer used and it's not necessary to store the original type.
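For task 2, one hedged sketch of the update itself, assuming each run's metadata is a small parquet table and `run_type` becomes a plain string column. The path handling, column names, and helper below are hypothetical, not part of the pipeline today:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# The closed set of valid types from the typology above.
RUN_TYPES = {"junk", "test", "rejected", "candidate", "baseline", "final"}


def set_run_type(metadata_path: str, new_type: str) -> None:
    """Rewrite a run's metadata parquet with run_type set to new_type."""
    if new_type not in RUN_TYPES:
        raise ValueError(f"Unknown run_type: {new_type!r}")
    table = pq.read_table(metadata_path)
    # Replace any existing run_type column with the new value,
    # repeated once per row of the metadata table.
    if "run_type" in table.column_names:
        table = table.drop(["run_type"])
    table = table.append_column("run_type", pa.array([new_type] * table.num_rows))
    pq.write_table(table, metadata_path)
```

Wrapped in a small `workflow_dispatch` workflow that takes a run ID and a type (mirroring `delete-model-run`), something like this would let us retag runs after the fact, which is what makes task 3 cheap.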