Closed dajmcdon closed 2 weeks ago
Alright done! Checks pass locally, even if they won't run here. Side bonus, this adds a label for trained steps that have had their ahead/lag modified and includes the modified version instead.
I'm ok if you want to merge this now and continue to iterate on the big PR?
Two things:
prep.epi_recipe()
will not persist for very long. The intention is for this to look like https://github.com/cmu-delphi/epipredict/blob/6530c7e3ca75c7f6a2f76f8c61f9a5c5d537452e/R/epi_recipe.R#L226.In that case the changes you're asking for in this PR are impossible. We're adding a feature (metadata) that's unsupported by base recipes, so we need to modify either the prep
or bake
functions for our use case if we're not going to rely on the user not calling steps that drop metadata between when its made and when its used.
re 2, we could add logic that detects if the step is a shift and attaches the metadata to it instead, but I think that ends up being more fragile/error prone than adding it to the epi_df. It moves the logic for a particular step into prep.epi_recipe
, rather than into that step.
I think we want the logic in prep.epi_recipe()
. That way, it can be processed either before or after dispatch to prep.recipe()
(this is simlar to the adjust_recipe_latency_before_bake()
logic). This avoids having to duplicate the guts prep.recipe
with only minor changes.
It also means that we can avoid setting attributes. Because it's in prep.epi_recipe()
it can access the entire recipe and directly modify the ahead/lag
steps if needed.
I see, in that case it's still technically possible, if quite a headache; probably moreso than keeping the custom loop would be.
If we do it after, we have to throw out and recompute var_info
, term_info
, and last_term_info
. Any steps that depend on training
having the right shifts after will probably be wrong (step_naomit
comes to mind), so we should recompute all of those. This means restarting the prep/bake process at step_adjust_latency
so we're doing the second half of the loop anyway.
To do it beforehand, you have to run through all the previous steps to get the state to be correct to know which columns exist and what their latencies are, so we'd be doing the first half of the for loop, and then applying the logic.
Both of these sound like significantly more work than maintaining a lightly customized version of the for-loop of prep.epi_recipe
, since both effectively require doing that anyway to write a partial execution of that loop, and then dealing with the resulting logic. Looking at the blame for prep.recipe, the function seems fairly stable at this point. It would certainly be nice if they had made their function more modular so it was calling functions instead of duplicating exact logic.
OK, in that case, I'd rather avoid modifying the prep/bake internals and return to the old implementation based on attribute munging. Do you want to close this and grab off anything useful for the other branch?
@dsweber2 This is the idea I had in mind. It probably breaks checks. But perhaps we can iterate on this.