cmu-delphi / epipredict

Tools for building predictive models in epidemiology.
https://cmu-delphi.github.io/epipredict/
Other
8 stars 9 forks source link

Adjust ahead djm #381

Closed dajmcdon closed 2 weeks ago

dajmcdon commented 2 weeks ago

@dsweber2 This is the idea I had in mind. It probably breaks checks. But perhaps we can iterate on this.

dsweber2 commented 2 weeks ago

Alright done! Checks pass locally, even if they won't run here. Side bonus, this adds a label for trained steps that have had their ahead/lag modified and includes the modified version instead.

dajmcdon commented 2 weeks ago

I'm ok if you want to merge this now and continue to iterate on the big PR?

Two things:

  1. The edits to prep.epi_recipe() will not persist for very long. The intention is for this to look like https://github.com/cmu-delphi/epipredict/blob/6530c7e3ca75c7f6a2f76f8c61f9a5c5d537452e/R/epi_recipe.R#L226.
  2. There's some attribute setting/tracking. Is that necessary? My intention here was to avoid doing that.
dsweber2 commented 2 weeks ago

In that case the changes you're asking for in this PR are impossible. We're adding a feature (metadata) that's unsupported by base recipes, so we need to modify either the prep or bake functions for our use case if we're not going to rely on the user not calling steps that drop metadata between when its made and when its used.

dsweber2 commented 2 weeks ago

re 2, we could add logic that detects if the step is a shift and attaches the metadata to it instead, but I think that ends up being more fragile/error prone than adding it to the epi_df. It moves the logic for a particular step into prep.epi_recipe, rather than into that step.

dajmcdon commented 2 weeks ago

I think we want the logic in prep.epi_recipe(). That way, it can be processed either before or after dispatch to prep.recipe() (this is simlar to the adjust_recipe_latency_before_bake() logic). This avoids having to duplicate the guts prep.recipe with only minor changes.

It also means that we can avoid setting attributes. Because it's in prep.epi_recipe() it can access the entire recipe and directly modify the ahead/lag steps if needed.

dsweber2 commented 2 weeks ago

I see, in that case it's still technically possible, if quite a headache; probably moreso than keeping the custom loop would be.

If we do it after, we have to throw out and recompute var_info, term_info , and last_term_info. Any steps that depend on training having the right shifts after will probably be wrong (step_naomit comes to mind), so we should recompute all of those. This means restarting the prep/bake process at step_adjust_latency so we're doing the second half of the loop anyway.

To do it beforehand, you have to run through all the previous steps to get the state to be correct to know which columns exist and what their latencies are, so we'd be doing the first half of the for loop, and then applying the logic.

Both of these sound like significantly more work than maintaining a lightly customized version of the for-loop of prep.epi_recipe, since both effectively require doing that anyway to write a partial execution of that loop, and then dealing with the resulting logic. Looking at the blame for prep.recipe, the function seems fairly stable at this point. It would certainly be nice if they had made their function more modular so it was calling functions instead of duplicating exact logic.

dajmcdon commented 2 weeks ago

OK, in that case, I'd rather avoid modifying the prep/bake internals and return to the old implementation based on attribute munging. Do you want to close this and grab off anything useful for the other branch?