cmu-delphi / epipredict

Tools for building predictive models in epidemiology.
https://cmu-delphi.github.io/epipredict/

[Discussion] A general framework for handling multi-ahead forecasters #262

Open · dshemetov opened this issue 11 months ago

dshemetov commented 11 months ago

This includes:

dajmcdon commented 10 months ago

This is a very complicated issue for a variety of reasons:

  1. Some engines "work", in that they allow multiple columns in the response. lm() is one example.
    • I put "work" in quotation marks because you don't actually want to do this. NAs in the response for large aheads propagate at train time to the earlier aheads. Rows near the end of the series, where only the large-ahead responses are unobserved, get dropped for every ahead, so recent data looks missing for the short aheads when it actually isn't.
    • This behavior is also not predictable across engines, so switching the engine might simply fail.
    • The best approach is to loop over aheads and then bind the predictions together. But this actually means multiple workflows, not just multiple predictions, which raises the question of how to handle the S3 objects. One option could be a [workflowset](https://workflowsets.tidymodels.org), which is the tidymodels analogue of fable's model table (mable).
  2. Other forecasters CAN ONLY operate with multiple aheads.
    • Iterated Autoregressive
    • Smoothed Quantile Regression (the current implementation only allows the quantreg engine, but this could be extended to many other engines).
  3. Smooth QR needs to know the desired aheads at train time, but Iterated Autoregressive only needs them at prediction time.
  4. Smooth QR actually suggests a different way to solve item 1: build a Kronecker-product matrix of features, stack the responses into one long vector, and then drop the missing values. I can imagine this fixing most of the issues in item 1. However, it may also have unintended consequences for some engines that I can't foresee, and the resulting model object will be harder to inspect.
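The NA-propagation problem in item 1 is easier to see in a small example. This is a language-neutral Python sketch, not epipredict or R code. It assumes a single series predicted from its current value by simple least squares, and every helper name (make_targets, fit_simple_ols, rows_complete_case, rows_per_ahead, fit_per_ahead) is hypothetical. The joint complete-case filter mimics a multi-response fit that removes any row with a missing response. fit_per_ahead is the loop-and-bind alternative.

```python
# Hypothetical sketch: joint complete-case filtering vs. fitting each ahead separately.

def make_targets(y, aheads):
    """Return {ahead: [y[t + ahead], or None if not yet observed]} for each row t."""
    n = len(y)
    return {h: [y[t + h] if t + h < n else None for t in range(n)] for h in aheads}

def fit_simple_ols(x, z):
    """Closed-form least squares for z = a + b * x."""
    m = len(x)
    mx, mz = sum(x) / m, sum(z) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    return mz - b * mx, b

def rows_complete_case(targets, n):
    """Rows a joint multi-response fit keeps: every ahead's response must be observed."""
    return [t for t in range(n) if all(targets[h][t] is not None for h in targets)]

def rows_per_ahead(targets, h, n):
    """Rows available when ahead h is fitted on its own."""
    return [t for t in range(n) if targets[h][t] is not None]

def fit_per_ahead(y, aheads):
    """Loop over aheads, fit one model per ahead, and bind the predictions together."""
    n = len(y)
    targets = make_targets(y, aheads)
    preds = []
    for h in aheads:
        rows = rows_per_ahead(targets, h, n)
        a, b = fit_simple_ols([y[t] for t in rows], [targets[h][t] for t in rows])
        # Forecast h steps past the last observed value y[n - 1].
        preds.append((h, a + b * y[n - 1]))
    return preds
```

With y = 1, ..., 8 and aheads 1, 2, 4, the joint filter keeps only the first four rows for every ahead. The ahead-1 model, by contrast, can use seven. In epipredict terms, each pass through the loop would be its own workflow, which is the S3 bookkeeping question raised in item 1.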
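Items 2 and 3 contrast two kinds of multi-ahead forecasters. The sketch below covers the iterated autoregressive case. It is written in Python and assumes an AR(1) model with an intercept for brevity. The helper names fit_ar1 and predict_iterated are hypothetical. The model is fitted once as a one-step predictor, and forecasts for any set of aheads come from feeding predictions back in. That is why the aheads are only needed at prediction time.

```python
# Hypothetical sketch of an iterated autoregressive forecaster (AR(1) assumed).

def fit_ar1(y):
    """Fit the one-step model y[t] = c + phi * y[t - 1] by least squares."""
    x, z = y[:-1], y[1:]
    m = len(x)
    mx, mz = sum(x) / m, sum(z) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    phi = sxz / sxx
    return mz - phi * mx, phi

def predict_iterated(model, last_value, aheads):
    """Iterate the one-step model; the aheads are supplied only here."""
    c, phi = model
    path = []
    current = last_value
    for _ in range(max(aheads)):
        current = c + phi * current  # feed the previous forecast back in
        path.append(current)
    return {h: path[h - 1] for h in aheads}
```

Nothing in fit_ar1 depends on the aheads. This is the asymmetry with Smooth QR described in item 3. For example, a series generated by y[t] = 1 + 0.5 * y[t - 1] recovers c = 1 and phi = 0.5, and the aheads can be changed freely at prediction time.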
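Item 4 can also be sketched in numpy. Assume a feature matrix X (n × p) and a response matrix Y (n × H), with NaN wherever the value at t + h is not yet observed. The stacked design kron(I_H, X), with vec(Y) as the response, lets a single least-squares call fit all aheads. Dropping NaN rows then removes only the responses that are truly missing. The function names (stack_kronecker, fit_stacked, fit_each_ahead) are hypothetical, and this is not the actual code of epipredict or the smooth QR implementation. With the identity block structure, the stacked fit reproduces separate per-ahead fits exactly, which is why it sidesteps the problem in item 1.

```python
import numpy as np

def stack_kronecker(X, Y):
    """Hypothetical helper: vec(Y) against kron(I_H, X), dropping missing responses."""
    n, H = Y.shape
    design = np.kron(np.eye(H), X)       # (n*H) x (p*H), one X block per ahead
    response = Y.reshape(-1, order="F")  # vec(Y): all of ahead 1, then ahead 2, ...
    keep = ~np.isnan(response)           # drop only the truly unobserved responses
    return design[keep], response[keep]

def fit_stacked(X, Y):
    """One least-squares fit on the stacked system; returns p x H coefficients."""
    design, response = stack_kronecker(X, Y)
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef.reshape(X.shape[1], Y.shape[1], order="F")  # undo vec(B)

def fit_each_ahead(X, Y):
    """Reference: a separate least-squares fit per ahead on its observed rows."""
    cols = []
    for h in range(Y.shape[1]):
        obs = ~np.isnan(Y[:, h])
        coef, *_ = np.linalg.lstsq(X[obs], Y[obs, h], rcond=None)
        cols.append(coef)
    return np.column_stack(cols)
```

Replacing np.eye(H) with an H × d basis matrix Psi gives the design kron(Psi, X). This ties the coefficients across aheads through B = Theta Psi^T, which is presumably the kind of smoothing Smooth QR performs. It also shows why that method must know the aheads at train time. Finally, the larger stacked design is what makes the fitted object harder to inspect.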