eco4cast / unconf-2023

Brainstorming repo to propose and discuss unconference project ideas!

Strategies for automated repeatable forecast workflows #7

Open cboettig opened 1 year ago

cboettig commented 1 year ago

Automating forecast generation is basically essential to having iterative forecasts that assimilate new data as it is released and produce updated predictions. A group could explore / document / compare different strategies for doing this. For instance:

juniperlsimonis commented 1 year ago

I would love to work on this.

I'm thinking a lot about the topic in generalizing the Portal Forecasting workflow with portalcasting and the associated tools under https://portal.naturecast.org/.

we already have a yaml workflow for the data structures that get built, and it is built into the package. I'm in the process of building a yaml workflow for models as well -- it's a work in progress, but i'm quite close to the point where you can submit a new model using a yaml workflow that facilitates inclusion of user-provided code.

So yeah, count me in on this!

mdietze commented 1 year ago

I think a challenge here is what constitutes a "model". Some are single equations or simple function calls. Others are external executables with hundreds of thousands of lines of C or Fortran code. Conceptually I think both could be handled by the same workflow if the models are containerized, but a key thing that we're missing is a set of standards for the INPUTS into such containers (settings, parameters, drivers, initial conditions, etc.) and agreement on how that information is passed in. Output is much easier since we already have a community standard for output files and metadata. Another level of complexity is added when iterative forecasts need their own outputs handed back to them as inputs, and there's also a question of whether models that make use of data assimilation should have the assimilation code inside their model container or in a separate module. The former is simpler, but I think ultimately less helpful, as it both requires a lot of reinvention of wheels and reduces the scalability of the system (e.g. if I want to run a forecast across 1000 locations, having the assimilation outside of the model container allows me to spin up an arbitrary number of duplicate model containers to handle 1000 sites).
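A minimal sketch of what a shared input contract might look like, written in Python purely for illustration. Nothing here is an existing community standard: `ModelInputs`, `run_model`, `assimilate`, and `forecast_sites` are hypothetical names, the "model" is a toy recursion, and the "assimilation" is a simple weighted average. The point is only that the same JSON bundle (settings, parameters, drivers, initial conditions) could be handed to any model container, with assimilation living outside it so the model call can be duplicated per site.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class ModelInputs:
    """Hypothetical input bundle handed to one model container run."""
    site_id: str
    settings: Dict[str, float] = field(default_factory=dict)
    parameters: Dict[str, float] = field(default_factory=dict)
    drivers: Dict[str, List[float]] = field(default_factory=dict)
    initial_conditions: Dict[str, float] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ModelInputs":
        return cls(**json.loads(text))


def run_model(inputs_json: str) -> str:
    """Stand-in for a containerized model: JSON in, JSON out (toy dynamics)."""
    inputs = ModelInputs.from_json(inputs_json)
    state = inputs.initial_conditions["x"]
    growth = inputs.parameters["growth"]
    forecast = []
    for temperature in inputs.drivers["temperature"]:
        state = state * growth + 0.5 * temperature
        forecast.append(state)
    return json.dumps({"site_id": inputs.site_id, "forecast": forecast})


def assimilate(prior: float, observation: float, weight: float = 0.5) -> float:
    """Toy assimilation step kept outside the model: a weighted average."""
    return (1 - weight) * prior + weight * observation


def forecast_sites(observations: Dict[str, float],
                   priors: Dict[str, float]) -> Dict[str, List[float]]:
    """Assimilate per site, then call the same model once per site."""
    results = {}
    for site_id, observation in observations.items():
        x0 = assimilate(priors[site_id], observation)
        inputs = ModelInputs(
            site_id=site_id,
            parameters={"growth": 1.0},
            drivers={"temperature": [10.0, 20.0]},
            initial_conditions={"x": x0},
        )
        output = json.loads(run_model(inputs.to_json()))
        results[site_id] = output["forecast"]
    return results
```

Because `run_model` only sees a JSON string, the loop in `forecast_sites` could in principle be replaced by launching one container per site without changing the model side.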

juniperlsimonis commented 1 year ago

for sure.

we actually just went through that "what is a model?" situation with portalcasting and i feel really confident that the structure we landed on is flexible enough to handle all the situations you outlined (and more complicated versions).

there are models that use single line calls to existing package functions (like auto.arima, etc) as well as calls out to jags models that have extensive scripts.

v 0.53.0 of the package is now in production.

the models individually don't need to be containerized, since the whole system is -- all the code is packaged up into portalcasting and its docker container, which allows anyone to spin up a version with all models ready to run.

certainly, we could containerize each model and then the whole system, but that seems like overkill, since we can wrap everything into an R package and leverage dependency management accordingly. that could also translate over into whatever program one is using to run the CI system, as long as it's as friendly as R is to wrapping up external code into functions.

we've got the inputs standardized to the extent that i think we'll want (for this system) as part of the model controls file, which has a fairly loose standard but i would be happy to help formalize it or something based off of it.

i've generalized it so that we can start adding models with qualitatively different targets using the identical system. that means the inputs are formalized at a very, very general level (e.g., each of the fit and forecast elements has an R function name and arguments if needed), allowing each model to work within its self-defined needs.
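To make the "function name plus arguments" idea concrete, here is a rough Python sketch. It is not the portalcasting controls format or its R API: the `MODEL_CONTROLS` layout, the `REGISTRY` lookup, and the functions `fit_mean`, `fit_last`, `forecast_flat`, and `run_from_controls` are all hypothetical. It only shows how a very general controls entry can drive models with different internals through one dispatcher.

```python
from statistics import mean
from typing import Any, Callable, Dict, List


def fit_mean(series: List[float]) -> Dict[str, float]:
    """Toy fit: the level is the mean of the series."""
    return {"level": mean(series)}


def fit_last(series: List[float]) -> Dict[str, float]:
    """Toy fit: the level is the last observation."""
    return {"level": series[-1]}


def forecast_flat(fit: Dict[str, float], horizon: int = 1) -> List[float]:
    """Toy forecast: repeat the fitted level over the horizon."""
    return [fit["level"]] * horizon


# Hypothetical registry mapping names in the controls to callables.
REGISTRY: Dict[str, Callable[..., Any]] = {
    "fit_mean": fit_mean,
    "fit_last": fit_last,
    "forecast_flat": forecast_flat,
}

# Hypothetical controls: each model names a fit and a forecast function,
# with optional arguments for each.
MODEL_CONTROLS: Dict[str, Dict[str, Dict[str, Any]]] = {
    "mean_model": {
        "fit": {"fun": "fit_mean", "args": {}},
        "forecast": {"fun": "forecast_flat", "args": {"horizon": 3}},
    },
    "naive_model": {
        "fit": {"fun": "fit_last"},
        "forecast": {"fun": "forecast_flat", "args": {"horizon": 2}},
    },
}


def run_from_controls(name: str, series: List[float],
                      controls: Dict[str, Dict[str, Dict[str, Any]]] = MODEL_CONTROLS
                      ) -> List[float]:
    """Look up the model's fit and forecast functions and call them in order."""
    entry = controls[name]
    fit_fun = REGISTRY[entry["fit"]["fun"]]
    forecast_fun = REGISTRY[entry["forecast"]["fun"]]
    fit = fit_fun(series, **entry["fit"].get("args", {}))
    return forecast_fun(fit, **entry["forecast"].get("args", {}))
```

Adding a model under this scheme means adding one controls entry and registering its functions; the dispatcher itself does not change.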

i get what you're saying about the iterative input, but i think that can be sidestepped by having the model's function know where to point to find what to ingest. for example, in the portalcasting models, i could set up a jags model that would read in last week's forecast output to create its model input. whether that's an MCMC table, a runjags model object to be restarted, or simple parameter values doesn't matter, as long as i construct the model to grab what it needs. this is totally doable within the existing infrastructure of portalcasting, and something i will create an issue on to try and iron out before the meeting.
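A small Python sketch of that "the model knows where to look" pattern, under stated assumptions. This is not how portalcasting stores forecasts: the `forecast_<date>.json` naming, the `values` field, and the functions `latest_forecast_file` and `run_iteration` are hypothetical, and a toy multiplicative model stands in for a jags model being restarted.

```python
import json
from pathlib import Path
from typing import List, Optional


def latest_forecast_file(forecast_dir: Path) -> Optional[Path]:
    """Return the most recent forecast file, or None if there is none.

    Assumes ISO dates in the file name, so lexicographic order is date order.
    """
    files = sorted(forecast_dir.glob("forecast_*.json"))
    return files[-1] if files else None


def run_iteration(forecast_dir: Path, issue_date: str,
                  default_state: float = 1.0, growth: float = 2.0,
                  horizon: int = 3) -> List[float]:
    """Run one forecast cycle, starting from the previous cycle's output."""
    previous = latest_forecast_file(forecast_dir)
    if previous is None:
        # First cycle: nothing to ingest, fall back to a default state.
        state = default_state
    else:
        # Later cycles: start from the previous one-step-ahead forecast.
        state = json.loads(previous.read_text())["values"][0]

    values = []
    for _ in range(horizon):
        state *= growth
        values.append(state)

    out_path = forecast_dir / f"forecast_{issue_date}.json"
    out_path.write_text(json.dumps({"issue_date": issue_date, "values": values}))
    return values
```

The workflow runner only has to call `run_iteration` on a schedule; what gets ingested (parameter values, a saved model object, an MCMC table) is a decision made inside the model's own function.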

suffice it to say, though, there are umpteen wrinkles here, so hashing out the problem space will be key, so we can bite off workable chunks for the meeting and beyond

rqthomas commented 1 year ago

This seems related to #3

robbinscalebj commented 1 year ago

Some of us in the Theory working group have been using the forecast templates linked above by @cboettig (Issue #18). I'd be interested in improving that documentation and seeing how other models/modeling frameworks could fit in or link up with portalcasting workflow, which looks awesome.