PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

Met and input ensembles: adjustments to do.conversions, met.process, write.configs, and sda.enkf #1167

Open mdietze opened 7 years ago

mdietze commented 7 years ago

PEcAn needs the ability to handle ensembles of inputs as a way of capturing uncertainties in drivers and initial conditions.

The following comes from a discussion with @araiho and @tonygardella at the PalEON SDA Hackathon

Meteorology

Focusing first on the meteorology as the first case we want to deal with.

Key parts of the proposed design:

  1. Ensemble members will be named prefix.ens[ID] (e.g., a CF met ensemble would be prefix.ens[ID].[year].nc). Most existing functions that have to deal with ensembles will therefore simply be called in a loop; internally they will have no idea that they're dealing with an ensemble member and will just treat prefix.ens[ID] as the prefix.
  2. Each ensemble member will be inserted into the database individually, not as one entry for the whole ensemble. This seems to make implementation easier and improves provenance (we'll know exactly which ensemble member was used for each run).
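
A minimal sketch of the naming convention in item 1, written in Python purely for illustration (PEcAn itself is R). The helpers `ensemble_prefix` and `cf_met_filename` are hypothetical names, not PEcAn functions; the point is only that downstream code receives the extended prefix and never the ensemble ID.

```python
def ensemble_prefix(prefix: str, ens_id: int) -> str:
    """Return the prefix for one ensemble member, e.g. 'prefix' -> 'prefix.ens3'."""
    return f"{prefix}.ens{ens_id}"


def cf_met_filename(prefix: str, year: int) -> str:
    """Build a CF met file name; it does not know whether prefix names an ensemble member."""
    return f"{prefix}.{year}.nc"


# The caller loops over members; cf_met_filename only ever sees a prefix.
files = [cf_met_filename(ensemble_prefix("prefix", i), 2006) for i in range(1, 4)]
print(files)
```

Because each member gets its own prefix, each one can also be registered as its own database entry, as item 2 proposes.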

The above proposal would NOT be compatible with a previous discussion about not storing ensemble members and instead regenerating them on the fly from a saved seed. Storing every member costs more disk space, but makes provenance and repeatability much simpler and less error-prone.

Within met.process we envision two major use cases:

1) the input meteorology is itself an ensemble

Within met.process we're basically just looping over each ensemble member.

2) met.process itself generates the ensemble

For example, we would download one met product, convert that one product to CF, and then generate N ensemble members within gapfill/downscale, so we'd end up with a whole list of results and a corresponding vector of input.id's. We'd then loop over every ensemble member when calling met2model, so met2model would have no idea it's processing an ensemble member.
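
A rough sketch of use case 2, in Python for illustration only. Every function here is a hypothetical stand-in that just manipulates strings; none of them is the real met.process, gapfill/downscale, or met2model code. It shows one download and one CF conversion fanning out into N members, followed by a plain loop over met2model.

```python
from typing import List


def download_met(site: str) -> str:
    # Stand-in for downloading a single raw met product
    return f"{site}.raw"


def to_cf(raw: str) -> str:
    # Stand-in for standardising the single product to CF
    return raw.replace(".raw", ".cf")


def gapfill_downscale(cf_prefix: str, n_ens: int) -> List[str]:
    # The one CF input fans out here into n_ens members, each with its own prefix
    return [f"{cf_prefix}.ens{i}" for i in range(1, n_ens + 1)]


def met2model(prefix: str) -> str:
    # Sees only a prefix; has no notion of ensembles
    return f"{prefix}.model"


members = gapfill_downscale(to_cf(download_met("site")), n_ens=3)
model_inputs = [met2model(m) for m in members]  # one result per ensemble member
print(model_inputs)
```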

write.configs

We'll need to update the ensemble option (code and settings) to let you choose WHAT you want an ensemble of (just parameters, just selected inputs, or both parameters and inputs). This would pass a specific list of inputs to each write.config.[model], so the model code doesn't need to know anything about ensembles.
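
One way this choice could look, sketched in Python for illustration. The function `build_runs` and the `vary` argument are assumptions made for this example, not existing PEcAn settings; anything not listed in `vary` keeps its default value, so each run still receives one concrete set of parameters and inputs.

```python
def build_runs(n_runs, param_samples, input_members, defaults, vary):
    """Return one dict of concrete params/inputs per run.

    vary: set naming what to draw from an ensemble, e.g. {"params"}, {"met"},
    or {"params", "met", "soil"}.
    """
    runs = []
    for i in range(n_runs):
        run = dict(defaults)  # start from the non-ensemble values
        if "params" in vary:
            run["params"] = param_samples[i % len(param_samples)]
        for name, members in input_members.items():
            if name in vary:
                run[name] = members[i % len(members)]
        runs.append(run)
    return runs


defaults = {"params": "median", "met": "met.default", "soil": "soil.default"}
runs = build_runs(
    n_runs=3,
    param_samples=["p1", "p2", "p3"],
    input_members={"met": ["met.ens1", "met.ens2", "met.ens3"], "soil": ["soil.ens1"]},
    defaults=defaults,
    vary={"met"},  # ensemble over met only; params and soil stay at defaults
)
print(runs[1])
```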

SDA

split.inputs.[MODEL]

split.inputs will take a new argument, the ensemble number, which will default to 1.

For each input, choose the member given by the ensemble number modulo the number of members available for that input. For example, if there are 50 met drivers and 5 soil drivers and ens = 48, then we use met ensemble member 48 and soil ensemble member 48 %% 5 = 3. split.inputs then proceeds to do any split as before, and returns a list of inputs where each input only has that ensemble member's drivers.
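
A sketch of the member selection, in Python for illustration (`pick_member` and `select_inputs` are hypothetical names). The rule as stated leaves one case open: when the ensemble number is an exact multiple of the member count (e.g. ens = 50 with 5 soil drivers) the remainder is 0, which is not a valid 1-based member. The sketch assumes that case should wrap to the last member; this is an assumption, not something decided in the thread, and it still reproduces the 48 -> 3 example.

```python
def pick_member(ens: int, n_members: int) -> int:
    """Map a 1-based ensemble number onto 1..n_members, cycling when ens > n_members."""
    return (ens - 1) % n_members + 1


def select_inputs(inputs: dict, ens: int = 1) -> dict:
    """Return, for each input type, only the driver chosen for this ensemble number."""
    return {name: members[pick_member(ens, len(members)) - 1]
            for name, members in inputs.items()}


inputs = {
    "met": [f"met.ens{i}" for i in range(1, 51)],   # 50 met drivers
    "soil": [f"soil.ens{i}" for i in range(1, 6)],  # 5 soil drivers
}
print(select_inputs(inputs, ens=48))  # {'met': 'met.ens48', 'soil': 'soil.ens3'}
```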

The code calling split.inputs will loop over the ensemble sample vector and save a whole list of input lists.

write.configs

The loop over ensemble members should just need inputs to be changed to inputs[[i]] to make the inputs specific to each ensemble member.
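
The calling pattern described in this section could be sketched as follows, in Python for illustration. `split_inputs` and `write_config` are simplified, hypothetical stand-ins for split.inputs.[MODEL] and write.config.[model], and the wrap-around indexing is an assumption. The caller builds one input list per ensemble member, then hands the i-th list to the config writer.

```python
def split_inputs(all_inputs: dict, ens: int = 1) -> dict:
    # Stand-in: keep only the driver selected for this ensemble number
    return {name: members[(ens - 1) % len(members)]
            for name, members in all_inputs.items()}


def write_config(run_id: str, inputs: dict) -> str:
    # Stand-in: the model-specific writer just sees ordinary inputs
    return f"{run_id}: met={inputs['met']} soil={inputs['soil']}"


all_inputs = {
    "met": ["met.ens1", "met.ens2", "met.ens3"],
    "soil": ["soil.ens1", "soil.ens2"],
}
ensemble_ids = [1, 2, 3]

# A list of input lists, one entry per ensemble member
inputs = [split_inputs(all_inputs, ens) for ens in ensemble_ids]

# The only ensemble-aware change in the loop is indexing inputs by i
configs = [write_config(f"run{ens}", inputs[i]) for i, ens in enumerate(ensemble_ids)]
print(configs)
```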

jsimkins2 commented 7 years ago

@mdietze I'm adding this functionality to the downscale script, which is nearly ready for a pull request. Here is the name generated for each ensemble member: "MACA.dwnsc.ens1.2006.nc". How does that look? The file that was downscaled is titled "MACA.IPSL-CM5A-LR.rcp85.r1i1p1.2006.nc"; should I add the MACA-specific information to the downscaled ensemble member's name as well?

mdietze commented 7 years ago

A bit long, but yes, you should probably add the specification of what GCM, RCP, etc. was run to the prefix.

istfer commented 6 years ago

After today's meetings, I just wanted to tag people who are thinking about met ensembles and met uncertainty here again, so we can join efforts and further the discussion: @mdietze @araiho @ankurdesai @mollyaufforth

mdietze commented 6 years ago

Yesterday @bcow, @Luke-Dramko, and I had a chat where we revisited this design and reconsidered whether there was a better option that would 'conserve' input IDs better, given that the numerical weather forecasting example would mint 21 new input IDs per site every 6 hr (~30k/yr) for every site that we set up real-time forecasting for (though right now we're only planning on daily forecasts, with NEON + Willow Creek as the sites). For now we didn't find one. We also fleshed out more of the details of what needs to be implemented for the numerical weather forecast met example and who that intersects with:

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 365 days with no activity.