Open mdietze opened 8 years ago
@mdietze I'm adding this functionality to the downscale script, which is nearly ready for a pull request. Here is the name generated for each ensemble member: "MACA.dwnsc.ens1.2006.nc". How does that look? The file that was downscaled is titled "MACA.IPSL-CM5A-LR.rcp85.r1i1p1.2006.nc". Should I add the MACA-specific information to the downscaled ensemble member's name as well?
A bit long, but yes, you should probably add the specification of what GCM, RCP, etc. was run to the prefix.
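One way the GCM/RCP fields could be carried into the ensemble member name, as a minimal sketch only. It assumes the source file name is dot-separated and ends in `<year>.nc`, as in the MACA example above. The function name `downscaled_name` and the position of the `dwnsc.ensN` fields are hypothetical and are not taken from the downscale script.

```python
def downscaled_name(source_name: str, ens: int, tag: str = "dwnsc") -> str:
    """Build an ensemble-member file name that keeps the source's prefix fields.

    Assumes source_name looks like 'MACA.IPSL-CM5A-LR.rcp85.r1i1p1.2006.nc',
    i.e. dot-separated fields ending in '<year>.nc'. Hypothetical helper.
    """
    parts = source_name.split(".")
    if len(parts) < 3 or parts[-1] != "nc":
        raise ValueError(f"unexpected file name: {source_name}")
    # Everything before the year is the product/GCM/RCP/run prefix.
    *prefix, year, ext = parts
    return ".".join([*prefix, tag, f"ens{ens}", year, ext])


print(downscaled_name("MACA.IPSL-CM5A-LR.rcp85.r1i1p1.2006.nc", 1))
```

The name does get longer, but each member can then be traced back to the GCM, RCP, and run it came from without a database lookup.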
After today's meetings, I just wanted to tag people who are thinking about met ensembles and met uncertainty here again, so we can join efforts and further the discussion: @mdietze @araiho @ankurdesai @mollyaufforth
Yesterday @bcow, @Luke-Dramko, and I had a chat where we revisited this design and reconsidered whether there was a better option that would 'conserve' input IDs better, given that the numerical weather forecasting example would mint 21 new input IDs per site per 6hr (~30k/yr) for every site that we set up real-time forecasting for (though right now we're only planning on daily, and NEON + Willow Creek as the sites). For now we didn't find one. We also fleshed out more of the details of what needs to be implemented for the numerical weather forecast met example and who that intersects with:
- <ensemble>: defaults to FALSE for the case where a met product provides a single time series. If a met product provides an ensemble, set it to the ensemble size.
- <forecast>: defaults to FALSE; setting it to TRUE changes how date conflicts are handled.
- File names currently carry only %Y (4-digit year) and there's no end time, but for numerical weather forecasts we also need to write out the month, day, and time the forecast was made (and let's stick to ISO standard formats please!).
- The NOAA_GEFS download should return files that are something like NOAA_GEFS.[site].[ens].[start_datetime].[end_datetime].nc
- With <ensemble> set, the <settings> object should have a list of <met> entries within <run><inputs>. If written out to xml it would look something like:
<run>
<inputs>
<met>
<id> 1234 </id>
</met>
<met>
<id> 1235 </id>
</met>
<met>
<id> 1236 </id>
</met>
</inputs>
</run>
This should also be a valid way to specify ensemble inputs in the pecan.xml at the start, or for any other inputs that we might ensemblize (IC/veg, soils, etc.).
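The file naming and the list of <met> entries described above can be sketched as follows. This is an illustration in Python, not PEcAn code. The function names are hypothetical, the site label and dates are made up, and the ISO 8601 basic format (`20180101T0600`) is an assumption chosen to keep ':' out of file names; the thread only asks for "ISO standard formats".

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ISO_BASIC = "%Y%m%dT%H%M"  # assumption: ISO 8601 basic format, no ':' in names


def forecast_file_name(product, site, ens, start, end):
    """Return product.[site].[ens].[start_datetime].[end_datetime].nc"""
    return (f"{product}.{site}.{ens}."
            f"{start.strftime(ISO_BASIC)}.{end.strftime(ISO_BASIC)}.nc")


def met_inputs_xml(input_ids):
    """Serialize one <met><id> entry per ensemble member under <run><inputs>."""
    run = ET.Element("run")
    inputs = ET.SubElement(run, "inputs")
    for input_id in input_ids:
        met = ET.SubElement(inputs, "met")
        ET.SubElement(met, "id").text = str(input_id)
    return ET.tostring(run, encoding="unicode")


# Made-up site label and forecast window, for illustration only.
name = forecast_file_name("NOAA_GEFS", "WillowCreek", 1,
                          datetime(2018, 1, 1, 6), datetime(2018, 1, 2, 6))
print(name)
print(met_inputs_xml([1234, 1235, 1236]))
```

Because every ensemble member gets its own <met> entry, downstream code can treat the list like any other set of inputs and never needs a special ensemble branch.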
Beyond met.process, I've already spoken to @istfer and @para2x about the fact that we need to refactor the SDA code and take a close look at run.write.configs to make sure that the code for generating model ensembles is modularized and shared between these two. Conceptually there should be no difference between ensembles started by the main workflow and those started by SDA, and indeed SDA should be able to pick up a general ensemble run and perform an Analysis and reforecast on it.
This issue is stale because it has been open 365 days with no activity.
PEcAn needs the ability to handle ensembles of inputs as a way of capturing uncertainties in drivers and initial conditions.
The following comes from a discussion with @araiho and @tonygardella at the PalEON SDA Hackathon
Meteorology
Focusing first on the meteorology as the first case we want to deal with.
Key parts of the proposed design:
The above proposal would NOT be compatible with a previous discussion about the potential capacity to not store ensemble members but instead generate them on the fly from a saved seed. Storing every member comes at the cost of increased disk storage, but gives much simpler and less error-prone provenance and repeatability.
Within met.process we envision two major use cases:
1) the input meteorology is itself an ensemble
within met.process we're basically just looping over each ensemble member
2) met.process itself generates the ensemble
For example, we would download one met product, convert that one product to CF, and generate N ensemble members within gapfill/downscale (so we'd end up with a whole list of results and a whole vector of input.id's). We'd then loop over every ensemble member when calling met2model, so met2model would have no idea it's processing an ensemble member.
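The two use cases can be sketched as below, in Python rather than PEcAn's R. Everything here is a stand-in: `process_met`, `met2model`, and `jitter` are hypothetical names, the met series is just a list of numbers, and the seed only makes the sketch repeatable (the proposal itself stores the generated members rather than regenerating them).

```python
import random
from typing import Callable, List, Optional

Series = List[float]


def met2model(met: Series) -> Series:
    """Stand-in for the model-specific conversion; it sees one series at a time."""
    return [round(x, 3) for x in met]


def process_met(members: List[Series],
                n_ens: int = 1,
                perturb: Optional[Callable[[Series, random.Random], Series]] = None,
                seed: int = 0) -> List[Series]:
    """Return one model-ready series per ensemble member.

    Case 1: `members` already holds an ensemble, so we just loop over it.
    Case 2: `members` holds a single series and `perturb` is given, so we
            generate n_ens members first and then loop over them.
    """
    if perturb is not None:
        if len(members) != 1:
            raise ValueError("case 2 expects a single input series")
        rng = random.Random(seed)
        members = [perturb(members[0], rng) for _ in range(n_ens)]
    # met2model is called per member and never learns it is part of an ensemble.
    return [met2model(m) for m in members]


def jitter(series: Series, rng: random.Random) -> Series:
    """Toy stand-in for gapfill/downscale: add Gaussian noise to each value."""
    return [x + rng.gauss(0.0, 0.5) for x in series]


case1 = process_met([[1.0, 2.0], [1.5, 2.5]])                 # input is an ensemble
case2 = process_met([[1.0, 2.0]], n_ens=3, perturb=jitter)    # ensemble generated here
print(len(case1), len(case2))
```

In both cases the result is a plain list with one entry per member, which is what would map onto a vector of input IDs.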
write.configs
We'll need to update the ensemble option (code and settings) to let you choose WHAT you want an ensemble of (just params, just select inputs, both parameters and inputs). This would pass a specific list of inputs to each write.config.[model], so the model code doesn't need to know anything about ensembles
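A rough sketch of how such an option might pair members with parameters and inputs before anything reaches the model-specific code. This is Python for illustration, not PEcAn's R. The option values "params", "inputs", and "both", the function name, and the dictionary layout are all assumptions.

```python
def ensemble_run_specs(n_ens, param_samples, met_ids, ensemble_of="both"):
    """Pair each ensemble member with the parameters and met input it should use.

    ensemble_of is "params", "inputs", or "both" (hypothetical option names).
    Anything that is not part of the ensemble stays fixed at its first entry.
    """
    if ensemble_of not in ("params", "inputs", "both"):
        raise ValueError(f"unknown ensemble_of: {ensemble_of}")
    vary_params = ensemble_of in ("params", "both")
    vary_inputs = ensemble_of in ("inputs", "both")
    specs = []
    for i in range(n_ens):
        params = param_samples[i % len(param_samples)] if vary_params else param_samples[0]
        met = met_ids[i % len(met_ids)] if vary_inputs else met_ids[0]
        # Each spec is self-contained: the model code gets one parameter set
        # and one specific list of inputs, with no notion of an ensemble.
        specs.append({"params": params, "inputs": {"met": met}})
    return specs


params = [{"sla": 10}, {"sla": 12}, {"sla": 14}]  # made-up parameter samples
specs = ensemble_run_specs(3, params, [1234, 1235, 1236], ensemble_of="inputs")
for spec in specs:
    print(spec)
```

With `ensemble_of="inputs"` every member shares the first parameter set while the met input ID changes from member to member.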
SDA
split.inputs.[MODEL]
split.inputs will take a new argument, ensemble number, which will default to 1
For each input, select the member by taking the ensemble number modulo the number of members available for that input. For example, if there are 50 met drivers and 5 soil drivers and ens = 48, then we use met ensemble member 48 and soil ensemble member 48%%5 = 3. It then proceeds to do any split as before, and returns a list of inputs where each input only has that ensemble member's drivers.
The code calling split.inputs will loop over the ensemble sample vector and save a whole list of input lists
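The member selection can be sketched as below, in Python for illustration; the function names and driver labels are hypothetical. One detail the sketch handles explicitly: with 1-based member numbers, a plain modulus gives 0 whenever the ensemble number is an exact multiple of the number of drivers (for example 50 %% 5), so the sketch shifts to 0-based before wrapping. Treating that case as "use the last member" is an assumption, not something the thread specifies.

```python
def select_member(drivers, ens):
    """Pick the driver for 1-based ensemble number `ens`, wrapping around."""
    if ens < 1 or not drivers:
        raise ValueError("ens must be >= 1 and drivers must be non-empty")
    # Shift to 0-based before the modulus so exact multiples map to the
    # last driver instead of an invalid member 0.
    return drivers[(ens - 1) % len(drivers)]


def split_inputs_for_member(inputs, ens=1):
    """Return a dict where each input holds only that member's driver."""
    return {name: select_member(drivers, ens) for name, drivers in inputs.items()}


inputs = {
    "met": [f"met_{i}" for i in range(1, 51)],   # 50 met drivers
    "soil": [f"soil_{i}" for i in range(1, 6)],  # 5 soil drivers
}

member_48 = split_inputs_for_member(inputs, ens=48)
print(member_48)

# The caller loops over the ensemble numbers and keeps a list of input lists.
all_members = [split_inputs_for_member(inputs, ens) for ens in range(1, 51)]
```

For ens = 48 this reproduces the example in the text: met member 48 and soil member 3.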
write.configs
The loop over ensemble members should only need inputs changed to inputs[[i]] to make the inputs specific to each ensemble member.