Ensemble IDs are used to distinguish a group of related model runs. To date they are used to keep track of runs associated with ensemble analysis, sensitivity analysis, and parameter data assimilation (EA, SA, PDA).
There's generally been at most one of each of these analyses per workflow, and so the ens.id has been kept with the EA, SA, and PDA blocks of pecan.xml. But this isn't ideal, because those blocks really keep track of the settings for the analysis, which might be applied to multiple ensembles. In particular, as we implement multi-site workflows, you might want to define the SA settings (for example) once, but apply them to each site separately. After a lot of back and forth, here are some mods to the existing settings structure that Mike and I propose:
Ensemble IDs will now belong to a particular <run> block in pecan.xml. The <EA/SA/PDA> blocks will still store the same settings they do now, but when PEcAn generates ensemble IDs in the process of conducting these analyses, it will stick them in the appropriate <run> block (again, there may be several per workflow now), rather than in the "analysis" block.
The <EA/SA/PDA> blocks will get a new analysis attribute that acts as a simple counter, in case you end up with multiple analyses of the same type in the same workflow, e.g. if for some reason you've done an SA already but want to conduct a new one with different settings. The ensemble IDs stored within a <run> will then be able to refer to the particular analysis associated with a given ensemble by the combination of ensemble type and count. I.e., the analysis block carries its counter, and each ensemble ID in a <run> block records the analysis type and count it belongs to.
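As a rough illustration of the type/count cross-reference, here is a minimal sketch in Python (PEcAn itself is R; Python is used only to make the idea concrete). Everything specific in it is an assumption: the tag names `EA`/`SA`, the `ensemble` element, the `type`/`analysis`/`id` attribute names, the site labels, and the ID values are placeholders, not the actual pecan.xml schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings layout. Tag names, attribute names, site labels,
# and ensemble IDs are placeholders chosen for this sketch only.
SETTINGS_XML = """
<pecan>
  <SA analysis="1"/>
  <SA analysis="2"/>
  <EA analysis="1"/>
  <run>
    <site>site-A</site>
    <ensemble type="SA" analysis="2" id="1001"/>
    <ensemble type="EA" analysis="1" id="1002"/>
  </run>
  <run>
    <site>site-B</site>
    <ensemble type="SA" analysis="2" id="1003"/>
  </run>
</pecan>
"""


def analysis_for_ensemble(root, ensemble):
    """Return the analysis block an ensemble entry points to, or None."""
    for block in root.findall(ensemble.get("type")):
        if block.get("analysis") == ensemble.get("analysis"):
            return block
    return None


def ensembles_by_run(root):
    """Map each run's site label to its (type, count, ensemble id) entries."""
    result = {}
    for run in root.findall("run"):
        site = run.findtext("site")
        result[site] = [
            (e.get("type"), int(e.get("analysis")), int(e.get("id")))
            for e in run.findall("ensemble")
        ]
    return result


root = ET.fromstring(SETTINGS_XML.strip())
```

In this layout the settings for the second SA are written once, while each <run> holds its own ensemble ID pointing back to that analysis by type and count.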
As a bonus, without much trouble we could have write.configs() and the EA/SA/PDA functions loop over all analyses in the settings object, but only run the ones that haven't been done yet. E.g. if an EA calls for 10 model runs per <run> block, first check whether each <run> already has an ensemble.id for this analysis, and whether that ensemble already has 10 "runs" complete*; if so, move on. In this way, you can add new analyses or extend existing ones (e.g. add another 10 iterations to your EA but include them in the existing ensemble) without repeating anything that's already been done, and without manually deleting <EA/SA/PDA> blocks associated with completed analyses.
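The "only run what's missing" check could look roughly like the following Python sketch (again, PEcAn is R; this only illustrates the logic). The `Ensemble` and `Run` classes, the `plan_runs` function, and the way ensemble IDs are handed out are all hypothetical and stand in for whatever the settings object and database would actually provide.

```python
from dataclasses import dataclass, field


@dataclass
class Ensemble:
    ensemble_id: int
    runs_complete: int = 0  # model runs already finished for this ensemble


@dataclass
class Run:
    site: str
    # Maps (analysis type, analysis count) to the ensemble for this <run>.
    ensembles: dict = field(default_factory=dict)


def plan_runs(runs, analyses, next_id=1):
    """List the work still outstanding.

    `analyses` maps (type, count) to the number of model runs required.
    Returns (site, type, count, ensemble_id, n_missing) tuples, creating a
    new ensemble entry for any run block that lacks one for an analysis.
    """
    todo = []
    for run in runs:
        for (atype, count), n_required in analyses.items():
            ens = run.ensembles.get((atype, count))
            if ens is None:
                # No ensemble yet for this analysis in this run block.
                ens = Ensemble(ensemble_id=next_id)
                next_id += 1
                run.ensembles[(atype, count)] = ens
            n_missing = n_required - ens.runs_complete
            if n_missing > 0:
                todo.append((run.site, atype, count, ens.ensemble_id, n_missing))
    return todo
```

Under these assumptions, raising an EA from 10 to 20 runs yields only the 10 missing runs for a site that already finished the first 10, reusing its existing ensemble ID, while a site with no ensemble yet gets a new ID and all 20 runs.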
* As you may have noticed, we could use a new name for the <run> block. "run" already refers to a single model run, whereas <run> in pecan.xml stores information that generally applies to many model runs (multiple ensembles, in fact, as we've just discussed). Currently that's site and input settings and start/end dates, but it might include <model> in the future for the purpose of multi-model comparison. Candidates welcome.