Open anders-kiaer opened 1 year ago
A bit more complex perhaps, but it would also be very useful to have metadata answering e.g. which history matching ensemble a given prediction ensemble is based on.
Some user stories on parent-child-linking of ensembles:
- You have a prediction ensemble (started with RESTART from an AHM run), and want to know the quality of the AHM ensemble it was based on. With parent-child-linking metadata, clients downstream can use this information to give the user this insight.
- You have a prediction ensemble (started with RESTART from an AHM run), and as a user you want the time series and/or 3D grid to span the whole time axis (history + prediction) and not just the prediction. Or you want to calculate the recovery factor (produced volume vs. initial volumes). With parent-child-linking metadata, clients downstream can fetch data from both ensembles (history and prediction) and combine them in the presentation.

This links to #291 perhaps
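To make the two user stories concrete, here is a minimal sketch of what a client could do if each ensemble carried a link to its parent. The `EnsembleMeta` record, its `parent` field and both helper functions are hypothetical names made up for illustration; no such metadata exists today.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnsembleMeta:
    # Hypothetical metadata record; the field names are illustrative only.
    name: str
    parent: Optional[str] = None  # name of the ensemble this one was restarted from


def lineage(name: str, ensembles: dict[str, EnsembleMeta]) -> list[str]:
    """Return ensemble names from the oldest ancestor down to `name`."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cyclic parent link at {current!r}")
        seen.add(current)
        chain.append(current)
        current = ensembles[current].parent
    return list(reversed(chain))


def full_time_axis(
    name: str,
    ensembles: dict[str, EnsembleMeta],
    series: dict[str, list[tuple[str, float]]],
) -> list[tuple[str, float]]:
    """Concatenate (date, value) points along the lineage: history first, then prediction."""
    combined: list[tuple[str, float]] = []
    for ensemble_name in lineage(name, ensembles):
        combined.extend(series[ensemble_name])
    return combined
```

With a `pred_ref` ensemble whose parent is `iter-3`, `lineage("pred_ref", ...)` would give the client the history ensemble to inspect for quality, and `full_time_axis` would stitch history and prediction together.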
I have tried to run drogon_pred_ref.ert and found that the results for the pred_ref ensemble are uploaded to iter-0. From a discussion with @perolavsvendsen, dataio apparently assumes iter-0 as the default ensemble name and uses it to generate the ID.
See case 01_drogon_ahm_sumo in sumo prod
Yes, we currently don't get any information on type of ensembles within a case, and as far as I know no such definition exists either.
This obviously maps to ERT quite fast. If ERT cannot provide it, we would have to do something rule-based on iteration names.
Is there a standard (convention) on iteration names?
No strict/official rules for iteration names to my knowledge (AHM runs typically go iter-0, iter-1, ..., iter-x and prediction cases often have pred in the name, but there is no guarantee that iter and pred are used as substrings).
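For reference, a rule-based fallback on iteration names could look roughly like the sketch below. The function name and the returned labels are hypothetical, and the rules only encode the habits described above, so this is a heuristic and not a definition of ensemble type.

```python
import re

# Matches the typical AHM naming: iter-0, iter-1, ..., iter-x
_ITER_PATTERN = re.compile(r"^iter-(\d+)$")


def guess_ensemble_type(iteration_name: str) -> str:
    """Guess an ensemble type from its name alone (heuristic, can be wrong)."""
    if _ITER_PATTERN.match(iteration_name):
        return "history_matching"
    if "pred" in iteration_name.lower():
        return "prediction"
    # No convention applies, so refuse to guess.
    return "unknown"
```

A descriptive prediction name without "pred" in it ends up as "unknown", and a model without history that still writes to iter-0 would be mislabelled as history matching, which is why getting this from ERT would be preferable.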
I agree this information/metadata should come from ERT. Knowing the workflow method used by ERT for generating the ensemble would probably be a good start (https://ert.readthedocs.io/en/latest/reference/running_ert.html - ensemble_smoother, es_mda, iterative_ensemble_smoother, ensemble_experiment). Can this information be exposed in some way, or is it already, @sondreso?
Within an assisted history matching run it would also be useful for clients of the data sets to know which ensembles are part of the same assisted history matching run (i.e. iter-0 = prior, iter-{max} = posterior, and also the order of the ensembles in between, representing gradual updates from prior to posterior...).
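As an illustration of the ordering a client would need, the sketch below sorts iteration names numerically and labels the first as prior and the last as posterior. It assumes the iter-N naming, which is not guaranteed, and the function name and role labels are hypothetical.

```python
import re


def order_ahm_iterations(names):
    """Sort iter-N names numerically and label each as prior/intermediate/posterior.

    Names that do not follow the iter-N pattern are ignored.
    """
    numbered = []
    for name in names:
        match = re.fullmatch(r"iter-(\d+)", name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()  # numeric sort, so iter-10 comes after iter-2
    ordered = [name for _, name in numbered]

    roles = {}
    for index, name in enumerate(ordered):
        if index == 0:
            roles[name] = "prior"
        elif index == len(ordered) - 1:
            roles[name] = "posterior"
        else:
            roles[name] = "intermediate"
    return ordered, roles
```

Without explicit metadata, a client has no way to tell whether two iter-N ensembles in a case actually belong to the same history matching run; the sketch simply assumes they do.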
Pinging @asnyv in case there are details / use cases you want to mention that I haven't.
Think @anders-kiaer has covered most of it, but for predictions I think we (at least me) often skip the term "pred" and use some more or less descriptive name depending on whatever we are simulating. From a technical perspective: completely arbitrary.
Also: for a while I think it was fairly common to have a structure like:
History matching:
some_ahm_case/realization-x/iter-y
Predictions:
some_prediction_scenario/realization-x
some_other_prediction_scenario/realization-x
Now it seems like more are going towards a structure where the prediction case is placed on the "iteration level" like you mention, so:
some_ahm_case/realization-x/iter-y
some_ahm_case/realization-x/some_prediction_scenario
some_ahm_case/realization-x/some_other_prediction_scenario
The advantage of the latter is that it is clearer what the basis for the prediction is, whilst the advantage of the first one is that it is easier to see all cases in a folder structure, and it is a more natural structure for models without any history. But many of the models without history have now ended up with the structure some_prediction_scenario/realization-x/iter-0 as a more or less de facto standard, but I can't really say why 😅 My guess is that someone found it convenient as they could then reuse something they had hard-coded for AHM.
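A rough sketch of how a client could read case, realization and iteration out of the layouts listed above. `RunLocation` and `parse_runpath` are hypothetical names, and this is not how fmu-dataio does it; the point is only that the folder after realization-x, when present, is taken at face value and no iter-0 is invented when it is missing.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class RunLocation(NamedTuple):
    case: str
    realization: int
    iteration: Optional[str]  # None when there is no folder below realization-x


def parse_runpath(path: str) -> RunLocation:
    """Split <case>/realization-<n>[/<iteration>] into its parts."""
    parts = PurePosixPath(path).parts
    for index, part in enumerate(parts):
        match = re.fullmatch(r"realization-(\d+)", part)
        if match and index >= 1:
            has_iteration = index + 1 < len(parts)
            iteration = parts[index + 1] if has_iteration else None
            return RunLocation(parts[index - 1], int(match.group(1)), iteration)
    raise ValueError(f"no <case>/realization-<n> segment in {path!r}")
```

Both some_ahm_case/realization-x/iter-y and some_ahm_case/realization-x/some_prediction_scenario parse the same way, so the path alone still does not say which of them is a prediction.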
Suggest we first try to accurately reflect the name in the outgoing metadata, not default to iter-0 if it looks strange. The more challenging bit is probably the iteration id, which in turn maps to the iteration uuid.
I assume that ERT internally has an iteration ID, which we cannot know if the iteration is called something other than e.g. iter-0. Every single instance of fmu-dataio will (currently) look at the file structure to determine what iteration we are in. The code has been structured in such a way that it should be possible to get this from ERT when it is available, but so far it is not.
An alternative is to take away all logic placed on the iteration id and only use the iteration name.
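To illustrate the name-only alternative: if the iteration uuid were derived from the case uuid and the iteration name, no numeric ID would be needed. This is a sketch under that assumption, not a description of how fmu-dataio currently generates the uuid; `iteration_uuid` is a hypothetical helper.

```python
import uuid


def iteration_uuid(case_uuid: uuid.UUID, iteration_name: str) -> uuid.UUID:
    """Derive a stable uuid from the case uuid and the iteration name only.

    uuid5 is deterministic, so every job in the same iteration gets the same
    value without having to know a numeric iteration id.
    """
    return uuid.uuid5(case_uuid, iteration_name)
```

The consequence is that the name becomes the identifier: two iterations with the same name in the same case would collide, and renaming an iteration would change its uuid.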
Drafting a possible PR, quick and dirty 👆. This needs discussion, as I am a bit unsure of the consequences. But possibly, the iteration name is a better option (outside the ERT context) than the iteration id, i.e. the name is used as the identifier, not the ID. (This is similar to the current practice for case.name, where no ID exists.)
For the third similar object, realization, we are more dependent on the ID, I guess.
This is also a feature request from SSDL, ref discussions with @bous251
E.g. is it a
The "ensemble type" (better terminology might/probably exist) will be used by clients in order to filter out ensembles not relevant in a given analysis dashboard/pattern.