equinor / fmu-dataio

FMU data standard and data export with rich metadata in the FMU context
https://fmu-dataio.readthedocs.io/en/latest/
Apache License 2.0

Metadata regarding type of ensemble #299

Open anders-kiaer opened 1 year ago

anders-kiaer commented 1 year ago

E.g. is it a

The "ensemble type" (better terminology might/probably exist) will be used by clients in order to filter out ensembles not relevant in a given analysis dashboard/pattern.

anders-kiaer commented 1 year ago

A bit more complex perhaps, but it would also be very useful to have metadata answering e.g. which history matching ensemble a given prediction ensemble is based on.

anders-kiaer commented 1 year ago

Some user stories on parent-child-linking of ensembles:

perolavsvendsen commented 1 year ago

This links to #291 perhaps

alifbe commented 1 year ago

I tried running drogon_pred_ref.ert and found that the results for the pred_ref ensemble are uploaded to iter-0. From discussion with @perolavsvendsen, dataio apparently assumes iter-0 as the default ensemble name and uses it to generate the ID.

See case 01_drogon_ahm_sumo in sumo prod

perolavsvendsen commented 1 year ago

Yes, we currently don't get any information on type of ensembles within a case, and as far as I know no such definition exists either.

This obviously maps to ERT quite fast. If not, we would have to do something rule-based using the iteration names.

Is there a standard (convention) on iteration names?

anders-kiaer commented 1 year ago

No strict/official rules for iteration names to my knowledge (AHM runs typically go iter-0, iter-1, ..., iter-x and prediction cases often have pred in the name, but there is no guarantee that iter and pred are used as substrings).
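To make the rule-based fallback concrete, here is a minimal sketch of what a name-based guess could look like. The function name and the labels (`history_matching`, `prediction`, `unknown`) are hypothetical and not part of fmu-dataio or ERT. It only encodes the typical patterns mentioned above, so it will misclassify anything that deviates from them.

```python
import re


def guess_ensemble_type(iteration_name: str) -> str:
    """Guess an ensemble type from the iteration name alone.

    Heuristic only: there is no official naming convention, so the
    result is a best-effort label, not a fact about the ensemble.
    """
    name = iteration_name.lower()
    # AHM runs typically use iter-0, iter-1, ..., iter-x
    if re.fullmatch(r"iter-\d+", name):
        return "history_matching"
    # Prediction cases often (but not always) have "pred" in the name
    if "pred" in name:
        return "prediction"
    # Anything else cannot be classified from the name
    return "unknown"
```

Note that a prediction-only model stored as iter-0 would be labelled `history_matching` here, which is exactly why name-based rules are fragile.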

I agree this information/metadata should come from ERT. Knowing the workflow method used by ERT for generating the ensemble would probably be a good start (https://ert.readthedocs.io/en/latest/reference/running_ert.html - ensemble_smoother, es_mda, iterative_ensemble_smoother, ensemble_experiment). Can/is this information exposed in some way @sondreso?
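If ERT did expose the run mode, a client could derive a coarse ensemble type from it. A minimal sketch follows, assuming the run mode arrives as one of the four strings listed above. The lookup table, the function name and the labels `updating` / `non_updating` are hypothetical; nothing here is an existing ERT or fmu-dataio API.

```python
from typing import Optional

# Hypothetical mapping from ERT run mode to a coarse ensemble label.
# The keys are the run modes named in the ERT documentation; the
# values are invented here for illustration.
RUN_MODE_TO_LABEL = {
    "ensemble_experiment": "non_updating",
    "ensemble_smoother": "updating",
    "es_mda": "updating",
    "iterative_ensemble_smoother": "updating",
}


def label_from_run_mode(run_mode: Optional[str]) -> str:
    """Return a coarse label for the given run mode, or 'unknown'."""
    if run_mode is None:
        return "unknown"
    return RUN_MODE_TO_LABEL.get(run_mode, "unknown")
```

This only separates runs that update parameters from runs that do not; it cannot by itself tell a prior run from a prediction run.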

Within an assisted history matching run it would also be useful for clients of the data sets to know which ensembles are part of the same assisted history matching run (i.e. iter-0 = prior, iter-{max} = posterior, and also the order of the ensembles in between, representing gradual updates from prior to posterior...).
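As an illustration of the prior/posterior ordering, a client could do something like the sketch below today, assuming the iter-N naming pattern holds. The function name and the role labels are hypothetical, and names that do not match iter-N are simply skipped.

```python
import re
from typing import List, Tuple


def label_ahm_iterations(names: List[str]) -> List[Tuple[str, str]]:
    """Order iter-N ensembles and label them prior/intermediate/posterior.

    Assumes iter-0 is the prior and the highest numbered iteration is
    the posterior. Names not matching iter-N are ignored.
    """
    pattern = re.compile(r"iter-(\d+)")
    numbered = sorted(
        (int(m.group(1)), name)
        for name in names
        if (m := pattern.fullmatch(name))
    )
    if not numbered:
        return []
    last = numbered[-1][0]
    labelled = []
    for index, name in numbered:
        if index == 0:
            role = "prior"
        elif index == last:
            role = "posterior"
        else:
            role = "intermediate"
        labelled.append((name, role))
    return labelled
```

For example, `label_ahm_iterations(["iter-3", "iter-0", "pred_ref", "iter-1"])` orders the three iter-N ensembles and drops pred_ref. Explicit metadata would make this guesswork unnecessary.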

Pinging @asnyv in case there are details / use cases you want to mention I haven't.

perolavsvendsen commented 1 year ago

#305

perolavsvendsen commented 1 year ago

https://github.com/equinor/ert/issues/2359

asnyv commented 1 year ago

Think @anders-kiaer has covered most of it, but for predictions I think we (or at least I) often skip the term "pred" and use some more or less descriptive name depending on whatever we are simulating. From a technical perspective: completely arbitrary.

Also: for a while I think it was fairly common to have a structure like:

History matching: some_ahm_case/realization-x/iter-y

Predictions:

some_prediction_scenario/realization-x
some_other_prediction_scenario/realization-x

Now it seems like more are going towards a structure where the prediction case is placed on the "iteration level" like you mention, so:

some_ahm_case/realization-x/iter-y
some_ahm_case/realization-x/some_prediction_scenario
some_ahm_case/realization-x/some_other_prediction_scenario

The advantage of the latter is that the basis for the prediction is clearer. The advantage of the former is that all cases are easier to see in the folder structure, and it is a more natural structure for models without any history. However, many of the models without history have now ended up with the structure some_prediction_scenario/realization-x/iter-0 as a more or less de facto standard, but I can't really say why 😅 My guess is that someone found it convenient because they could then reuse something they had hard-coded for AHM.
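The layouts above can all be handled by one parser if the iteration level is treated as optional. This is a minimal sketch, assuming the realization folder always matches realization-N and that the folder directly above it is the case. The names `RunpathParts` and `parse_runpath` are hypothetical, and this is not how fmu-dataio parses paths.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class RunpathParts(NamedTuple):
    case: str
    realization: int
    iteration: Optional[str]  # None when there is no iteration level


def parse_runpath(path: str) -> RunpathParts:
    """Split <case>/realization-N[/<iteration>] into its components."""
    parts = PurePosixPath(path).parts
    for pos, part in enumerate(parts):
        match = re.fullmatch(r"realization-(\d+)", part)
        if match and pos > 0:
            # The folder below the realization, if any, is the iteration
            iteration = parts[pos + 1] if pos + 1 < len(parts) else None
            return RunpathParts(parts[pos - 1], int(match.group(1)), iteration)
    raise ValueError(f"No realization folder found in {path!r}")
```

The iteration is returned as a plain name rather than a number, since it may be iter-y or an arbitrary prediction scenario name.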

perolavsvendsen commented 1 year ago

Suggest we first try to accurately reflect the name in the outgoing metadata, not default to iter-0 if it looks strange. The more challenging bit is probably the iteration id, which in turn maps to the iteration uuid.

I assume that ERT internally has an iteration ID which we cannot know if the iteration is called something other than e.g. iter-0. Every single instance of fmu-dataio will (currently) look at the file structure to determine what iteration we are in. The code has been structured in such a way that it should be possible to get this from ERT when it is available, but so far it is not.

An alternative is to take away all logic placed on the iteration id and only use the iteration name.
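One way to picture the name-only alternative: derive the iteration uuid deterministically from the case uuid and the iteration name, so that no numeric iteration id is needed. This is only a sketch of the idea using the standard library's `uuid.uuid5`; the function name is hypothetical and this is not necessarily the scheme fmu-dataio uses.

```python
import uuid


def iteration_uuid(case_uuid: uuid.UUID, iteration_name: str) -> uuid.UUID:
    """Derive a stable iteration uuid from the case uuid and the name.

    The case uuid is used as the namespace, so the same iteration name
    in two different cases gives two different uuids.
    """
    return uuid.uuid5(case_uuid, iteration_name)
```

Every fmu-dataio instance running in the same case and iteration folder would then arrive at the same uuid without coordinating, as long as they agree on the name.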

perolavsvendsen commented 1 year ago

Drafting a possible PR, quick and dirty 👆. This needs discussion, as I am a bit unsure of the consequences. But possibly the iteration name is a better option (outside the ERT context) than the iteration id, i.e. the name is used as the identifier, not the ID. (This is similar to the current practice for case.name, where no ID exists.)

For the third similar object, realization, we are more dependent on the ID I guess.

perolavsvendsen commented 1 year ago

This is also a feature request from SSDL, ref discussions with @bous251

perolavsvendsen commented 1 year ago

#368 could possibly be relevant for this