Open perolavsvendsen opened 7 months ago
Creating a provider tailored to Everest:
Expanding metadata definitions to accommodate Everest data:
Starting with the data definitions here (come back to the actual coding of this later?):
Perhaps we add the fmu.simulation tag. This would enable us to identify data objects across multiple simulations within the same realization. This may also make sense for non-Everest workflows, which frequently also have more than one simulation. Today this is cumbersome, since we are left with essentially using data.name for identifying these.
The fmu block in the metadata gives the FMU context to produced data objects, e.g. which realization/iteration they belong to. This is what currently does not expand to the Everest use case.
The current "pattern" inside the fmu metadata block looks something like this:
fmu:
  model
  case
  iteration
  realization | aggregation
(simplified example, each block will expand further with more information, see examples.)
...and the presence/absence of these indicates which context a data object exists in. Examples:
A data object produced inside a specific realization will have:
fmu:
  model
  case
  iteration
  realization
A data object produced across all realizations in an iteration:
fmu:
  model
  case
  iteration
A data object on "case" level (not belonging to a specific iteration or realization, e.g. pre-processed data):
fmu:
  model
  case
...and so on.
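The presence/absence rule above could be sketched in Python roughly like this. The function name fmu_context and the returned labels are made up for illustration, and the fmu block is assumed to be a plain dict as parsed from the metadata; this is not how any existing tool does it.

```python
def fmu_context(fmu: dict) -> str:
    """Infer the context of a data object from which keys its fmu block has.

    Hypothetical helper: most specific level first, falling back to "case".
    """
    if "realization" in fmu:
        return "realization"
    if "aggregation" in fmu:
        return "aggregation"
    if "iteration" in fmu:
        return "iteration"
    if "case" in fmu:
        return "case"
    return "unknown"


# A data object produced inside a specific realization.
in_realization = {"model": {}, "case": {}, "iteration": {}, "realization": {}}
# A pre-processed data object on case level.
on_case = {"model": {}, "case": {}}

realization_context = fmu_context(in_realization)
case_context = fmu_context(on_case)
```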
Following this logic, adding a "simulation" tag:
fmu:
  model
  case
  iteration
  realization
  simulation:
    name: My Simulation
    id: N/A
    uuid: <uuid4>
    simulator: My Simulator
    etc: etc
    etc: etc
@daniel-sol will it make sense from a SIM2SUMO perspective to populate fmu.simulation? I guess when reading simulation data, it will be useful to have something more tangible than just data.name when identifying e.g. SMRY data from more than one simulation per realization, for instance DST runs that run in addition to the main simulation.
@perolavsvendsen: Yes, I think the fmu.simulation tag would make sense for SIM2SUMO, and for any other file produced at a given level. I guess the idea is that we are separating this from a conventional FMU run with provider. But is that expressed anywhere in the metadata? I don't think it would be ideal to have to guess from the presence of an fmu.simulation tag that you are now in an Everest context. How will this be expressed?
When it comes to sim2sumo and the data.name tag, the name is derived directly from the name of the data file for a reservoir simulator run, so you would get that automatically anyway. The way it is set up, it removes the realization number, which the current convention includes in the data file name. This means the unique separator between objects from different realizations is fmu.realization.id. So, for distinguishing between several perturbations within the same realization in an Everest context, fmu.simulation would be the unique identifier.
The fmu.simulation tag would be made irrespective of conventional FMU or Everest. It would allow us to get away from using data.name as a de facto identifier, which seems very wobbly.
The idea here would be to populate fmu.simulation and start using that to identify a specific simulation within a realization. And this would hopefully scale nicely to the Everest use case as well.
data.name can remain as is, but I think we should avoid using it for logic.
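To illustrate what "avoid using it for logic" could look like on the consumer side, here is a minimal Python sketch that selects objects by fmu.simulation.uuid rather than data.name. The helper name, the metadata dicts and the uuid values ("sim-main", "sim-dst") are all made up for the example; real values would be proper UUIDs.

```python
def objects_for_simulation(objects, simulation_uuid):
    """Return the metadata dicts whose fmu.simulation.uuid matches.

    Hypothetical helper: objects without an fmu.simulation block are skipped.
    """
    return [
        obj
        for obj in objects
        if obj.get("fmu", {}).get("simulation", {}).get("uuid") == simulation_uuid
    ]


# Two made-up objects from the same realization that share data.name.
objects = [
    {
        "data": {"name": "summary"},
        "fmu": {"realization": {"id": 0}, "simulation": {"name": "main", "uuid": "sim-main"}},
    },
    {
        "data": {"name": "summary"},
        "fmu": {"realization": {"id": 0}, "simulation": {"name": "dst", "uuid": "sim-dst"}},
    },
]

dst_objects = objects_for_simulation(objects, "sim-dst")
```

Filtering on data.name alone would return both objects here, which is the ambiguity the tag is meant to remove.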
I would suggest following the same convention as we have done for the other tags under fmu:
fmu:
  simulation:
    name: MySimulation
    uuid: [hash of something, e.g. realization.uuid + simulation.name]
Then we would have a unique ID for the simulation, instead of assuming that the name is the identifier. (I can easily break that by exporting something else with the same name?)
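One possible way to get such a hash-based ID, sketched in Python: a name-based UUID (uuid.uuid5 from the standard library) with realization.uuid as namespace and simulation.name as the name. The choice of uuid5 and the function name simulation_uuid are assumptions for illustration; the suggestion above only says "hash of something". Note that this gives a version 5 UUID, not a uuid4.

```python
import uuid


def simulation_uuid(realization_uuid: str, simulation_name: str) -> str:
    """Derive a deterministic ID from realization.uuid + simulation.name.

    Hypothetical helper: same inputs always give the same UUID.
    """
    namespace = uuid.UUID(realization_uuid)
    return str(uuid.uuid5(namespace, simulation_name))


# A made-up realization uuid, generated here only to have an input.
realization = str(uuid.uuid4())

main_id = simulation_uuid(realization, "MySimulation")
dst_id = simulation_uuid(realization, "MyDstRun")
```

The same simulation name in another realization would get a different ID, since the realization uuid is part of the input.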
As a user of Everest, I would like to have the produced data available through an API, i.e. Sumo, so that I can utilize it in any application, anywhere.
Stub, to be further described.
@roliveira, @tup1985