equinor / ert

ERT - Ensemble based Reservoir Tool - is designed for running ensembles of dynamical models such as reservoir models, in order to do sensitivity analysis and data assimilation. ERT supports data assimilation using the Ensemble Smoother (ES), Ensemble Smoother with Multiple Data Assimilation (ES-MDA) and Iterative Ensemble Smoother (IES).
https://ert.readthedocs.io/en/latest/
GNU General Public License v3.0

Export selected run context information for other apps to parse #2359

Open perolavsvendsen opened 2 years ago

perolavsvendsen commented 2 years ago

Is your feature request related to a problem? Please describe. When materializing metadata for outgoing data objects, we would like to include information that is known to ERT but not easily accessible. Examples include:

It is possible to do this today by creating a FORWARD_JOB which dumps some data to disk, but this feels like a limiting approach for possible later needs. Another need possibly covered by the same mechanism is for other apps to detect whether they are operating within an ERT-orchestrated run or not.

Describe the solution you'd like

Alternative 1: Environment variables set by ERT containing this information, e.g. ERT_REALIZATION, ERT_ITERATION.

Alternative 2: A JSON file appearing on the RUNPATH, with a non-ambiguous name, containing selected information. For the first iteration, this could include realization and iteration id. This file would be parsed by the methods creating metadata for data exported out of the run, and its contents included in said metadata.

(Or both.)
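To make the two alternatives concrete, here is a minimal sketch of how a consumer such as fmu-dataio could read the proposed run context. Both the environment variable names and the JSON file name are assumptions taken from this proposal, not an existing ERT interface.

```python
import json
import os
from pathlib import Path


def read_run_context(runpath="."):
    """Collect run context from the two proposed channels.

    Prefers the hypothetical ERT_* environment variables (Alternative 1)
    and merges in a hypothetical JSON file on the RUNPATH (Alternative 2).
    Neither name is guaranteed by ERT; they are illustrations only.
    """
    context = {}
    # Alternative 1: environment variables set by ERT for the running job
    for key in ("ERT_REALIZATION", "ERT_ITERATION"):
        if key in os.environ:
            context[key.lower()] = os.environ[key]
    # Alternative 2: a JSON file with a non-ambiguous name on the RUNPATH
    info_file = Path(runpath) / "ert_run_context.json"
    if info_file.exists():
        context.update(json.loads(info_file.read_text()))
    return context
```

An empty result would then also answer the secondary need: detecting whether the code is running inside an ERT-orchestrated run at all.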

Describe alternatives you've considered The current alternative is to derive this information from the file path, which is not intended as a permanent solution (and is probably not sustainable either).
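For context, the file-path workaround mentioned above might look like the sketch below. It assumes the conventional FMU runpath layout (`.../<case>/realization-<N>/iter-<M>/...`); that naming is a site convention rather than something ERT guarantees, which is exactly why this approach is fragile.

```python
import re

# Matches the conventional directory pattern, e.g.
# /scratch/field/user/mycase/realization-3/iter-1/share/...
RUNPATH_PATTERN = re.compile(
    r"realization-(?P<realization>\d+)/iter-(?P<iteration>\d+)"
)


def parse_runpath(runpath):
    """Derive realization and iteration ids from a conventional runpath.

    Raises ValueError when the path does not follow the assumed
    convention, illustrating why this is not a sustainable solution.
    """
    match = RUNPATH_PATTERN.search(runpath)
    if match is None:
        raise ValueError(f"Cannot derive run context from {runpath!r}")
    return {key: int(value) for key, value in match.groupdict().items()}
```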

Additional context https://github.com/equinor/fmu-dataio

jondequinor commented 2 years ago

In ERT next terms, you want to access opaque data resulting from evaluating an ensemble. Ideally, all of the data you need would be enumerated as outputs, which would then be accessible in the event stream that ERT produces; that stream also contains (almost) all the data you need (iter, realisation).

Currently, (legacy) ERT does not allow easy access to the event stream, nor to any information about outputs. The former is mostly a matter of exposing the right things in a good way; the latter needs some thought.

Maybe we provide a run_path output and an in-memory transmitter that produces a URI that (for legacy) points to file:///scratch/…?

In the future, when e.g. surfaces are proper outputs, gathering data will not be radically different; it is just a matter of using different outputs and treating them differently. The run_path output could still function in Azure, but would have to be an ERT storage transmitter instead. Idk

perolavsvendsen commented 2 years ago

Yes, I assume ERT3 may offer better ways to get information out. What is needed short term, however, is some key information included in the outgoing metadata attached to FMU results (data that are dispersed and used outside the ERT context).

This need has been around for many years. Earlier attempts have included parsing the config file itself and sending the necessary information out as arguments to a forward_model which dumped it to disk; currently we are parsing the file path. None of these are particularly great, and all of them add complexity and/or introduce risk of errors. The need remains, however, and has been emphasized lately by the development of data standards for model results and the attachment of richer metadata to outgoing model results.

jondequinor commented 2 years ago

The event stream is in ERT now, so my previous comment was largely aimed at the short term.

perolavsvendsen commented 2 years ago

Add to wishlist: Total number of realizations. When evaluating, and especially when aggregating model results across an ensemble, it is useful to know how many realizations to expect. Frequently some are missing, or a subset is not adequately defined (so that two results are blended and the total subset exceeds the number of realizations).
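The aggregation check described above reduces to comparing the realizations actually found against the expected ensemble size. A minimal sketch, assuming ERT would expose the total as a plain integer (which is the wishlist item, not a current feature):

```python
def missing_realizations(found_ids, total):
    """Report which realization ids are absent from an aggregation.

    `found_ids` are the realization ids actually present on disk;
    `total` is the hypothetical ensemble size ERT would expose.
    A non-empty result flags an incomplete ensemble, and any id
    outside range(total) would indicate blended results.
    """
    return sorted(set(range(total)) - set(found_ids))
```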

perolavsvendsen commented 1 year ago

Add to wishlist: Current iteration and its type

perolavsvendsen commented 8 months ago

Partially solved by #4904; however, case name and user name are still missing. Possible to add these as well?

oyvindeide commented 3 months ago

User name should be available, though: $USER. Experiment name (case_name) is not available, but it can also be non-unique; are we sure it should be exposed?

perolavsvendsen commented 3 months ago

$USER: If that can be used by (pre-sim hook) workflows, that would be sufficient for fmu-dataio, I believe. The particular challenge with "user" is that it can be set to anything, i.e. it does not have to be an actual user, and several assets use arbitrary strings here. Currently, there is a difference between the actual user (which we get from the OS) and the string value that has been given to ERT. The latter is (only) important because the user expects to find this "user" when managing results.

casename: The original request to expose case name does not make sense, since ERT doesn't know about it. It may correspond to experiment name, but we still don't fully know if experiment == case. Most likely, the two are similar but not the same. Same same, but different... It is unclear whether the experiment concept can fully replace the case concept. The short-term plan is to include ert.experiment.name and other relevant information (uuid) in the metadata produced for FMU results, in an effort to learn a bit more about how the two correspond.

The case concept is conventional and tied to the runpath. It is conventionally set by parameters such as <CASE> or <CASEDIR>, which in turn are used to build the runpath. The case concept is important context when analyzing, storing and managing results. A case contains one or more ensembles, plus data (results) that do not belong to any ensemble or realization. When materializing the case into metadata, there are some requirements for how and when the case is defined and re-defined; e.g. producing new data to the same case location on /scratch cannot create a new case.

fmu-dataio currently includes the case.name, but more importantly defines the case.uuid. As you point out, the case name (as experiment name) is not unique. The case name is, however, what the user needs to find his/her results, particularly in user interfaces in various apps. The case.uuid is what is used for identification and uniqueness.

oyvindeide commented 3 months ago

So these names are mostly by convention, and ERT does not really know <CASE_NAME>/<CASE_DIR> is a thing; the same applies to <USER>. But you don't want to pass it to the forward model as an argument? We had another feature request regarding the experiment name/case dir and the connection to the runpath, so I think we will look into doing something about this.

perolavsvendsen commented 3 months ago

Yes, they are by convention. From a results perspective, they are important context. But, as you also point out regarding experiment name, they are not unique. Hence we create a uuid per case. For this to work, this uuid must be created when a new case is made, but persisted if a new case is not made, i.e. when data is appended to an existing case. The latter is where I think the main difference in behavior between experiment and case lies.
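The create-once, persist-on-append rule described above can be sketched as follows. This is an illustration of the behavior, not what fmu-dataio actually implements; the marker file name is a hypothetical choice.

```python
import json
import uuid
from pathlib import Path


def get_or_create_case_uuid(case_path):
    """Return the persistent uuid for a case directory.

    If a marker file already exists (data is being appended to an
    existing case), its uuid is reused; otherwise a new uuid is
    created and written, marking the birth of a new case.
    The file name "case_metadata.json" is an assumption.
    """
    marker = Path(case_path) / "case_metadata.json"
    if marker.exists():
        return json.loads(marker.read_text())["case_uuid"]
    case_uuid = str(uuid.uuid4())
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(json.dumps({"case_uuid": case_uuid}))
    return case_uuid
```

Calling this twice against the same case directory returns the same uuid, which is the property that distinguishes the case concept from an experiment that is re-created on every run.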