Closed perolavsvendsen closed 9 months ago
Sudo Refinement 20.11.2023:
_fmu_provider.py: replace regex matching and getting info from path with info from env vars
For reference, here is an example of the information ERT gives as env variables while running:
When starting:
_ERT_EXPERIMENT_ID: 6a8e1e0f-9315-46bb-9648-8de87151f4c7
_ERT_ENSEMBLE_ID: b027f225-c45d-477d-8f33-73695217ba14
_ERT_SIMULATION_MODE: test_run
During forward model:
_ERT_EXPERIMENT_ID: 6a8e1e0f-9315-46bb-9648-8de87151f4c7
_ERT_ENSEMBLE_ID: b027f225-c45d-477d-8f33-73695217ba14
_ERT_SIMULATION_MODE: test_run
_ERT_ITERATION_NUMBER: 0
_ERT_REALIZATION_NUMBER: 0
_ERT_RUNPATH: /scratch/fmu/jriv/01_drogon_ahm/realization-0/iter-0/
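A minimal sketch of how a forward job could pick up the variables listed above. The variable names come from the listing; the helper `collect_ert_env` and the tuple `ERT_ENV_VARS` are hypothetical names used only for illustration and are not part of ERT or fmu-dataio.

```python
import os

# Names copied from the listing above. The tuple and helper are illustrative only.
ERT_ENV_VARS = (
    "_ERT_EXPERIMENT_ID",
    "_ERT_ENSEMBLE_ID",
    "_ERT_SIMULATION_MODE",
    "_ERT_ITERATION_NUMBER",
    "_ERT_REALIZATION_NUMBER",
    "_ERT_RUNPATH",
)


def collect_ert_env(environ=None):
    """Return the subset of the listed _ERT_* variables that are actually set."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in ERT_ENV_VARS if name in env}
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise outside an ERT run.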
I believe we find the following via the path today: user, casename, iter ID, real ID. But user and casename are not in the environment vars then?
No, I cannot find them. I made a custom forward job that prints all env variables, but none of the others seem to relate to the ERT run. Hence we are stuck...
@sondreso any chance of adding USER and CASENAME (I guess you prefer not to call it that, but OK) to these environment variables?
Based on offline discussions: ERT does not have a built-in definition of casename, nor of user. So we may not be able to get these via the environment right now.
But can we not pick these from the case metadata? When we create case metadata, we inject the casename and user as arguments to the workflow. We already use case metadata for other purposes.
There is also an issue with the iteration concept which makes this tricky. From ERT via environment variable we currently get an integer (iteration.id). However, this only exists in some cases, and it breaks when there is more than one prediction run in the same case. So following changes a while back, all logic is now placed on iteration.name when using these data. At the end of the day, the only usable information from the currently exposed environment variables is probably _ERT_REALIZATION_NUMBER, which maps to fmu.realization.id.
However, it is possible, and perhaps a step in the right direction, to replace the parsing of the script's runpath with parsing of the exposed _ERT_RUNPATH instead, and derive fmu.iteration.name + fmu.iteration.id from this variable. This will allow us to remove the current parsing of the file path. Although a tiny step, it feels like one in the right direction.
Suggested usage of currently available information then:
_ERT_RUNPATH ➡️ fmu.iteration.name / fmu.iteration.id
_ERT_REALIZATION_NUMBER ➡️ fmu.realization.id
ERT config ➡️ case metadata ➡️ fmu.case.uuid / fmu.case.name / fmu.case.user.id
Alternatively: We discuss with ERT to get some (or all) variables from config exposed as environment variables and do this by convention instead. E.g. require a variable named CASE_NAME, ITERATION_NAME or similar. All of these things are defined in the config, they just aren't available outside the config except by passing them as arguments to FORWARD_JOBs or similar. @sondreso thoughts on this?
_ERT_REALIZATION_NUMBER ➡️ fmu.iteration.id
Should probably be _ERT_REALIZATION_NUMBER ➡️ fmu.realization.id?
Alternatively: We discuss with ERT to get some (or all) variables from config exposed as environment variables and do this by convention instead. E.g. require a variable named CASE_NAME, ITERATION_NAME or similar. All of these things are defined in the config, they just aren't available outside the config except by passing them as arguments to FORWARD_JOBs or similar. @sondreso thoughts on this?
The major argument against doing this is that all jobs will have access to all defines in the ERT config, essentially making all variables global. This will in turn make it impossible to validate arguments passed to models up front, since any job may depend on information that it retrieves from these environment variables.
Should probably be...
Thanks, fixed ✅
Alternatively: We discuss with ERT to...
The major argument against doing this is that all jobs will have access to all defines in the ERT config, essentially making all variables global. This will in turn make it impossible to validate arguments passed to models up front, since any job may depend on information that it retrieves from these environment variables.
Will exposing some (pre-defined) defines come with the same problems? E.g. create a convention for the specific variable names in question, and expose those?
(Slippery slope, perhaps.)
Can we set some pre-defined environment variables? Obviously, this will need to be well documented in fmu-dataio. We did something like this to link a prediction case to the AHM case so that fmu-dataio can have the UUID of RESTART_CASE.
@sondreso is it possible to provide something like this before the forward model?:
_ERT_ROOTPATH: /scratch/fmu/jriv/01_drogon_ahm/
This is great. I would like to add that I think it would be super useful if we had an _ERT_STATE (name up for discussion) that indicates what state ERT is in: starting, forward, others? At the moment the state is inferred from the absence of env variables. I believe that will quickly get complex if we ever want to track more states.
I think we currently can live with:
_ERT_EXPERIMENT_ID present, _ERT_RUNPATH not present >> STARTUP
_ERT_EXPERIMENT_ID present, _ERT_RUNPATH present >> FORWARD MODELS (realizations running)
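The presence/absence rule above can be sketched as follows. The enum `ErtState`, its member names, and the function `infer_ert_state` are hypothetical; the extra `OUTSIDE` state (no `_ERT_EXPERIMENT_ID` at all) is an assumption added so the function always returns something.

```python
import os
from enum import Enum


class ErtState(Enum):
    OUTSIDE = "outside"              # assumption: not launched by ERT at all
    STARTUP = "startup"              # experiment known, no realization yet
    FORWARD_MODEL = "forward_model"  # realizations running


def infer_ert_state(environ=None):
    """Infer the ERT state purely from which _ERT_* variables are present."""
    env = os.environ if environ is None else environ
    if "_ERT_EXPERIMENT_ID" not in env:
        return ErtState.OUTSIDE
    if "_ERT_RUNPATH" in env:
        return ErtState.FORWARD_MODEL
    return ErtState.STARTUP
```

Each additional state would need another presence check here, whereas a dedicated state variable would reduce this to a single lookup.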
My point is: if we want to add another state this logic starts to get convoluted.
Ref https://github.com/equinor/ert/issues/4904
As far as I can see, this is now implemented and available.
fmu-dataio should tap into this stream and include some of it in outgoing metadata, to establish this pattern.
Suggested first steps: