Closed tnatt closed 3 months ago
What is the desired behavior here @perolavsvendsen and @jcrivenaes ? Keep the schema definition as is and add pydantic model_validators to convert the input to a valid literal that makes sense? Or update allowed literals in the pydantic model?
And of course there might be something that I don't understand fully, as this preprocessed
and case_symlink_realization
is still a bit fuzzy to me š maybe we don't use these fields when we come to the validation part for these types.. So feel free to fill in my gaps š
Perhaps natural, but the schema matches what is actually being used. fmu.context.stage
across all data in Sumo (prod) right now:
{
"key": "iteration",
"doc_count": 9645097
},
{
"key": "realization",
"doc_count": 3123830
},
{
"key": "case",
"doc_count": 36206
}
(Answer returned in 0.15 seconds, try to do that on /scratch... š)
I guess the preprocess jobs e.g. in Drogon currently use fmu.context.stage: case
?
The point of the attribute is to flag where in the workflow a certain data has been generated. But I guess it can be questioned if it is needed or not. Also, relative terms (like "preprocessed") is tricky (pre what?) (a post-process can be a pre-process and vice versa).
This information is also implicitly available through the presence of other tags under fmu
, e.g.:
Stage: case
fmu:
case: ...
Stage: iteration
fmu:
case: ...
iteration: ...
Stage: realization
fmu:
case: ...
iteration: ...
realization: ...
So there is an implied "hierarchy" here: case -> iteration -> realization
Perhaps we for now simply keep this in synch with the presence of fmu.case
(always present), fmu.iteration
and fmu.realization
?
Something like: | stage | fmu.context.stage |
fmu contains |
Description |
---|---|---|---|---|
case |
"case" | case |
Data created on case level - not part of a specific iteration or realization. | |
iteration |
"iteration" | case , iteration |
Data created on iteration (ensemble) level, not part of a specific realization. Example: An aggregated surface | |
realization |
"realization" | case , iteration , realization |
Data created on realization level (the "standard" data export) |
Ok, investigated a bit to see how the magic happens š and seems that this preprocessed
in the metadata file will be overridden to "case" by the ERT Workflow that does SUMO_UPLOAD on preprocessed data https://github.com/equinor/fmu-drogon/blob/main/ert/bin/scripts/wf_copy_preprocessed_dataio.py. And the glue_metadata_preprocessed
function in dataio takes care of updating the necessary metadata https://github.com/equinor/fmu-dataio/blob/7cce4160afb3e75d1ab3655a07bd6926ba370ddc/src/fmu/dataio/_utils.py#L422-L438
Anyway, I support not touching the allowed literals in the pydantic model. They are very clear in their definitions š But we should maybe change it to having a model_validator setting this fmu.context.stage based on the presence of the other fields case
, iteration
, realization
... which can ensure more robustness between the given fmu.context.stage
and the rest.
For the fmu.context used inside internally in dataio I'll create a new pydantic model for that one for now and place it in datastructure._internal since it has more allowed fields.
Also I will investigate how the fmu_context=case_symlink_realization
is treated, it might also be handled in a special way in dataio and its fmu_context overridden to a standard fmu.context that the schema allows š
fmu.context.stage
generated byExportData.generate_metadata()
can have all of these enums:https://github.com/equinor/fmu-dataio/blob/7cce4160afb3e75d1ab3655a07bd6926ba370ddc/src/fmu/dataio/_definitions.py#L81-L88
But pydantic model used for schema only supports these Literals: https://github.com/equinor/fmu-dataio/blob/7cce4160afb3e75d1ab3655a07bd6926ba370ddc/src/fmu/dataio/datastructure/meta/meta.py#L327-L334
Hence users will get a validation error when e.g. using
preprocessed
orcase_symlink_realization
, like in the script below