Closed mandresm closed 1 month ago
I have some design questions about this, in particular how it will fit into the rest of our current config structure. Some of what I write down here will need to end up in the handbook (which I nominate @chrisdane and @christian-stepanek to improve upon once I draft it, since user-facing documentation is best written by early test users). Currently we have the following:
pymorize:
  # ... program metadata and settings for the program itself ...
general:  # (Or global, we haven't decided the name yet)
  # ... Information that will always be relevant
pipelines:
  # ... collections of steps to apply ...
rules:
  # ... list of rules (see below) ...
The rules section is the main part the users will need to deal with. It is a list of dictionaries, each of which maps a collection of user files to a CMOR variable. A typical rule might look like this:
rules:
  - cmor_variable: so
    model_variable: salt
    cmor_units: PSU
    model_units: PSU  # Can be omitted if the NetCDF metadata is complete. If given, the value
                      # in the rules section always wins over what is in the NetCDF file.
    file_patterns:
      - /a/pattern/with/fesom.salt.(?P<year>\d+).*nc  # Use Python extended regex, **not**
                                                      # globbing!!!
The rules specification is a work-in-progress, and not set in stone (yet). Still to be considered are output files, variables that end up in multiple files (time aggregation), CMOR variables that depend on multiple inputs...
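As a rough illustration of the two behaviours described in the rule comments (regex-based file matching and the rule's units taking precedence over the NetCDF metadata), here is a minimal Python sketch. It is not pymorize code: the function names `match_files` and `resolve_units`, the named group `year`, and the example paths are all assumptions made for illustration.

```python
import re

# Hypothetical pattern modeled on the rule above; the named group "year"
# is an assumption about what the pattern is meant to capture.
PATTERN = re.compile(r"/a/pattern/with/fesom\.salt\.(?P<year>\d+).*nc")


def match_files(pattern, paths):
    """Return {path: year} for every path that the regex matches in full."""
    matched = {}
    for path in paths:
        m = pattern.fullmatch(path)
        if m:
            matched[path] = int(m.group("year"))
    return matched


def resolve_units(rule, netcdf_attrs):
    """Units given in the rule win; otherwise fall back to the file metadata."""
    if "model_units" in rule:
        return rule["model_units"]
    return netcdf_attrs["units"]


# Made-up example paths, not real model output.
paths = [
    "/a/pattern/with/fesom.salt.1850.nc",
    "/a/pattern/with/fesom.salt.1851.01.nc",
    "/a/pattern/with/fesom.temp.1850.nc",
]
matched = match_files(PATTERN, paths)
units_from_rule = resolve_units({"model_units": "PSU"}, {"units": "g/kg"})
units_from_file = resolve_units({}, {"units": "g/kg"})
```

A glob such as `fesom.salt.*.nc` could select the same files, but it could not capture the year as a named group, which is presumably one reason for preferring regular expressions here.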
Sorry, I forgot to actually ask my question: I guess a Rule will be responsible for one single DataRequestVariable and contain all the information needed to generate that variable. Question: does that make sense, or am I overlooking some edge case?
I am not quite sure but I think yes.
In this particular example I don't understand why cmor_units should be defined in the rule, i.e. by the user (?). In my view this information should be retrieved from the cmip6-cmor-tables repo.
Also, the rule must name the CMIP table that defines the variable of interest, i.e. in this example Omon or Odec:
cd cmip6-cmor-tables/Tables
grep "\"so\":" *
CMIP6_Odec.json: "so": {
CMIP6_Omon.json: "so": {
That is just for completeness, to show what kind of information will be in a rule. Not everything in the rule will be asked of the user, only ambiguous information. The actual CMOR unit value will be parsed from the table, along with possibly many other things.
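The table lookup discussed here (which tables define a variable, and which units the table prescribes) could look roughly like the following Python sketch. It assumes each table has been loaded from its `CMIP6_<table>.json` file into a dict with a `variable_entry` mapping, as in the cmip6-cmor-tables repo. The helper names and the heavily abbreviated sample data are illustrative stand-ins, not pymorize code or complete table contents.

```python
def tables_defining(variable, tables):
    """Return the sorted names of all tables that contain the variable."""
    return sorted(
        name
        for name, table in tables.items()
        if variable in table.get("variable_entry", {})
    )


def cmor_units(variable, table_name, tables):
    """Look up the units the given table prescribes for the variable."""
    return tables[table_name]["variable_entry"][variable]["units"]


# Abbreviated stand-in for the parsed JSON tables. In practice each value
# would come from json.load() on the corresponding CMIP6_<table>.json file.
tables = {
    "Omon": {"variable_entry": {"so": {"units": "0.001"}}},
    "Odec": {"variable_entry": {"so": {"units": "0.001"}}},
    "Amon": {"variable_entry": {"tas": {"units": "K"}}},
}

so_tables = tables_defining("so", tables)
so_units = cmor_units("so", "Omon", tables)
```

With a lookup like this, a rule would only need to state the table when `tables_defining` returns more than one candidate, which matches the idea of asking the user only for ambiguous information.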
I think that the rules section depends only on two conditions: the variable and metadata definition as made for the MIP (CMIP7 FastTrack, for example), and the definition of the variable and metadata as made in the model. Once the rules are set up for a specific MIP and for a specific model, the hope would be that nobody will have to tamper with that section anymore.
@chrisdane @pgierz - maybe the definition of "user" is misleading here. The aforementioned user would be the person that defines the rules based on conditions in the model and data request demands for the MIP. From my point of view the "user" would not be the individual modeller. They would in most cases use the predefined rules as they are.
Does this answer more questions than it raises?
@christian-stepanek: Correct! Well, sort of. One "user" (i.e. not someone developing the actual logic of the pymorize tool) still needs to sit down and write the mapping of CMOR to Model. That can then of course be shared. What we (the HPC team) would give is the framework for how to write down such rules. Filling them with useful values is of course up to you ;)
Re: "I guess a Rule will be responsible for one single DataRequestVariable and contain all the information needed to generate that variable. Question: does that make sense, or am I overlooking some edge case"
I think this is correct. Every variable will have one specific set of rules that define how it is to be computed and formatted, which sign conventions are applied, which metadata is to be included in the NetCDF file, etc.
Note that there will be several different instances of "the same" physical model output. As @chrisdane stated above, the same variable, e.g. SAT, may be present in different CMOR tables, and different CMOR rules may apply. For example, tas (SAT) is available in both the Amon and Aday tables (and in some others as well). The most relevant difference between them is that the Amon version of the variable is to be computed as monthly means, whereas the Aday version is to be computed as daily means. Therefore, the rule for Amon.tas will be different from the rule for Aday.tas, at least with regard to the definition of the time mean. Whether other things differ must be deduced from a comparison of the respective CMOR tables.
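One way to picture "one rule per variable per table" is to key the rules by the pair (table, variable). The Python sketch below is only an illustration under that assumption: the `Rule` fields, the `index_rules` helper, and the model variable name `temp2` are hypothetical and not taken from pymorize.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    cmor_variable: str
    cmor_table: str
    model_variable: str  # hypothetical model-side name
    time_mean: str       # e.g. "monthly" or "daily"


def index_rules(rules):
    """Index rules by (table, variable); reject two rules for the same target."""
    indexed = {}
    for rule in rules:
        key = (rule.cmor_table, rule.cmor_variable)
        if key in indexed:
            raise ValueError(f"duplicate rule for {key[0]}.{key[1]}")
        indexed[key] = rule
    return indexed


# The same physical field feeds two rules that differ only in the time mean.
rules = index_rules([
    Rule("tas", "Amon", "temp2", "monthly"),
    Rule("tas", "Aday", "temp2", "daily"),
])
```

Here `rules[("Amon", "tas")]` and `rules[("Aday", "tas")]` share the model variable but carry different time means, mirroring the Amon.tas versus Aday.tas distinction described above.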
I do not yet fully understand the diagram that you provide above. If there is something more detailed to understand and to discuss then maybe it is best to do that in our CMOR meeting.
Builds the data request information from the tables, making sure that no variable is repeated and that the information spread across the different tables is merged correctly.
https://github.com/FESOM/seamore/blob/7725366f7b68ea3824ac6baa500ea49531722b72/lib/data_request.rb#L7
The file consists of four classes:
DataRequest: responsible for managing variables and their associated metadata across multiple CMIP6 tables.
DataRequestVariable: represents a single variable and its associated metadata. It can merge multiple TableVarEntry objects into one DataRequestVariable object.
TableVarEntry: represents a single entry for a variable in a table.
DataRequestTable: represents a single CMIP6 table and its metadata.
The overall design ensures that variables are merged correctly across multiple tables, providing a unified interface to query variables and their metadata.
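To make the merging idea concrete, here is a loose Python paraphrase of the design described above. It is not a translation of the linked Ruby file: the attribute names, the choice of `units` and `frequency` as the merged metadata, and the consistency check are assumptions made for illustration, and DataRequestTable is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableVarEntry:
    """A single entry for a variable in one table."""
    variable_id: str
    table_id: str
    frequency: str
    units: str


@dataclass
class DataRequestVariable:
    """One variable, merged from its entries across tables."""
    variable_id: str
    units: str
    frequencies_by_table: dict = field(default_factory=dict)

    def merge(self, entry):
        # Assumed consistency rule: units must agree across tables.
        if entry.variable_id != self.variable_id:
            raise ValueError("entry belongs to a different variable")
        if entry.units != self.units:
            raise ValueError(f"conflicting units for {self.variable_id}")
        self.frequencies_by_table[entry.table_id] = entry.frequency


class DataRequest:
    """Collects entries so that each variable appears exactly once."""

    def __init__(self, entries):
        self.variables = {}
        for entry in entries:
            var = self.variables.get(entry.variable_id)
            if var is None:
                var = DataRequestVariable(entry.variable_id, entry.units)
                self.variables[entry.variable_id] = var
            var.merge(entry)


# Illustrative entries only; real ones would be read from the CMIP6 tables.
request = DataRequest([
    TableVarEntry("so", "Omon", "mon", "0.001"),
    TableVarEntry("so", "Odec", "dec", "0.001"),
    TableVarEntry("tas", "Amon", "mon", "K"),
])
```

After construction, `request.variables["so"]` is a single object that knows about both the Omon and Odec entries, which is the "no repetition, correct merging" property the description refers to.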
Crappy ~UML