Open gnikulin opened 11 months ago
I think it makes sense to rename it. CORDEX-CMIP6 is cumbersome, especially when said aloud. How about CORDEX6, in alignment with AR6 & CMIP6?
Actually, we discussed this in #5. I think there is a more-or-less established convention to have the `activity_id` be the filename prefix of the tables (e.g., compare to `obs4MIPs`, `input4MIPs`)...
> I think it makes sense to rename it. CORDEX-CMIP6 is cumbersome, especially when said aloud. How about CORDEX6, in alignment with AR6 & CMIP6?
It has been decided to use "CORDEX-CMIP6" for this activity. Indeed, it's a bit cumbersome, but it provides a good and clear description. From the CORDEX experiment design for dynamical downscaling of CMIP6 (https://cordex.org/wp-content/uploads/2021/05/CORDEX-CMIP6_exp_design_RCM.pdf):
> In addition to the continental-scale downscaling, addressed in this document, CORDEX includes many other components. For example, the Flagship Pilot Studies (FPS) and regional workshops for climate and VIA communities. CORDEX is a continuous activity that is not divided into phases (1st, 2nd, etc.) and not necessarily related to the CMIP cycles. The framework described in this document is simply referred to as CORDEX-CMIP6.
> actually, we discussed this in #5. I think, there is some more or less convention to have the `activity_id` be the filename prefix of the tables (e.g., compare to `obs4MIPs`, `input4MIPs`)...
I'm not sure that there are build rules for the file name of the tables. Will the input4MIPs tables have the same names (without mip_era) as now for CMIP7? Other activities don't have their own table at all, e.g. ScenarioMIP etc.
CORDEX is not a CMIP6 project or an activity that contributes to CMIP6, so here we have more freedom to define what's better for CORDEX. Regarding `activity_id`, it was suggested that in CORDEX-CMIP6 `activity_id` is "an identifier of different CORDEX activities as dynamical downscaling, empirical-statistical downscaling, Flagship Pilot Studies and bias adjustment (e.g. “RCM”, “ESD”, “FPS”, “Adjust”)".
Currently we have CORDEX_source_id.json, which assumes only RCMs as a source. However, when we register ESD methods we will need to distinguish the ESD source table from the RCM one, another level of complexity :-). Perhaps we may even need to add the CORDEX-CMIP6 `activity_id` to some CV file names, something like CORDEX-CMIP6_RCM_source_id.json, CORDEX-CMIP6_ESD_source_id.json, etc.?
@gnikulin - That makes sense. CORDEX-CMIP6 it is, then.
With regard to `activity_id`, I think we need to allow for additional cases. For example, one project I'm involved with aims to include some variable-resolution simulations in the mix for comparison with RCM downscaling. There are also efforts to train Machine Learning models to emulate RCMs. Both will require expanding the CORDEX_source_id.json file, and in the case of ML methods, I think you have two sources: both the ML setup and the RCM it was trained to emulate. (Or possibly even multiple RCMs, if that proves feasible.)
I would include both limited area RCMs and VR-GCMs in the same "RCM" source_id file. There is the global attribute `source_type`, which provides a short description of model configuration (e.g. “RCM”, “AGCM”, “RESM”, "VR-GCM", etc., all acronyms should be defined). This information can also be requested during the registration.
Regarding ML methods, I consider them as some kind of ESD and suggest including them in the "ESD" source_id file. Information about the datasets (e.g. RCMs) used for training ML methods should be reflected in the metadata (global attributes) and can be different for the same ML method (e.g. https://cordex.org/wp-content/uploads/2017/06/CORDEX_ESD_Experiment1.pdf). Here, it is necessary to get input from the ESD community.
Creating many CVs for specific cases may make the CORDEX data infrastructure too complex. I would vote for the simplest solution.
Promoting specific values (RCM, ESD, ...) of the controlled vocabulary to the filenames seems to break the general build rules for these files. I see no problem in merging all source_id's under a single file, given that we add the `source_type` to each source. Also, having different `model_component`s or `required_global_attributes` depending on the `source_type` should not be a problem. It would be a matter of having a new `CORDEX[-CMIP6?]_model_component.json` listing the expected components (or global attributes, for the existing `CORDEX_required_global_attributes.json`) for each `source_type`. These two files (`CORDEX_model_component.json` (maybe in plural, components?) and `CORDEX_required_global_attributes.json`) would kind of define the `source_type`. Each new `source_type` created should add its defining components and attributes to those files.

(this thread has gone a bit off-topic from the original post)
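A minimal sketch of how one merged source table could be checked against per-`source_type` definitions. The entry names (`EXAMPLE-RCM`, `EXAMPLE-ESD`), the component names, and both dictionary layouts are hypothetical placeholders for illustration, not the actual contents of the CORDEX CV files.

```python
# Hypothetical layout: one table for all sources, each tagged with a source_type.
SOURCE_ID = {
    "EXAMPLE-RCM": {"source_type": "RCM", "model_component": ["atmos", "land"]},
    "EXAMPLE-ESD": {"source_type": "ESD", "model_component": []},
}

# Hypothetical stand-in for a model-component file keyed by source_type.
MODEL_COMPONENT = {
    "RCM": ["atmos", "land"],
    "ESD": [],
}


def missing_components(source_id, sources=SOURCE_ID, expected=MODEL_COMPONENT):
    """Return the expected components that a registered source does not declare."""
    entry = sources[source_id]
    required = expected[entry["source_type"]]
    declared = set(entry.get("model_component", []))
    return [c for c in required if c not in declared]


def sources_of_type(source_type, sources=SOURCE_ID):
    """List the source_ids of one source_type from the single merged table."""
    return sorted(s for s, e in sources.items() if e["source_type"] == source_type)
```

With this layout, adding a new `source_type` would only mean adding one key to the per-type definitions, while the source table itself stays a single file.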
Yes, I agree that it is sufficient to use `source_type` to distinguish different types of downscaling methods (dynamical, statistical, ML), so that all types of downscaling sources can go into one `source_id` table.
For the required attributes to register a `source_id` (https://github.com/WCRP-CORDEX/cordex-cv/issues/4), I wouldn't make the model components a requirement but only the most basic ones, e.g., `source_id`, `source`, `release_year`, `institution_id`.
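As a rough sketch of such a registration check, assuming a registration arrives as a plain dictionary (the function name and the example entry are hypothetical; only the four attribute names come from the suggestion above):

```python
# The basic attributes suggested above for registering a source_id.
REQUIRED_KEYS = ("source_id", "source", "release_year", "institution_id")


def missing_registration_keys(entry):
    """Return the required keys that are absent or empty in a registration entry."""
    return [k for k in REQUIRED_KEYS if entry.get(k) in (None, "")]


# Hypothetical registration entry with placeholder values.
example = {
    "source_id": "EXAMPLE-RCM",
    "source": "Example regional climate model",
    "release_year": 2023,
    "institution_id": "EXAMPLE-INST",
}
```

Anything beyond these keys, such as model components, would then be optional and not block a registration.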
I am unsure about the `required_global_attributes`. Having different `required_global_attributes` depending on `activity_id` would require something like a new `CV` table for each activity (in the end, everything ends up in one CV table). So, in the past, the distinction was made through `product` and `project_id` aka `activity_id`. I could imagine, e.g., having ML and bias adjustment models producing `bias-adjusted` or `ml-adjusted` output if they are based on output of RCM (dynamic) models. Bias adjustment also had different variable names, e.g., `tasAdjust` instead of `tas`. There can still be additional attributes of course, e.g., like in ESD there was `bias_adjustment`, but I would not make them required global attributes.
A second option would be to have another set of tables for bias adjustment, based on the common CV in this repo. For example, for bias adjustment there would be another repo of tables with the same filenames (`CORDEX-CMIP6_CV.json`, `CORDEX-CMIP6_mon.json`, etc.) but tailored for bias adjustment and, if necessary, with adjusted output variable names. I wonder how that was done in the past since, at least, the CORDEX-CMIP5 CMOR tables contain no hint of adjusted output. I guess it was done by adjusting those tables?
I would say different activities (see also #20) would need to define their own CV and tables, based on these general ones. It should be a matter of degenerating the CORDEX_activity_id.json to a single value, reducing other elements (e.g. CORDEX_domain_id.json) and generally adapting the rest of the elements (including the `required_global_attributes`) and tables to the protocols of the particular initiatives. The question is also if here we want to encompass all activities or just focus on providing the example for the dynamical downscaling on continental domains (activity_id = RCM). Even for the continental-scale domains, different domains are managing different output variable lists; mainly based on the general CORDEX one, but removing (no problem) or adding some variables. So, even for the same activity_id a separate set of tables might be needed.
Yes, I agree with @jesusff, other activities might have different requirements for their vocabulary that we don't even know about yet. And from my experience, users will mess with the tables anyway. The important thing is to have a vocabulary that can be used for QA for ESGF publication, although we don't even have a checker yet :persevere: (PrePARE only works for CMIP6)
Thanks for opening #20 !
If we are back to the original post :-). Should we use CORDEX-CMIP6 for all tables and CVs instead of simply CORDEX? My concern is that, when we come to CORDEX-CMIP7, it would be bad practice to reuse the same file names for files with different content.
> Promoting specific values (RCM, ESD, ...) of the controlled vocabulary to the filenames seems to break the general build rules for these files. I see no problem in merging all source_id's under a single file, given that we add the `source_type` to each source. Also, having different `model_component`s or `required_global_attributes` depending on the `source_type` should not be a problem. It would be a matter of having a new `CORDEX[-CMIP6?]_model_component.json` listing the expected components (or global attributes, for the existing `CORDEX_required_global_attributes.json`) for each `source_type`. These two files (`CORDEX_model_component.json` (maybe in plural, components?) and `CORDEX_required_global_attributes.json`) would kind of define the `source_type`. Each new `source_type` created should add its defining components and attributes to those files.
>
> (this thread has gone a bit off-topic from the original post)
OK, we can try to merge all source_id's under a single file and distinguish them by `source_type`.
> For the required attributes to register a `source_id` (#4) i wouldn't make the model components a requirement but only the most basic ones, e.g., `source_id`, `source`, `release_year`, `institution_id`.
Does the `source_id` identify the model / method used to perform the downscaling? If so, I'm not sure that `release_year` and `institution_id` are well-defined for methods that aren't RCMs. For example, what would they be for the (simplistic but still widely-used) ESD method of interpolation + bias-correction?
> Should we use CORDEX-CMIP6 for all tables and CVs instead of simply CORDEX?
CORDEX-CMIP6 makes sense for exactly the reason you give, that we don't want ambiguity if/when we do this again later on.
> If we are back to the original post :-). Should we use CORDEX-CMIP6 for all tables and CVs instead of simply CORDEX ? My concern is that when we come to CORDEX-CMIP7 it's a bad practice to use the same file names for files with different content.
OK, agreed, I'll rename them!
> Second option would be to have another set of tables for bias adjustment and bias adjust based on the common CV in this repo. For example, for bias adjustment there would another repo of tables with the same filenames (`CORDEX-CMIP6_CV.json`, `CORDEX-CMIP6_mon.json`, etc...) but tailored for bias adjustment and if necessary adjusted output variable names. I wonder how that was done in the past since, at least, the cordex cmip5 cmor tables contain no hint on adjusted output. I guess, it was done by adjusting those tables?
Regarding bias-adjusted variables, all modifications of their acronyms and long names are very simple and are described in the DRS for bias-adjusted CORDEX simulations (http://is-enes-data.github.io/CORDEX_adjust_drs.pdf). Variable names are modified by appending `Adjust` to the variable name DRS element in file names and in NetCDF files: pr -> prAdjust, tas -> tasAdjust. Long names (the long_name NetCDF attribute) also have to be modified by adding `Bias-Adjusted` in front of the long name: `Near-Surface Air Temperature -> Bias-Adjusted Near-Surface Air Temperature`.
There were no specific CORDEX-CMIP5 CMOR tables for bias-adjustment.
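The two renaming rules for bias-adjusted variables can be sketched in a few lines. The function name `adjust_variable` is a hypothetical helper; the rules themselves are the ones quoted from the DRS above.

```python
def adjust_variable(name, long_name):
    """Apply the bias-adjustment naming rules to a variable name and its long_name.

    The variable name gets the suffix "Adjust" and the long name gets the
    prefix "Bias-Adjusted ", as described for bias-adjusted CORDEX output.
    """
    return f"{name}Adjust", f"Bias-Adjusted {long_name}"


# Example taken from the thread: tas -> tasAdjust
adjusted_name, adjusted_long_name = adjust_variable(
    "tas", "Near-Surface Air Temperature"
)
```

Since the mapping is mechanical, adjusted tables could in principle be derived from the general ones rather than maintained by hand, though that is only one possible approach.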
> I would say different activities (see also #20) would need to define their own CV and tables, based on these general ones. It should be a matter of degenerating the CORDEX_activity_id.json to a single value, reducing other elements (e.g. CORDEX_domain_id.json) and generally adapting the rest of the elements (including the `required_global_attributes`) and tables to the protocols of the particular initiatives. The question is also if here we want to encompass all activities or just focus on providing the example for the dynamical downscaling on continental domains (activity_id = RCM). Even for the continental-scale domains, different domains are managing different output variable lists; mainly based on the general CORDEX one, but removing (no problem) or adding some variables. So, even for the same activity_id a separate set of tables might be needed.
Actually, there is no need to create new CMOR tables for different domains if the output variable lists are different. All variables should be included in the CORDEX-CMIP6 CMOR tables, and each domain post-processes only a subset of them.
I agree, so maybe we can also set up a simple registration process for the data-request table (just give variable, frequency and some basic details) from which the tables are updated. It would be much nicer, since converting from Google spreadsheets is a pain.
Yes, it's a good idea. The atmospheric variable spreadsheet was the first human-readable step to discuss what variables should be archived. Adding new variables can indeed be done more efficiently with a registration process for the data request table. There is an ocean variable list (https://doi.org/10.5281/zenodo.8207553), again a spreadsheet :-), that should be included.
OK, I can add the ocean variables, still have the script that can tackle the spreadsheets, seems to be similar format...
The format should be the same, as the atmospheric variable spreadsheet was used as a template. The ocean list published on Zenodo is a PDF, but there is a Google spreadsheet as well.
There are also lists of aerosol variables (https://doi.org/10.5281/zenodo.7112860) and river variables (https://doi.org/10.5281/zenodo.7112673) that should be checked to avoid duplication.
Should we rename cordex to cordex-cmip6, both for the repository name and for all CVs? Many CVs are defined only for CORDEX-CMIP6 and will be different in CORDEX-CMIP7.