acaubel closed this issue 2 years ago.
@acaubel thanks for submitting, your model(s) are now registered (#1047). Please review CMIP6_source_id.html and let us know of any further tweaks required
@matthew-mizielinski just circling back on this: we recommend that source_id is limited to 16 characters (in https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/.github/Model_registration_template.md), which isn't a hard limit but is worth considering as we move forward. To be honest, I had forgotten about this.
Given the large number of closely related model versions in the IPSL portfolio (12, I think), and given the rationally designed naming scheme they came up with (which can be interpreted by humans after a little study), perhaps we should consider relaxing the "16 character rule". As I recall, the reason for suggesting a 16-character limit was so that 1) file names would not get too long, and 2) these IDs could be used as short labels on graphs (without occupying too much space).
I suppose we should check whether any of the software infrastructure assumes these names will be no longer than 16 characters before granting an exception.
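To make the file-name concern concrete, here is a minimal sketch assuming the CMIP6 file-name template <variable_id>_<table_id>_<source_id>_<experiment_id>_<member_id>_<grid_label>[_<time_range>].nc. The helper cmip6_filename, the chosen experiment and the time range are hypothetical, for illustration only.

```python
def cmip6_filename(variable_id, table_id, source_id, experiment_id,
                   member_id, grid_label, time_range=None):
    """Hypothetical helper: join DRS components into a CMIP6-style file name."""
    parts = [variable_id, table_id, source_id, experiment_id,
             member_id, grid_label]
    if time_range:  # fixed fields (e.g. table "fx") carry no time range
        parts.append(time_range)
    return "_".join(parts) + ".nc"


# Illustrative values only; the time range is made up.
name = cmip6_filename("tas", "Amon", "IPSL-CM6A-ATM-ICO-VHR",
                      "highresSST-present", "r1i1p1f1", "gn",
                      "195001-201412")
print(name, len(name))
```

Every character added to a source_id is repeated in every file name (and directory path) written for that model, which is what the length guideline was trying to contain.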
Another way to identify closely-related model versions is to assign a different "physics_index" (the "p" value in the "ripf" identifier). I wonder if we should encourage movement in that direction.
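For reference, a minimal sketch of how the physics_index can be read out of a variant label of the form r<k>i<l>p<m>f<n> (e.g. r1i1p1f1). The function name parse_variant_label is hypothetical; it only shows that two related versions sharing one source_id would differ solely in the "p" value.

```python
import re

# Variant labels look like "r1i1p1f1": realization, initialization,
# physics and forcing indices.
VARIANT_RE = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_variant_label(label: str) -> dict:
    """Hypothetical helper: split a variant label into its four indices."""
    match = VARIANT_RE.match(label)
    if match is None:
        raise ValueError(f"not a valid variant label: {label!r}")
    r, i, p, f = (int(g) for g in match.groups())
    return {"realization_index": r, "initialization_index": i,
            "physics_index": p, "forcing_index": f}


# Two closely related versions under one source_id, told apart only by "p".
a = parse_variant_label("r1i1p1f1")
b = parse_variant_label("r1i1p2f1")
print(a["physics_index"], b["physics_index"])  # 1 2
```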
@taylor13 I think [yep, just confirmed this looking at CMIP6_source_id.html: 250, 100, 50 and 25 km nominal_resolution] all four of these IPSL-CM6A-ATM-ICO-* configurations are identical except for their resolution, which would mean that a grid_label would be all that is needed to differentiate them.
The trouble is that in CMIP6, grid_label identifies the grid on which output is reported, which is not necessarily the same as the model grid. Also, if model output is reported on the native grid for two models at different resolutions, the grid_label for both models in CMIP6 should be "gn", which means the datasets won't be uniquely identified unless their source_ids are different.
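A minimal sketch of the collision, assuming the CMIP6 dataset id is the dot-joined sequence CMIP6.<activity_id>.<institution_id>.<source_id>.<experiment_id>.<member_id>.<table_id>.<variable_id>.<grid_label> (version omitted). The shared source_id "IPSL-CM6A-ATM-ICO" and the experiment/variable choices are hypothetical.

```python
def dataset_id(activity_id, institution_id, source_id, experiment_id,
               member_id, table_id, variable_id, grid_label):
    """Hypothetical helper: build a CMIP6 dataset id (without version)."""
    return ".".join(["CMIP6", activity_id, institution_id, source_id,
                     experiment_id, member_id, table_id, variable_id,
                     grid_label])


# Everything except source_id is identical; both runs report on the native grid.
COMMON = dict(activity_id="HighResMIP", institution_id="IPSL",
              experiment_id="highresSST-present", member_id="r1i1p1f1",
              table_id="Amon", variable_id="tas", grid_label="gn")


def ids_are_unique(source_ids):
    """True if each run yields a distinct dataset id."""
    ids = [dataset_id(source_id=s, **COMMON) for s in source_ids]
    return len(set(ids)) == len(ids)


# Hypothetical: a 250 km and a 25 km run registered under one shared source_id.
print(ids_are_unique(["IPSL-CM6A-ATM-ICO", "IPSL-CM6A-ATM-ICO"]))        # False
# Resolution encoded in the source_id, as in the registered entries.
print(ids_are_unique(["IPSL-CM6A-ATM-ICO-LR", "IPSL-CM6A-ATM-ICO-VHR"]))  # True
```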
@taylor13 let's add this to the hopper for discussion as we finalize the "harmonization" document. It makes sense to me that a single model configuration (same submodule component versions) is uniquely identified by its source_id.
@durack1 @taylor13 We have to check the uniqueness of the dataset DRS_id. What happens if the same source_id, run at two different resolutions, contributes to a MIP? Should the results from the two model resolutions be cited together (ripf is a single DRS component and cannot be split)? Or will that case not happen?
There was a reason why modeling centers started to encode the resolution in their source_id (model version)...
One thought here from our experience of running models at different resolutions: there are a (small) number of parameter and physics differences between our models at different resolutions. A notable example in the ocean is the eddy parameterisation (Gent-McWilliams is often mentioned here, being active in our eORCA1 models but not in eORCA025). Changes like this mean that the behaviour of the model can be very different depending on what you are looking at (hence HighResMIP). When users come to look at the data, there needs to be a very clear distinction between models at different resolutions; otherwise analysis will be particularly painful. The physics label as it stands is not a clear enough label for this purpose.
I could imagine that in the future we could break the source_id into a model series (e.g. HadGEM3-GC31) and a sub_source_id (e.g. LL), but I'm not 100% convinced that this is better than what we already have. We should certainly revisit this in discussions on the harmonization document.
The only short-term action I can see here is to consider asking whether these recently added source_ids could be shortened (remove -ATM?). Note that there are a couple of other models where resolution numbers like 025 have crept in.
On the topic of source_id length, note that we already have an 18-character one (CESM1-1-CAM5-CMIP5) and three at 17 characters (EC-Earth3-AerChem, IPSL-CM6A-LR-INCA and RTE-RRTMGP-181204), all of which have data published on ESGF.
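A quick check of those character counts against the 16-character guideline, as a minimal sketch; the constant name RECOMMENDED_MAX is illustrative, and the limit is a recommendation from the registration template, not a hard rule.

```python
RECOMMENDED_MAX = 16  # guideline from the registration template, not a hard limit

published = ["CESM1-1-CAM5-CMIP5", "EC-Earth3-AerChem",
             "IPSL-CM6A-LR-INCA", "RTE-RRTMGP-181204"]

lengths = {sid: len(sid) for sid in published}
over_limit = [sid for sid in published if lengths[sid] > RECOMMENDED_MAX]

for sid in published:
    print(f"{sid}: {lengths[sid]} characters")
```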
We had this discussion when planning CMIP5 (and again for CMIP6), and now we are having it again. Each time, the conclusion was that, for the reasons stated above and elsewhere, models run at different resolutions are considered to be different models and are assigned different source_ids. I agree that the short-term action is to consider whether to recommend any shortening.
As I recall, we had set out a rule at one point that a coupled model that was run in AMIP mode (so atmosphere-only) should have the same source_id as one that was coupled to an ocean. It's the experiment that determines which model components should be active. Thus, a model running the full set of DECK experiments would be consistently named (i.e., the name wouldn't be different for the AMIP run just because it was run uncoupled). With that background, "ATM" in the source_id is really not necessary (even if the model is never run coupled).
I think IPSL had already had other models with "ATM" in their source_id before adding the recent ones, so now we have to decide whether for consistency in their naming, we allow them to continue to include "ATM", or for consistency with the original source_id naming guidelines, we should ask them to remove it.
Hi all, a few comments to explain some aspects:
As far as possible, we would like not to modify data that has already been published. Concerning the IPSL-CM6A-ATM-ICO-VHR, HR, MR and LR data: we have not published it yet, but it has already been processed with respect to the model name and the other source_id-related metadata, so it is almost ready to be published.
I hope this helps to understand the IPSL naming convention. Do not hesitate to ask us if you need further information.
Thanks !
Given that data has already been written, I vote for not enforcing the 16 character "limit" and not suggesting the source names be changed.
@acaubel thanks for chiming in and suggesting that we remove the unused entries IPSL-CM7A-ATM-HR and IPSL-CM7A-ATM-LR; it makes sense to clean up as we go. I shall reopen this issue as we have an action pending.
I wouldn't worry too much about the length; if you've already written data, leave it as it is. Your naming logic makes sense. We just need to ensure that, moving forward, our data conventions can capture all the nuances required, whilst ensuring that we don't end up with directory names (or filenames) that are ridiculously long.
Thanks a lot for your understanding. I confirm you can remove the IPSL-CM7A-ATM-HR and IPSL-CM7A-ATM-LR models. We plan to ask you soon to register a new model: IPSL-CM6A-ATM-LR-REPROBUS (REPROBUS is the name of our stratospheric model). I am a little embarrassed because the name of this new model is longer than IPSL-CM6A-ATM-ICO-VHR. We have already written data for the first experiment; we chose this name several weeks ago, before you told me about the rules on the length of the name. What do you think? Would "IPSL-CM6A-ATM-LR-REPROBUS" be OK for you?
Thanks again !
Registration updates migrated to new issue #1051.
We shall leave this issue open until the appropriate changes are made to https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/.github/Model_registration_template.md, simply updating the limit from 16 to 25 characters.
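A minimal sketch comparing the old and proposed limits for the IPSL source_ids named in this thread; the helper too_long is hypothetical. Incidentally, IPSL-CM6A-ATM-LR-REPROBUS is exactly 25 characters, so it fits the updated limit with no room to spare.

```python
def too_long(source_ids, max_len):
    """Hypothetical helper: return the source_ids longer than max_len."""
    return [sid for sid in source_ids if len(sid) > max_len]


ipsl_ids = ["IPSL-CM6A-ATM-ICO-LR", "IPSL-CM6A-ATM-ICO-MR",
            "IPSL-CM6A-ATM-ICO-HR", "IPSL-CM6A-ATM-ICO-VHR",
            "IPSL-CM6A-ATM-LR-REPROBUS"]

print(too_long(ipsl_ids, 16))            # all five exceed the old guideline
print(too_long(ipsl_ids, 25))            # []
print(max(len(s) for s in ipsl_ids))     # 25
```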
label = IPSL-CM6A-ATM-ICO-VHR
label_extended = IPSL-CM6A-ATM-ICO-VHR
source_id = IPSL-CM6A-ATM-ICO-VHR
institution_id = IPSL
release_year = 2021
activity_participation = [HighResMIP]

aerosol:
  description = none
  nominal_resolution = none
atmos:
  description = DYNAMICO-LMDZ (NPv6; 1024000-point icosahedral-hexagonal; 79 levels; top level 80000 m)
  nominal_resolution = 25 km
atmosChem:
  description = none
  nominal_resolution = none
land:
  description = ORCHIDEE (v2.2, Water/Carbon/Energy mode)
  nominal_resolution = 25 km
landIce:
  description = none
  nominal_resolution = none
ocean:
  description = none
  nominal_resolution = none
ocnBgchem:
  description = none
  nominal_resolution = none
seaIce:
  description = none
  nominal_resolution = none
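As a rough sanity check on the 25 km nominal_resolution of the atmos component, this sketch divides the Earth's surface area by the 1024000 grid points and takes the square root of the mean cell area. It is a back-of-envelope heuristic assumed for illustration (with an assumed mean Earth radius of 6371 km), not the official CMIP6 nominal_resolution calculation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
N_CELLS = 1_024_000       # "1024000-point" from the atmos description

sphere_area_km2 = 4 * math.pi * EARTH_RADIUS_KM ** 2
mean_cell_area_km2 = sphere_area_km2 / N_CELLS
typical_spacing_km = math.sqrt(mean_cell_area_km2)  # side of an equal-area square

print(f"{typical_spacing_km:.1f} km")  # 22.3 km
```

A typical spacing of roughly 22 km is consistent with the registered 25 km nominal resolution.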