Add source IDs for use in various CMIP7 phases

znichollscr commented 3 weeks ago

Description

Checklist

Please confirm that this pull request has done the following:

[x] Documentation added (where applicable)
[x] Dev docs
[x] Changelog item added to changelog/
[x] rename write-revision-history to something more accurate

github-actions[bot] commented 3 weeks ago

No changes to the database between 'main' branch and 81cc0f1591020e1f9b388a680510e130ea01582f

github-actions[bot] commented 3 weeks ago

No changes to the database between 'main' branch and f33f801f62781bf4f3b21ddbce7a55db5ca1465d

znichollscr commented 2 weeks ago

Yep, all good, this can sit until then.

znichollscr commented 2 weeks ago

@vnaik60 would it be possible to get your thoughts on this? The key questions are:

Do you find (or do you think other modellers will find) this section helpful? https://input4mips-controlled-vocabularies-cvs--151.org.readthedocs.build/en/151/dataset-overviews/#which-data-sets-should-i-be-using-for-cmip7
Does having a summary on each individual forcing page, explaining which source ID to use for which phase of CMIP7, seem helpful to you? (Example for GHGs here: https://input4mips-controlled-vocabularies-cvs--151.org.readthedocs.build/en/151/dataset-overviews/greenhouse-gas-concentrations/#source-ids-for-cmip7-phases)

github-actions[bot] commented 2 weeks ago

No changes to the database between 'main' branch and ae695241dbf7146b3a186aab66d0f792062ed77e

vnaik60 commented 2 weeks ago

Thanks for creating these Zeb! #2 is definitely very helpful as it provides all the information regarding the dataset in one place. Would it be possible to include #1 within #2 itself to avoid going to multiple places for information? For example, in the snippet below, you could add "use this for testing only" under the Testing heading, or maybe change the Testing heading to CMIP6Plus and then state that this is only for testing.

Source IDs for CMIP7 phases

The source ID that identifies the dataset to use in the different phases of CMIP7 is given below.

Testing: CR-CMIP-0-3-0

AR7 fast track: No data available for this phase yet

CMIP7: No data available for this phase yet

znichollscr commented 2 weeks ago

Would it be possible to include #1 within #2 itself to avoid going to multiple places for information?

Yep, good suggestion. Will tweak then merge

github-actions[bot] commented 2 weeks ago

No changes to the database between 'main' branch and 4fde037b1e967e55ec412c586997c0affddd2257

vnaik60 commented 2 weeks ago

I wonder if we can condense this information

For the testing of CMIP7, use data with the source ID CR-CMIP-0-3-0

This data is for testing purposes only. Production simulations should not be started based on this data. (As a further bit of context, you can tell that this is testing data because it has a mip_era metadata value of 'CMIP6Plus'. This metadata value appears both in the file's global metadata as well as its metadata on ESGF.)

to

To inform the finalization of CMIP7 forcing datasets, please use the mip_era CMIP6Plus dataset for testing and evaluation with the source_ID CR-CMIP-0-3-0.

Just trying to be succinct.

znichollscr commented 2 weeks ago

Always good to be succinct. I am a little bit hesitant to rely on the mip_era concept alone. The main reason is that I'm not sure how obvious it is to people what the MIP era is. It wasn't obvious to me when I started with this stuff, and the MIP era doesn't appear in the filename, so you have to know where to look in order to find it. In contrast, the source ID is in the filename and the text as written is (overly) explicit, which I think is perhaps worth it at this stage given the possibility for miscommunication.

Having said that, you're a modeller and know how modellers think much better than I do so happy to follow your lead if you still think we should condense as suggested.

durack1 commented 2 weeks ago

Thanks for this again @znichollscr, it's helpful to have the template padded out so info can be slotted in that looks similar across datasets.

A nit: the "AR7 Fast Track" is actually part of CMIP7, just the prioritized first bit. Just peeking at https://wcrp-cmip.org/cmip7/, this is defined as the "CMIP AR7 Fast Track". I have always been referring to this as the CMIP7 AR7 Fast Track, as this will be using (for the most part) CMIP7-era models, forcing, (potentially new) experimental protocols etc. The files will also be labelled mip_era = "CMIP7", so it would be good for us to be as clear as possible. Again, I'm trying to aid comms and make sure that we are all using the same nomenclature.

vnaik60 commented 2 weeks ago

@znichollscr - I just went and looked at the filenames and I don't understand why we have CMIP appearing twice in the filenames (e.g., ch4_input4MIPs_GHGConcentrations_CMIP_CR-CMIP-0-3-0_gm_0001-2022.nc, CO-em-anthro_input4MIPs_emissions_CMIP_CEDS-CMIP-2024-10-21_gn_200001-202212.nc). This was not the case for CMIP6 filenames (e.g., CO-em-anthro_input4MIPs_emissions_CMIP_CEDS-2017-05-18_gn_200001-201412.nc). Do we need the dual CMIP?

mip_era shows up quite clearly when I ncdump the files. I imagine that most modelers check the metadata in the files before beginning any work so it would be ok to use it in the text.
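As an illustration of the metadata check discussed here, below is a minimal Python sketch that flags testing-only data from a file's mip_era global attribute. It assumes the global attributes have already been read into a plain dict (for example from xarray's Dataset.attrs, or transcribed from ncdump -h output); the helper names is_testing_data and describe, and the message wording, are hypothetical.

```python
# Hypothetical sketch: classify a file from its global attributes.
# Assumes the attributes are already loaded into a plain dict
# (e.g. xarray.open_dataset(path).attrs); loading is not shown here.
TESTING_MIP_ERA = "CMIP6Plus"


def is_testing_data(global_attrs: dict) -> bool:
    """Return True if mip_era marks the file as testing-only data."""
    mip_era = global_attrs.get("mip_era")
    if mip_era is None:
        # Fail loudly rather than silently treating the file as production data
        raise KeyError("No 'mip_era' global attribute found")
    return mip_era == TESTING_MIP_ERA


def describe(global_attrs: dict) -> str:
    """Build a one-line summary combining source_id and mip_era."""
    source_id = global_attrs.get("source_id", "<unknown source_id>")
    if is_testing_data(global_attrs):
        return f"{source_id}: testing only, do not start production simulations"
    return f"{source_id}: mip_era={global_attrs['mip_era']}"
```

For example, describe({"mip_era": "CMIP6Plus", "source_id": "CR-CMIP-0-3-0"}) reports the file as testing only. The check looks only at mip_era; the source ID in the filename remains the more visible signal.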

@durack1 we are getting mixed messages on "CMIP AR7 Fast Track" versus "CMIP7 AR7 Fast Track". It would be great to get clarification and agreement on this. Thanks!

znichollscr commented 2 weeks ago

A nit, the "AR7 Fast Track" is actually part of CMIP7...

At the risk of having two threads going at once (one here, one in emails), I'm with Vaishali on this in that I've now heard both "call it AR7 fast track" and "call it CMIP7 AR7 fast track". Tweaking is easy, but would be good to understand what the advice is, also including @eleanororourke's view.

I just went and looked at the filenames and I don't understand why we have CMIP appearing twice in the filenames

A good question. One occurrence is because the target_mip for these experiments is CMIP (full list of supported target MIPs: https://github.com/PCMDI/input4MIPs_CVs/blob/main/CVs/input4MIPs_target_mip.json). The other is because, for whatever reason, CEDS chose to include "CMIP" in their source ID (full list of source IDs here: https://github.com/PCMDI/input4MIPs_CVs/blob/main/CVs/input4MIPs_source_id.json). We could just ask people not to include "CMIP" in their source ID. @durack1 I don't know if there was some logic for having people include "CMIP" in their source ID (we also have it in ours for GHGs, but I have no idea why; I think I'm just copying what was done for CMIP6).

If it's of interest, the DRS (https://github.com/PCMDI/input4MIPs_CVs/blob/main/CVs/input4MIPs_DRS.json) shows how the file path and names are constructed. If you squint a bit, it should make sense.
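To make the double "CMIP" concrete, below is a minimal Python sketch that splits a filename into its components and reports which of them contain "CMIP". The component order is an assumption inferred from the example filenames in this thread, not read from input4MIPs_DRS.json, and the sketch assumes no component contains an underscore. parse_filename and cmip_components are hypothetical helpers.

```python
from pathlib import Path

# Assumed component order, inferred from example filenames such as
# CO-em-anthro_input4MIPs_emissions_CMIP_CEDS-CMIP-2024-10-21_gn_200001-202212.nc
# The authoritative template is in input4MIPs_DRS.json.
COMPONENTS = (
    "variable_id",
    "activity_id",
    "dataset_category",
    "target_mip",
    "source_id",
    "grid_label",
    "time_range",  # treated as optional here (absent for fixed fields)
)


def parse_filename(filename: str) -> dict:
    """Split a filename into named components on underscores."""
    parts = Path(filename).stem.split("_")  # stem drops the ".nc" suffix
    if len(parts) not in (len(COMPONENTS) - 1, len(COMPONENTS)):
        raise ValueError(f"Unexpected number of components: {parts}")
    # zip stops at the shorter sequence, so a missing time_range is simply omitted
    return dict(zip(COMPONENTS, parts))


def cmip_components(filename: str) -> list:
    """Return the names of the components whose value contains 'CMIP'."""
    return [name for name, value in parse_filename(filename).items() if "CMIP" in value]
```

For the CMIP6 CEDS file quoted above only target_mip matches, while for the current CEDS and GHG files both target_mip and source_id match, which is the duplication being discussed.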

mip_era shows up quite clearly when I ncdump the files. I imagine that most modelers check the metadata in the files before beginning any work so it would be ok to use it in the text.

Huh, I hadn't assumed that was a default step. Good to know. I'll make a new PR with some tweaks now.

znichollscr commented 2 weeks ago

Tweaks in #157 if you want to directly comment there

vnaik60 commented 1 week ago

@znichollscr elevating this

A good question. One occurrence is because the target_mip for these experiments is CMIP (full list of supported target MIPs: https://github.com/PCMDI/input4MIPs_CVs/blob/main/CVs/input4MIPs_target_mip.json). The other is because, for whatever reason, CEDS chose to include "CMIP" in their source ID (full list of source IDs here: https://github.com/PCMDI/input4MIPs_CVs/blob/main/CVs/input4MIPs_source_id.json). We could just ask people not to include "CMIP" in their source ID. @durack1 I don't know if there was some logic for having people include "CMIP" in their source ID (we also have it in ours for GHGs, but I have no idea why; I think I'm just copying what was done for CMIP6).

Since we are updating the solar dataset, it would be a good idea to revisit the naming of the files, as all CMIP6Plus dataset filenames, except land use and the JRA data, suffer from dual occurrences of "CMIP".

@durack1 - I agree with Zeb's suggestion that we should remove CMIP from source ID which will then get rid of the second occurrence of CMIP in the name. In https://github.com/PCMDI/input4MIPs_CVs/blob/main/CVs/input4MIPs_source_id.json, land-use and the JRA datasets do not contain CMIP in their source_ID so removing it from the names of others will make it all consistent.
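A minimal Python sketch of the proposed change, assuming "CMIP" always appears as its own hyphen-separated token in a source ID. drop_cmip_token and ids_needing_change are hypothetical helpers, and the resulting IDs are illustrations only, not agreed names.

```python
def drop_cmip_token(source_id: str) -> str:
    """Hypothetical: remove any standalone "CMIP" token from a
    hyphen-separated source ID, leaving all other tokens untouched."""
    return "-".join(token for token in source_id.split("-") if token != "CMIP")


def ids_needing_change(source_ids):
    """Return the source IDs that the proposal would rename."""
    return [sid for sid in source_ids if drop_cmip_token(sid) != sid]
```

Under this rule CEDS-CMIP-2024-10-21 would become CEDS-2024-10-21, matching the CMIP6-style CEDS-2017-05-18, and CR-CMIP-0-3-0 would become CR-0-3-0.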

PCMDI / input4MIPs_CVs

Add source IDs for use in various CMIP7 phases #151
