Open durack1 opened 4 months ago
@mjevanmarle bringing your email here so that we can better track finalizing the format of BB4CMIP+ files. @durack1 ping.
Please find here a test file with the following changes (compared to the previous one):
standard_name = "biomassburning’+
Could you confirm that the test file is correct, or whether changes are needed?
@mjevanmarle great to see progress!
I really don't think "+" characters are going to play well across any of the infrastructure, so I would avoid using them in filenames and other metadata where we can; CMIP6+ -> CMIP6Plus would be far safer.
We need to uniquely define your dataset, being generated for use in CMIP7 (just like the previous versions were prepared for CMIP6). So let's keep that name unambiguous. The input4MIPs subproject that this is published into is less important to consider, certainly irrelevant for file naming. It is likely that we'll be receiving feedback on these data (John F/NCAR is very keen to see these new data), and so just like we had with CMIP6, there will likely be at least two versions of the CMIP7 prototype (published into the prototype CMIP6Plus project) through to final datasets.
To attempt to demystify the process, I figured that tabulating how data will be identified that a) already exists (CMIP6), b) almost exists (the current working files) and c) will exist as final CMIP7 versions might be a good way to visualize this, see below.
MIP era | institution_id | source_id | notes |
---|---|---|---|
CMIP6 | VUA | VUA-CMIP-BB4CMIP6-1-2 | finalized, no change; plus the old datasets that are not currently available for download: VUA-CMIP-BB4CMIP6-1-0, 1-1 |
CMIP6Plus | DRES | DRES-CMIP-BB4CMIP7-1-0? | working prototype published June 2024; capturing data in a pre-CMIP7 mip-era with an explicit intention of learning by doing |
CMIP7 | DRES | DRES-CMIP-BB4CMIP7-1-3? | finalized datasets published ~Jan, ~Mar 2025 historical/piControl; Guessing it might take a little iteration 1.0, 1.1, 1.2, and 1.3 or similar to get "final" data right |
I also was uncomfortable with the + sign, but don't know of any specific anticipated problems. Since the source_id is incorporated in file names, directory structures, unique identifying strings of datasets and the like, it seems like a risk we shouldn't take.
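As a rough illustration of the concern above, a check along these lines could flag identifiers containing characters such as "+" before they end up in file names or directory paths. The allowed character set (letters, digits, hyphens) and the function name are assumptions made for this sketch, not a rule taken from the input4MIPs controlled vocabularies.

```python
import re

# Assumed convention, for illustration only: letters, digits and hyphens.
SAFE_ID = re.compile(r"[A-Za-z0-9-]+")


def is_safe_identifier(value: str) -> bool:
    """Return True if value uses only the assumed safe character set."""
    return SAFE_ID.fullmatch(value) is not None


# Print each candidate identifier alongside the result of the check.
for candidate in ["CMIP6+", "CMIP6Plus", "DRES-CMIP-BB4CMIP7-1-0"]:
    print(candidate, is_safe_identifier(candidate))
```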
Thanks Paul and Karl. I agree with the table above and will use the following for the current release:
standard_name = "biomassburning’+ +’_flux"
dataset_version_date = year-month added
mip_era = "CMIP6Plus"
source_id: changed to 'DRES-CMIP-BB4CMIP7-1-0'
@durack1 Do you want me to add version_number = "1.0" as well? If you confirm, I will run all the files. 1 more question: is there a file size limit?
@mjevanmarle the above looks spot on. And yes, please add that version_number = 1.0 so we can track any updates against this version.
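One way to keep version_number and source_id from drifting apart is to derive the version from the trailing "-major-minor" part of the source_id and compare. This is only a sketch: the helper name is hypothetical, storing version_number as a string is an assumption, and the attribute values are the ones quoted in this thread.

```python
def version_from_source_id(source_id: str) -> str:
    """Derive 'major.minor' from a source_id ending in '-<major>-<minor>'."""
    parts = source_id.rsplit("-", 2)
    if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
        raise ValueError(f"no trailing version found in {source_id!r}")
    return f"{parts[1]}.{parts[2]}"


# Global attributes as discussed above (string-valued version is an assumption).
attrs = {
    "mip_era": "CMIP6Plus",
    "source_id": "DRES-CMIP-BB4CMIP7-1-0",
    "version_number": "1.0",
}

# Fails loudly if the two ways of stating the version disagree.
assert version_from_source_id(attrs["source_id"]) == attrs["version_number"]
```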
Is it possible to generate a single file with the updated entries and then provide me a link? I can quickly review this and then get back to you with any final nits before the whole dataset is produced. There are a couple of examples of identifiers which other datasets have used that would be good to homogenize; "dataset_version_date" is something I don't believe has been used before. Regarding "standard_name", ideally, this represents an entry in the CF standard name table (see here), which we don't have for these data; not sure whether we remove this, @taylor13?
This is great 1.0!
Yes, please remove standard_name from the files if they aren't in the CF list. You can record that information in "long_name", for now. Then, please propose an appropriate standard_name at http://cfconventions.org/discussion.html (vocabulary). It is easy, but if you have any problems, I can help.
I added dataset_version_date, because Vaishali proposed to have a date indication added to the file. Do you have a better suggestion for this descriptor?
I have removed standard_name. In the CF list I see: surface_upward_mass_flux_of_carbon_dioxide_expressed_as_carbon_due_to_emission_from_fires which comes closest, but we also provide all other species. Therefore this is all listed in the long_name. Probably there are more issues, so if you could check the files here, that would be great @durack1
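The rule discussed above (drop standard_name when it is not in the CF table, and keep the information in long_name) could be sketched as follows. The set below is only a stand-in for the real CF standard name table and holds just the one entry quoted above; the function name and the example attribute values are hypothetical.

```python
# Stand-in for the CF standard name table: only the entry quoted in this thread.
CF_STANDARD_NAMES = {
    "surface_upward_mass_flux_of_carbon_dioxide_expressed_as_carbon"
    "_due_to_emission_from_fires",
}


def drop_unregistered_standard_name(attrs: dict) -> dict:
    """Return a copy of attrs without standard_name if it is not a CF entry.

    If no long_name is present, the dropped value is kept there so the
    information is not lost; an existing long_name is left untouched.
    """
    cleaned = dict(attrs)
    name = cleaned.get("standard_name")
    if name is not None and name not in CF_STANDARD_NAMES:
        del cleaned["standard_name"]
        cleaned.setdefault("long_name", name)
    return cleaned


# Hypothetical variable attributes, for illustration only.
example = {"standard_name": "hypothetical_unregistered_name", "units": "kg m-2 s-1"}
print(drop_unregistered_standard_name(example))
```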
@mjevanmarle just creating an issue as a placeholder for discussions in finalizing the registration of the biomass burning institution_id and source_id.
Note the CMIP6 contribution had institution_id: VUA (here) and a couple of versioned releases, so source_id entries: VUA-CMIP-BB4CMIP6-1-0, 1-1 and 1-2 (here). I wonder if we want to maintain any consistency with the previous version, or just start again? I see that you've used the previous template so VUA-CMIP-BB4CMIP6-1-0 becomes DRES-CMIP-BB4CMIP7-1-0.
Just a note, while we start to coordinate the collation of these prototype (v0) datasets, and gather feedback, we're aiming to catch these in the "CMIP6Plus" project, in preparation for CMIP7 in a couple of years. This will allow a clean split between the CMIP7 "endorsed" forcing collection, and those that we are working out the kinks on (caught in CMIP6Plus).
We have updated the institution registration a little, moving beyond CMIP6; these now depend on the RoR registry (see here), and, as an example, Deltares is already registered - https://ror.org/01deh9c76
@wolfiex @matthew-mizielinski @taylor13 @vnaik60 @znichollscr ping