PCMDI / input4MIPs_CVs

Controlled Vocabularies (CVs) for use in input4MIPs
Creative Commons Attribution 4.0 International

Register biomass burning source_id and institution_id #10

Open durack1 opened 4 months ago

durack1 commented 4 months ago

@mjevanmarle just creating an issue as a placeholder for discussions on finalizing the registration of the biomass burning institution_id and source_id.

Note that the CMIP6 contribution used institution_id VUA (here) and had a few versioned releases, with source_id entries VUA-CMIP-BB4CMIP6-1-0, 1-1 and 1-2 (here). I wonder whether we want to maintain any consistency with the previous version, or just start again? I see that you've used the previous template, so VUA-CMIP-BB4CMIP6-1-0 becomes DRES-CMIP-BB4CMIP7-1-0.
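The naming template can be sketched in a few lines of Python. This assumes source_id is composed as `<institution_id>-<target_mip>-<dataset>-<major>-<minor>`, a pattern inferred from the examples in this thread rather than taken from the CV specification; `make_source_id` is a hypothetical helper.

```python
def make_source_id(institution_id: str, target_mip: str, dataset: str, version: str) -> str:
    """Compose a source_id following the pattern seen in this thread.

    Assumed template: <institution_id>-<target_mip>-<dataset>-<major>-<minor>
    """
    # A dotted version such as "1.0" becomes "1-0" inside the identifier
    version_part = version.replace(".", "-")
    return "-".join([institution_id, target_mip, dataset, version_part])


print(make_source_id("VUA", "CMIP", "BB4CMIP6", "1.2"))   # VUA-CMIP-BB4CMIP6-1-2
print(make_source_id("DRES", "CMIP", "BB4CMIP7", "1.0"))  # DRES-CMIP-BB4CMIP7-1-0
```

Only the institution_id and dataset name change between the CMIP6 and CMIP7 entries; the target MIP and version suffix follow the same convention.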

Just a note, while we start to coordinate the collation of these prototype (v0) datasets, and gather feedback, we're aiming to catch these in the "CMIP6Plus" project, in preparation for CMIP7 in a couple of years. This will allow a clean split between the CMIP7 "endorsed" forcing collection, and those that we are working out the kinks on (caught in CMIP6Plus).

We have updated the institution registration a little in moving beyond CMIP6; institutions now depend on the ROR registry (see here), and, as an example, Deltares is already registered: https://ror.org/01deh9c76

@wolfiex @matthew-mizielinski @taylor13 @vnaik60 @znichollscr ping

vnaik60 commented 3 weeks ago

@mjevanmarle bringing your email here so that we can better track finalizing the format of BB4CMIP+ files. @durack1 ping.

Please find here a test file with the following changes (compared to last time):

- standard_name = "biomassburning’+ +’_flux"
- dataset_version_number = "1.0" removed (note: the attribute source_version still exists)
- dataset_version_date = year-month added
- mip_era = "CMIP7": kept as CMIP7. @Durack, Paul J., can you let me know whether this should be CMIP7 or CMIP6+?
- source_id: changed to 'DRES-CMIP-BB4CMIP6+-1-0'. We thought the dataset should be called BB4CMIP6+ (that sounds like a better name, because we mostly extended the CMIP6 version). This is also reflected in the filename of the example: H2-em-biomassburning_input4MIPs_emissions_CMIP_DRES-CMIP-BB4CMIP6+-1-0_gn_190001-202212.nc
- molecular weight added as variable[species].molecular_weight: 'molecular weight of ' ': (based on the Excel sheet shared by Guido)
- data_usage_tips string removed
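The example filename shows that source_id is embedded verbatim in the filename, so any character chosen for source_id ends up in filenames and paths. A minimal parsing sketch, assuming the input4MIPs template `<variable_id>_<activity_id>_<dataset_category>_<target_mip>_<source_id>_<grid_label>_<time_range>.nc` (field names are assumptions, and `parse_filename` is a hypothetical helper):

```python
from typing import NamedTuple


class Input4MIPsFilename(NamedTuple):
    variable_id: str
    activity_id: str
    dataset_category: str
    target_mip: str
    source_id: str
    grid_label: str
    time_range: str


def parse_filename(name: str) -> Input4MIPsFilename:
    # Assumed template (7 underscore-separated fields):
    # <variable_id>_<activity_id>_<dataset_category>_<target_mip>_<source_id>_<grid_label>_<time_range>.nc
    if not name.endswith(".nc"):
        raise ValueError(f"not a netCDF filename: {name}")
    parts = name[: -len(".nc")].split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields, got {len(parts)}")
    return Input4MIPsFilename(*parts)


f = parse_filename(
    "H2-em-biomassburning_input4MIPs_emissions_CMIP_DRES-CMIP-BB4CMIP6+-1-0_gn_190001-202212.nc"
)
print(f.source_id)   # DRES-CMIP-BB4CMIP6+-1-0
print(f.time_range)  # 190001-202212
```

The split relies on no field containing an underscore, which is why identifiers such as source_id use hyphens internally.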

Could you confirm that the test file is correct or changes are needed?

durack1 commented 3 weeks ago

@mjevanmarle great to see progress!

I really don't think "+" characters are going to play well across all of the infrastructure, so I would avoid using them in filenames and other metadata where we can; CMIP6+ -> CMIP6Plus would be far safer.
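One concrete hazard is URL handling: in query strings "+" decodes to a space, so a source_id containing "+" can silently change when it passes through a web service. The sketch below shows that, plus a conservative character check; the "letters, digits and hyphens only" rule is a hypothetical illustration, not the official input4MIPs CV rule.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical conservative rule: letters, digits and hyphens only.
SAFE_SOURCE_ID = re.compile(r"[A-Za-z0-9-]+")


def is_safe_source_id(source_id: str) -> bool:
    """True if source_id uses only the (assumed) safe character set."""
    return SAFE_SOURCE_ID.fullmatch(source_id) is not None


def plus_to_word(name: str) -> str:
    """Mirror the suggested rename, e.g. CMIP6+ -> CMIP6Plus."""
    return name.replace("+", "Plus")


print(is_safe_source_id("DRES-CMIP-BB4CMIP6+-1-0"))  # False
print(is_safe_source_id("DRES-CMIP-BB4CMIP7-1-0"))   # True
print(plus_to_word("CMIP6+"))                        # CMIP6Plus
print(repr(unquote_plus("BB4CMIP6+-1-0")))           # 'BB4CMIP6 -1-0'
```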

We need to uniquely define your dataset, being generated for use in CMIP7 (just like the previous versions were prepared for CMIP6). So let's keep that name unambiguous. The input4MIPs subproject that this is published into is less important to consider, certainly irrelevant for file naming. It is likely that we'll be receiving feedback on these data (John F/NCAR is very keen to see these new data), and so just like we had with CMIP6, there will likely be at least two versions of the CMIP7 prototype (published into the prototype CMIP6Plus project) through to final datasets.

To help demystify the process, I figured a table of how the data will be identified might be a good way to visualize this, covering data that a) already exist (CMIP6), b) almost exist (the current working files) and c) will exist as final CMIP7 versions; see below.

| MIP era | institution_id | source_id | notes |
|---|---|---|---|
| CMIP6 | VUA | VUA-CMIP-BB4CMIP6-1-2 | finalized, no change; plus the old datasets that are not currently available for download: VUA-CMIP-BB4CMIP6-1-0, 1-1 |
| CMIP6Plus | DRES | DRES-CMIP-BB4CMIP7-1-0? | working prototype published June 2024; capturing data in a pre-CMIP7 mip-era with an explicit intention of learning by doing |
| CMIP7 | DRES | DRES-CMIP-BB4CMIP7-1-3? | finalized datasets published ~Jan, ~Mar 2025 (historical/piControl); guessing it might take a little iteration (1.0, 1.1, 1.2 and 1.3 or similar) to get "final" data right |

taylor13 commented 3 weeks ago

I also was uncomfortable with the + sign, but don't know of any specific anticipated problems. The source_id is incorporated in file names, directory structures, unique identifying strings of datasets and the like, so it seems like a risk we shouldn't take.

mjevanmarle commented 2 weeks ago

Thanks Paul and Karl. I agree with the table above and will use the following for the current release:

- standard_name = "biomassburning’+ +’_flux"
- dataset_version_date = year-month added
- mip_era = "CMIP6Plus"
- source_id: changed to 'DRES-CMIP-BB4CMIP7-1-0'

@durack1 Do you want me to add version_number = "1.0" as well? If you confirm, I will run all the files. One more question: is there a file size limit?

durack1 commented 1 week ago

@mjevanmarle the above looks spot on. And yes, please add that version_number = 1.0 so we can track any updates against this version.
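Before regenerating every file, the agreed global attributes could be checked automatically. This is a minimal sketch over a plain dict of attributes (reading them from the netCDF files is left out); `check_global_attrs` is hypothetical, and the YYYY-MM format for dataset_version_date is an assumption based on "year-month" above.

```python
import re


def check_global_attrs(attrs: dict) -> list[str]:
    """Return a list of problems in a file's global attributes (empty if none)."""
    problems = []
    if attrs.get("mip_era") != "CMIP6Plus":
        problems.append(f"mip_era should be CMIP6Plus, got {attrs.get('mip_era')!r}")
    source_id = attrs.get("source_id", "")
    if "+" in source_id:
        problems.append("source_id contains '+'")
    version = attrs.get("version_number", "")
    # The source_id suffix ("-1-0") should agree with version_number ("1.0")
    if not source_id.endswith("-" + version.replace(".", "-")):
        problems.append(f"source_id {source_id!r} does not match version_number {version!r}")
    # Assumed format for "year-month": YYYY-MM
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", attrs.get("dataset_version_date", "")):
        problems.append("dataset_version_date is not YYYY-MM")
    return problems


attrs = {
    "mip_era": "CMIP6Plus",
    "source_id": "DRES-CMIP-BB4CMIP7-1-0",
    "version_number": "1.0",
    "dataset_version_date": "2024-05",  # hypothetical placeholder date
}
print(check_global_attrs(attrs))  # []
```

Tying version_number to the source_id suffix means a future 1.1 release cannot ship with a stale identifier.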

Is it possible to generate a single file with the updated entries and then provide me a link? I can quickly review this and then get back to you with any final nits before the whole dataset is produced. There are a couple of examples of identifiers that other datasets have used which it would be good to homogenize; "dataset_version_date" is something I don't believe has been used before. Regarding "standard_name": ideally this represents an entry in the CF standard name table (see here), which we don't have for these data. I'm not sure whether we should remove this, @taylor13?

This is great 1.0!

taylor13 commented 1 week ago

Yes, please remove standard_name from the files if the names aren't in the CF list. You can record that information in "long_name" for now. Then please propose an appropriate standard_name at http://cfconventions.org/discussion.html (vocabulary). It is easy, but if you have any problems, I can help.
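The suggested fix (drop a non-CF standard_name but keep its text in long_name) can be sketched as a small attribute transform. The set of CF names here is a stand-in containing one real CF entry; in practice it would be loaded from the CF standard name table. `demote_standard_name` and the "biomassburning_H2_flux" value are hypothetical.

```python
def demote_standard_name(var_attrs: dict, cf_standard_names: set[str]) -> dict:
    """Return a copy of var_attrs in which a standard_name not found in the CF
    table is removed and its text kept in long_name (an existing long_name wins)."""
    attrs = dict(var_attrs)
    name = attrs.get("standard_name")
    if name is not None and name not in cf_standard_names:
        del attrs["standard_name"]
        attrs.setdefault("long_name", name)
    return attrs


# Stand-in for the CF table; only this one entry is used here.
cf_names = {
    "surface_upward_mass_flux_of_carbon_dioxide_expressed_as_carbon_due_to_emission_from_fires"
}
print(demote_standard_name({"standard_name": "biomassburning_H2_flux"}, cf_names))
# {'long_name': 'biomassburning_H2_flux'}
```

Valid CF names pass through untouched, so the same step is safe to run over every species.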

mjevanmarle commented 1 week ago

I added dataset_version_date, because Vaishali proposed to have a date indication added to the file. Do you have a better suggestion for this descriptor?

I have removed standard_name. In the CF list I see surface_upward_mass_flux_of_carbon_dioxide_expressed_as_carbon_due_to_emission_from_fires, which comes closest, but we also provide all the other species, so this information is all listed in long_name. There are probably more issues, so if you could check the files here, that would be great @durack1