PCMDI / input4MIPs_CVs

Controlled Vocabularies (CVs) for use in input4MIPs
https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable/
Creative Commons Attribution 4.0 International

Register volcanic forcing source_id and institution_id #9

Closed (durack1 closed this issue 2 months ago)

durack1 commented 8 months ago

@thomasaubry just creating an issue as a placeholder for discussions in finalizing the registration of the volcanic forcing institution_id and source_id.

Note the CMIP6 contribution had institution_id: IACETH (here) and a couple of versioned releases, so source_id entries: IACETH-SAGE3lambda-2-1-0 and 3-0-0 (here).

We have updated the institution registration a little since CMIP6: registrations now depend on the RoR registry (see here). As an example, UExeter is already registered: https://ror.org/03yghzc09

@wolfiex @matthew-mizielinski @taylor13 @vnaik60 @znichollscr ping

znichollscr commented 8 months ago

Great thanks. Just to check, all institutes need to be registered on RoR now in order to play nice with the CMIP process?

durack1 commented 8 months ago

@znichollscr that's what we are aiming for. This simplifies things a little on our end, as RoR intends to manage a lot of this info.

znichollscr commented 8 months ago

Ok great, but there'll still be some institution ID key in https://github.com/PCMDI/input4MIPs-cmor-tables/blob/master/input4MIPs_institution_id.json#L5

Or is the plan to just check against RoR, and if it's not there, explode?

durack1 commented 8 months ago

@znichollscr yep exactly. So as an example, your CMIP6-era contribution was from UoM (University of Melbourne), which is identified here. Ditto for all the other CMIP6-era contributions (CMIP6, input4MIPs, ...)
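A minimal sketch of the "check against RoR, and if it's not there, explode" behaviour described above. It assumes a local snapshot of registered institutions rather than a live RoR lookup; the names `REGISTERED_INSTITUTIONS`, `UnregisteredInstitutionError` and `require_ror` are hypothetical, and only the UExeter RoR URL comes from this thread.

```python
# Hypothetical local snapshot of registered institutions, keyed by institution_id.
# Only the UExeter entry is taken from this thread; the structure is an assumption.
REGISTERED_INSTITUTIONS = {
    "UExeter": "https://ror.org/03yghzc09",
}


class UnregisteredInstitutionError(ValueError):
    """Raised when an institution_id has no known RoR entry."""


def require_ror(institution_id: str, registry: dict[str, str]) -> str:
    """Return the RoR URL for institution_id, or raise if it is not registered."""
    try:
        return registry[institution_id]
    except KeyError:
        raise UnregisteredInstitutionError(
            f"institution_id {institution_id!r} has no registered RoR entry"
        ) from None


print(require_ror("UExeter", REGISTERED_INSTITUTIONS))
```

Failing loudly on an unknown ID, rather than silently accepting it, matches the "explode" behaviour agreed above.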

thomasaubry commented 7 months ago

Thanks both, only just getting to this! I can't see an id for Exeter on the RoR page. Do I need to directly edit the input4MIPs_CVs file? Is Exeter the right institution anyway, or could I make a more inclusive one like "CMIP7 CFTT Volcano emission team" and list multiple institutions? Sources will be quite varied so it might be worth a discussion on how to best inform that. Apologies in advance for the trivial questions and for not engaging with this earlier.

durack1 commented 7 months ago

Hi @thomasaubry, looks like U. Exeter is at https://ror.org/03yghzc09.

The way that we've worked in the past (with CMIP6) was to have an institution_id and a source_id. In this case it makes sense to me for the institution to be UExeter (tied to the existing ROR entry linked above), and then we need to determine a source_id which identifies the contributor(s). In CMIP6 a modelling group (e.g. NOAA-GFDL) contributed data from multiple models/source_ids, e.g. GFDL-CM4, GFDL-ESM4 etc. For examples see CMIP6_source_id.html. Ideally, we want these short (<25 chars), as they will be used in the directories and filenames under which the data will be published to ESGF. For the CMIP6 template see here.
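As a rough illustration of the "short (<25 chars)" guidance, here is a sketch of a source_id check. The allowed character set is an assumption inferred from the examples in this thread (letters, digits and hyphens), and `check_source_id` is a hypothetical helper, not part of any existing checker.

```python
import re

# The thread asks for "<25 chars"; treating 25 as a hard limit is an assumption.
MAX_SOURCE_ID_LENGTH = 25

# Assumed character set: letters, digits and hyphens, as in "GFDL-CM4" or
# "IACETH-SAGE3lambda-3-0-0". Underscores are excluded because the CMIP6-era
# file name quoted in this thread uses them to separate fields.
SOURCE_ID_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def check_source_id(source_id: str) -> list[str]:
    """Return a list of problems found with source_id (empty if none)."""
    problems = []
    if len(source_id) >= MAX_SOURCE_ID_LENGTH:
        problems.append(
            f"{len(source_id)} characters long, expected fewer than {MAX_SOURCE_ID_LENGTH}"
        )
    if not SOURCE_ID_PATTERN.fullmatch(source_id):
        problems.append("contains characters other than letters, digits and hyphens")
    return problems


for candidate in ("GFDL-CM4", "IACETH-SAGE3lambda-3-0-0"):
    print(candidate, check_source_id(candidate) or "ok")
```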

durack1 commented 7 months ago

@znichollscr we have a prototype netcdf file available, so let me know where I can drop this so it can be checked - or how to run the checker on the data?

znichollscr commented 7 months ago

> @znichollscr we have a prototype netcdf file available, so let me know where I can drop this so it can be checked - or how to run the checker on the data?

Good question, not really ready for that yet unfortunately! Will ping when I am

durack1 commented 7 months ago

@znichollscr I was hoping we could highlight this tool during the meeting Wednesday.. Any chance this could happen, or are we still a while away?

Folks are starting to get to a point that demo files are available, so figured trying to steer everything in one direction (eval tool) is a better idea than my old scripts

znichollscr commented 7 months ago

We can talk about it and the idea, but actually using it is still a couple of months away unfortunately because I have to get my own data out first 🙃

durack1 commented 4 months ago

@thomasaubry just looping back around on this source_id registration - will need to ascertain what we want to call this data and which institution is the host etc before we proceed - maybe a quick telco could be quickest?

vnaik60 commented 4 months ago

Edited comment after iteration with @durack1 @thomasaubry great start to get all the data converted into netcdf! I just had a look at the file you shared with us. Some very quick feedback:

for strat_opt_aer* file

for utsvolcsulfur_* file - this is going to be a complicated file

ping @durack1

znichollscr commented 4 months ago

Hi @thomasaubry, congrats from me too.

As @vnaik60 says, there are quite a few things to tweak. It might be easiest if I give you a hand getting started. I'll drop you an email to find a time.

As a quick question, is there a CMIP6 data file we should be using as a template/example? Or are we doing a fresh start this time?

znichollscr commented 4 months ago

Perhaps this file is our starting point/example? https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/PAMIP/IPSL/IPSL-CM6A-LR/pdSST-pdSIC/r164i1p1f1/Eday/aod550volso4/gr/v20191124/aod550volso4_Eday_IPSL-CM6A-LR_pdSST-pdSIC_r164i1p1f1_gr_20000401-20010531.nc

znichollscr commented 4 months ago

> Perhaps this file is our starting point/example? https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/PAMIP/IPSL/IPSL-CM6A-LR/pdSST-pdSIC/r164i1p1f1/Eday/aod550volso4/gr/v20191124/aod550volso4_Eday_IPSL-CM6A-LR_pdSST-pdSIC_r164i1p1f1_gr_20000401-20010531.nc

Although, seems unlikely given that is a model output file :) (and the link is now dead)

vnaik60 commented 4 months ago

@znichollscr that seems like an output file (the link did not work for me though). Here is a description of what was available for CMIP6. For optical properties, Beiping provided data on each model's spectral bands. The data was made available at ftp://iacftp.ethz.ch/pub_read/luo/CMIP6/, which does not work any more. I am not sure if that data ever made it to ESGF (@durack1 ?). If you would like to see what was provided for the GFDL model, I can ask my colleague who processed the data to dig it up and share.

znichollscr commented 4 months ago

Oops yes thank you.

> If you would like to see what was provided for GFDL model, I can request my colleague who processed the data to dig up and share

I would flip this around. If you want us to provide a file that looks the same as was provided last time, send us a file and we can see how hard it would be to replicate :) (I know that consistency with CMIP6 is important, even if I still think it pushes us in the wrong direction)

vnaik60 commented 4 months ago

Good point :-). I will share it here once we find it.

durack1 commented 4 months ago

The files you want are the IACETH-SAGE3lambda-3-0-0 files on ESGF/input4MIPs. Having said that, Beiping generated model wavelength-targeted files for each of the modelling groups, so effectively did the mapping/interpolation of the native data to match the atmospheric vertical coords - probably a good idea to schedule a telco time so we can quickly calibrate and outline next steps

znichollscr commented 4 months ago

Ah ok cool thanks. Those files only have aerosol information in them, nothing about injection heights etc. Is the injection height stuff new for CMIP7, or is there an example file from CMIP6 for that too?

vnaik60 commented 4 months ago

The injection heights etc. are new, as the volcanic SO2 emissions are a new dataset in CMIP7. That said, a dataset developed by Neely and Schmidt already existed and was used at least by WACCM for CMIP6.

znichollscr commented 4 months ago

Nice, thank you. Was the data format they used helpful? Or is it better to just put this data on a time-lat-lon-height grid, even if it is mostly nan?

vnaik60 commented 4 months ago

It was helpful to the extent that we converted it to what could be used in our respective models (netcdf, regular time-alt-lat-lon grid and any other model related format change). We iterated with Tom earlier this year and agreed that it would be ok to include lat/lon/time only for which there is an eruption to keep the size of the file under control.
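To make the size argument concrete, here is a small sketch comparing a dense time-alt-lat-lon grid with a per-eruption record list. The `Eruption` fields, their units and the example numbers are illustrative assumptions, not the agreed file layout.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Eruption:
    """One eruption record. Field names and units are illustrative assumptions."""

    time_days: float  # days since some reference date
    lat_deg: float
    lon_deg: float
    height_km: float
    so2_tg: float


def dense_cell_count(n_time: int, n_alt: int, n_lat: int, n_lon: int) -> int:
    """Number of values stored on a full time-alt-lat-lon grid."""
    return n_time * n_alt * n_lat * n_lon


def sparse_value_count(eruptions: list[Eruption]) -> int:
    """Number of values stored when only eruption records are kept."""
    return len(eruptions) * len(fields(Eruption))


# Placeholder record, not real eruption data.
events = [Eruption(time_days=0.0, lat_deg=0.0, lon_deg=0.0, height_km=20.0, so2_tg=1.0)]

# 1980 months, 70 altitudes and 36 latitudes echo the CMIP6-era dimensions
# quoted in this thread; 72 longitudes is a made-up value.
print(dense_cell_count(1980, 70, 36, 72), "dense values vs",
      sparse_value_count(events), "sparse values")
```

On a dense grid almost every cell would be a fill value, which is the motivation for storing only the times and locations where an eruption occurs.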

By the way, just to confirm that we are discussing two files here - 1) aerosol optical properties that were also made available in CMIP6 (@durack1 shared this in https://github.com/PCMDI/input4MIPs_CVs/issues/9#issuecomment-2226154177) and 2) volcanic SO2 emissions that were not made available in CMIP6 but were available externally (https://github.com/PCMDI/input4MIPs_CVs/issues/9#issuecomment-2226186387).

znichollscr commented 4 months ago

Thanks @vnaik60 for sending the previous format via email. I've put ncdumps of IACETH-SAGE3 and the file that was sent below. I have to admit that it is not clear to me at all how the two map, but I assume it will make more sense to @thomasaubry once we have a look tomorrow.
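One cheap first step in seeing how the two files map is to compare their global attribute names. The sketch below uses the attribute names transcribed by hand from the two ncdumps in this comment; `compare_attribute_names` is a hypothetical helper, and treating the IACETH file as the reference is an assumption.

```python
# Global attribute names transcribed from the IACETH-SAGE3lambda-3-0-0 ncdump.
IACETH_ATTRS = {
    "title", "institution_id", "institution", "activity_id", "Conventions",
    "dataset_category", "dataset_version_number", "grid_label", "mip_era",
    "realm", "target_mip", "variable_id", "data_structure", "frequency",
    "creation_date", "source", "source_id", "further_info_url", "contact",
    "comment", "tracking_id", "nominal_resolution",
}

# Global attribute names transcribed from the CMIP_GFDL_radiation_v2 ncdump.
GFDL_ATTRS = {
    "title", "Institution_id", "Institution", "activity_id", "Conventions",
    "data_structure", "frequency", "creation_date", "source", "source_id",
    "further_info_url", "contact", "comment",
}


def compare_attribute_names(reference: set[str], candidate: set[str]):
    """Return (missing names, {candidate name: reference name} case mismatches)."""
    reference_by_lower = {name.lower(): name for name in reference}
    case_mismatches = {
        name: reference_by_lower[name.lower()]
        for name in candidate - reference
        if name.lower() in reference_by_lower
    }
    # A name that only differs by case is reported as a mismatch, not as missing.
    missing = sorted((reference - candidate) - set(case_mismatches.values()))
    return missing, case_mismatches


missing, case_mismatches = compare_attribute_names(IACETH_ATTRS, GFDL_ATTRS)
print("missing from GFDL file:", missing)
print("differs only by case:", case_mismatches)
```

This only compares names, not values or variables, so it is a starting point rather than a full mapping between the two layouts.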

ncdump of multiple_input4MIPs_aerosolProperties_CMIP_IACETH-SAGE3lambda-3-0-0_gn_1850_2014.nc

```
netcdf multiple_input4MIPs_aerosolProperties_CMIP_IACETH-SAGE3lambda-3-0-0_gn_1850_2014 {
dimensions:
    time = 1 ;
    latitude = 36 ;
    altitude = 70 ;
variables:
    float time(time) ;
        time:units = "days since 1850-01-01 0:0:0" ;
        time:axis = "X" ;
        time:long_name = "time" ;
        time:standard_name = "time" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:standard_name = "latitude" ;
        latitude:axis = "Y" ;
        latitude:long_name = "latitude" ;
    float altitude(altitude) ;
        altitude:standard_name = "altitude" ;
        altitude:axis = "Z" ;
        altitude:long_name = "altitude" ;
        altitude:positive = "up" ;
        altitude:units = "km" ;
    float pressure(time, latitude, altitude) ;
        pressure:units = "hPa" ;
        pressure:standard_name = "pressure" ;
        pressure:long_name = "pressure" ;
        pressure:positive = "down" ;
    float sad(time, latitude, altitude) ;
        sad:units = "um2/cm3" ;
        sad:standard_name = "surface_area_density_of_ambient_aerosol" ;
        sad:long_name = "surface_area_density_of_ambient_aerosol" ;
    float rmean(time, latitude, altitude) ;
        rmean:units = "microns, SAD weighted" ;
        rmean:standard_name = "mean_radius_of_aerosol" ;
        rmean:long_name = "mean_radius_of_aerosol" ;
    float volume_density(time, latitude, altitude) ;
        volume_density:units = "um3/cm3" ;
        volume_density:standard_name = "volume_density_of_ambient_aerosol" ;
        volume_density:long_name = "volume_density_of_ambient_aerosol" ;
    float H2SO4_mass(time, latitude, altitude) ;
        H2SO4_mass:units = "molecules/cm3air" ;
        H2SO4_mass:standard_name = "mass_concentration_of_H2SO4_of_ambient_aerosol" ;
        H2SO4_mass:long_name = "mass_concentration_of_H2SO4_of_ambient_aerosol" ;
    float sad_of_big_particles(time, latitude, altitude) ;
        sad_of_big_particles:units = "um2/cm3" ;
        sad_of_big_particles:long_name = "surface_area_density_of_big_ambient_aerosol" ;
        sad_of_big_particles:standard_name = "surface_area_density_of_big_ambient_aerosol" ;

// global attributes:
        :title = "Surface area density of stratospheric aerosol climatology, yearly averaged 1850 through 2014" ;
        :institution_id = "IACETH" ;
        :institution = "Institute for Atmosphere and Climate, ETH Zurich, Zurich 8092, Switzerland" ;
        :activity_id = "input4MIPs" ;
        :Conventions = "CF-1.6" ;
        :dataset_category = "aerosolProperties" ;
        :dataset_version_number = "3.0.0" ;
        :grid_label = "gn" ;
        :mip_era = "CMIP6" ;
        :realm = "atmos" ;
        :target_mip = "CMIP" ;
        :variable_id = "multiple" ;
        :data_structure = "grid" ;
        :frequency = "yrClim" ;
        :creation_date = "2017-10-04T23:45:00Z" ;
        :source = "SAGE, SAM, CALIPSO, OSIRIS, 2D-model-simulation and Photometer" ;
        :source_id = "IACETH-SAGE3lambda-3-0-0" ;
        :further_info_url = "ftp://iacftp.ethz.ch/pub_read/luo/CMIP6/data_description.txt and release_note_v3-0.txt" ;
        :contact = "Beiping Luo: beiping.luo@env.ethz.ch or Larry Thomason: l.w.thomason@nasa.gov" ;
        :comment = "This is the climatology averaged over 1850-2014. We assume that the aerosols are composed of sulfuric acid. The polar stratospheric clouds are excluded. The data is only val above the tropopause. sad_big refers to the SAGE-3l data. This data deprecates previious date versions 2.0 and 2.1.0" ;
        :tracking_id = "hdl:21.14100/699285d7-9728-4865-ab7e-ac9001df3f85" ;
        :nominal_resolution = "500 km" ;
}
```

ncdump of CMIP_GFDL_radiation_v2.nc

```
netcdf CMIP_GFDL_radiation_v2 {
dimensions:
    altitude = 70 ;
    latitude = 36 ;
    month = 1980 ;
    solar_bands = 18 ;
    terrestrial_bands = 12 ;
variables:
    float altitude(altitude) ;
        altitude:units = "km" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
    float month(month) ;
        month:units = "month starting from 1850 01" ;
    float wl1_sun(solar_bands) ;
        wl1_sun:units = "lower boundary of wavelength of solar band in um" ;
    float wl2_sun(solar_bands) ;
        wl2_sun:units = "higher boundary of wavelength of solar band in um" ;
    float wl1_earth(terrestrial_bands) ;
        wl1_earth:units = "lower boundary of wavelength of terrestrial band" ;
    float wl2_earth(terrestrial_bands) ;
        wl2_earth:units = "higher boundary of wavelength of terrestrial band" ;
    float ext_sun(solar_bands, latitude, altitude, month) ;
        ext_sun:units = "extinction coefficient of solar bands in 1/km" ;
    float omega_sun(solar_bands, latitude, altitude, month) ;
        omega_sun:units = "single scattering albedo of solar bands" ;
    float g_sun(solar_bands, latitude, altitude, month) ;
        g_sun:units = "asymmetry factor of solar bands" ;
    float ext_earth(terrestrial_bands, latitude, altitude, month) ;
        ext_earth:units = "extinction coefficient of terrestrial bands in 1/km" ;
    float omega_earth(terrestrial_bands, latitude, altitude, month) ;
        omega_earth:units = "single scattering albedo of terrestrial bands" ;
    float g_earth(terrestrial_bands, latitude, altitude, month) ;
        g_earth:units = "asymmetry factor of terrestrial bands" ;

// global attributes:
        :title = "Surface aerea density of stratopheric aerosol,monthly mean value" ;
        :Institution_id = "IACETH" ;
        :Institution = "Institute for Atmosphere and Climate, ETH Zurich, Switzerland" ;
        :activity_id = "input4CMIPs" ;
        :Conventions = "CF-1.6" ;
        :data_structure = "grid" ;
        :frequency = "month" ;
        :creation_date = "20160531" ;
        :source = "SAGE, SAM, CALIPSO, OSIRIS, 2D-model-simulation and Photometer" ;
        :source_id = "SAGE CMIP" ;
        :further_info_url = "ftp://iacftp.ethz.ch/pub_read/luo/CMIP6/data_description.txt" ;
        :contact = "Beiping Luo: beiping.luo@env.ethz.ch or Larry Thomason: l.w.thomason@nasa.gov" ;
        :comment = "We take only the sulfuric acid aerosol into account. The PSCs are excluded. The data is only valid above the tropopause. The values with fill_flag=-1 were extrapolated from value at higher altutide and should not be used! " ;
}
```

thomasaubry commented 4 months ago

Hi everyone, just dropping a line to apologize for the silence! I had a grant deadline last Friday and was off yesterday/this morning so re-emerging...I will reply/add comments/etc today/over the week (although slowly because I took a bit of time off...repainting my flat!)

durack1 commented 3 months ago

Just cross-tagging repos - awaiting action at https://github.com/PCMDI/mip-cmor-tables/issues/60

znichollscr commented 3 months ago

Suggested source ID entry (same idea as #42)

    "UoE-CMIP-0-1-0":{
        "contact":"T.Aubry@exeter.ac.uk",
        "further_info_url":"www.tbd.invalid",
        "institution_id":"UoE",
        "license_id":"CC BY 4.0",
        "mip_era":"CMIP6Plus",
        "source_version":"0.1.0"
    }
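Both the CMIP6 example (`IACETH-SAGE3lambda-3-0-0`, version `3.0.0`) and the suggested entry above end the source ID with the version, using hyphens instead of dots. A small sketch of a consistency check for that pattern follows; the rule is inferred from these two examples only, and `version_matches_source_id` is a hypothetical helper.

```python
import json

# Entry copied from the suggestion above.
REGISTRY_JSON = """
{
    "UoE-CMIP-0-1-0": {
        "contact": "T.Aubry@exeter.ac.uk",
        "further_info_url": "www.tbd.invalid",
        "institution_id": "UoE",
        "license_id": "CC BY 4.0",
        "mip_era": "CMIP6Plus",
        "source_version": "0.1.0"
    }
}
"""


def version_matches_source_id(source_id: str, entry: dict) -> bool:
    """True if source_id ends with source_version written with hyphens."""
    expected_suffix = "-" + entry["source_version"].replace(".", "-")
    return source_id.endswith(expected_suffix)


registry = json.loads(REGISTRY_JSON)
for source_id, entry in registry.items():
    status = "ok" if version_matches_source_id(source_id, entry) else "MISMATCH"
    print(source_id, status)
```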
durack1 commented 3 months ago

I was just talking to @wolfiex, he was suggesting that when we register "University of Exeter" we do this with a unique identifier, "UoE" is a little vague, and could also be an identifier for the "University of Edinburgh", "University of England" (if there was one, etc). So his recommendation was for "UoExeter" which is going to be a little hard for other places to attempt to claim.

@thomasaubry does that sound like a reasonable path to you? @wolfiex will have the mip-cmor-tables ready next week for this registration to occur, so we can get that done, alongside finalizing these volcanic datasets.. exciting!

durack1 commented 2 months ago

To connect information across repos, some early data validation was completed by @shipengzhang and exists in two notebooks (GitHub and nbviewer links for each):

https://github.com/shipengzhang/evaluate_cmip_volcano/blob/main/volcano/examine_volc_prescribed.ipynb
https://nbviewer.org/github/shipengzhang/evaluate_cmip_volcano/blob/main/volcano/examine_volc_prescribed.ipynb
https://github.com/shipengzhang/evaluate_cmip_volcano/blob/main/volcano/read_input_emi.ipynb
https://nbviewer.org/github/shipengzhang/evaluate_cmip_volcano/blob/main/volcano/read_input_emi.ipynb

It's great to have some eyes on these data, which look quite different from their CMIP6 counterparts!

thomasaubry commented 2 months ago

@wolfiex @durack1 sounds great! I've been using "uoexeter", happy to go with or without capitals :)

wolfiex commented 2 months ago

The id will be lowercase, but there are no such limitations for the CMIP acronym itself. Convention suggests capitalisation with the exception of stopwords so UoExeter might be an option?

I guess the only consideration at this point is what the convention for other consortia and institutions is likely to be.

durack1 commented 2 months ago

@thomasaubry for the final registration (which we should be able to get in the queue ~tomorrow; @wolfiex is finalizing some updates this week), we could go with "UoExeter", which would sync with some of the other registrations we already have, e.g. "UoM", "UofMD", though you'll see these weren't really systematic, just unique. The older examples are in input4MIPs_institution_id.json, reproduced below. We are now merging institutions across projects: input4MIPs is just one MIP project, and CMIP6, CMIP5, obs4MIPs etc. are others. The same institution often contributes to more than one project (e.g. many modeling groups contributed to CMIP3, CMIP5 and CMIP6, and will contribute to CMIP7), and keeping their institution_id consistent across these projects is the goal. Any interest in providing the volcanic forcing for CMIP8? 😄

[
    "CCCma",
    "CNRM-Cerfacs",
    "CR",
    "DRES",
    "IACETH",
    "IAMC",
    "ImperialCollege",
    "MOHC",
    "MPI-B",
    "MPI-M",
    "MRI",
    "NASA-GSFC",
    "NCAR",
    "NCAS",
    "PCMDI",
    "PNNL-JGCRI",
    "SOLARIS-HEPPA",
    "UCI",
    "UColorado",
    "UReading",
    "UoM",
    "UofMD",
    "VUA"
]