PCMDI / input4MIPs_CVs

Controlled Vocabularies (CVs) for use in input4MIPs
https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/latest/
Creative Commons Attribution 4.0 International

Test stratospheric aerosol file (extinction) being uploaded on input4mips FTP #80

Open thomasaubry opened 1 month ago

thomasaubry commented 1 month ago

Tagging @durack1 @znichollscr

Strat aerosol test file uploaded for checking

I'm uploading one of my test files to the input4mips FTP for Paul to check, as instructed by Zeb and following the instructions on https://input4mips-validation.readthedocs.io/en/latest/how-to-guides/how-to-upload-to-ftp/. The dry run went well and the file is currently uploading. Let me know if there are any issues.

Outstanding questions

The two main outstanding questions for my datasets are: 1) Do you want one file per variable? This feels a bit messy, but I'm happy to cater to your preference! I have 8 variables in the aerosol optical property dataset; for emissions, it depends.

2) Do you want the emission dataset provided as a gridded (time, lon, lat, height) flux dataset rather than a list of eruptions with emission parameters? There is a limited number of eruptions, so I'm unsure whether this makes sense or how the few modelling groups modelling this would prefer the data (I can poll them! With UKESM we work from an eruption list). One other concern is that the core data is a mass of SO2 for each eruption. If I grid that as a flux and people regrid it to their model grid (lat/lon, height; not sure whether there would be a time concern), they should try to conserve the mass of each eruption. But that information would be a lot harder to track from a gridded flux file than from an eruption list.

znichollscr commented 1 month ago
  1. Do you want one file per variable? This feels a bit messy but happy to cater to your preference! I have 8 variables in the aerosol optical property dataset, emission depends.

I'd go one variable per file. It just plays way nicer with the ESGF machinery that way.
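To make the one-variable-per-file layout concrete, here is a minimal Python sketch that derives one filename per variable. It assumes the pattern `<variable>_input4MIPs_<dataset_category>_<mip>_<source_id>_<grid_label>_<time_range>.nc`, inferred from the filenames quoted later in this thread rather than from a specification. `per_variable_filenames` is a hypothetical helper, not part of input4mips-validation, and `ext` and `asy` are the only variable names that actually appear in the thread.

```python
def per_variable_filenames(variables, dataset_category, mip, source_id,
                           grid_label, time_range):
    """Return {variable: filename}, i.e. one output file per variable.

    Assumes the pattern
    <variable>_input4MIPs_<dataset_category>_<mip>_<source_id>_<grid_label>_<time_range>.nc
    (inferred from filenames in this thread, not from a specification).
    """
    return {
        var: (f"{var}_input4MIPs_{dataset_category}_{mip}_"
              f"{source_id}_{grid_label}_{time_range}.nc")
        for var in variables
    }


names = per_variable_filenames(
    ["ext", "asy"], "aerosolProperties", "CMIP",
    "UOEXETER-CMIP-1-1-3", "gnz", "175001-202312",
)
for var, name in names.items():
    print(var, "->", name)
```

Each variable then ends up in its own file that shares every other filename component, which is what lets the publication side treat them as a consistent set.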

2. Do you want the emission dataset provided as a gridded (time, lon, lat, height) flux dataset rather than a list of eruptions with emission parameters?

Trickier. I would do one of two things.

Option a) Make your dimension "emission_number" and then provide time, lat, lon and height as a function of emission number. Also provide "mass_injected" (or whatever you want to call it) as a function of emission_number. Just provide "mass_injected" as total mass, then the modelling groups can work out how they want to spread that over time and their grid. I guess this is relatively clear and easy to work with without making mistakes. My concern is that it would break the ESGF's standard data model, but I could be wrong about what the ESGF requires.
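A minimal Python sketch of Option a, assuming one flat record per eruption. Apart from `emission_number` and `mass_injected` (named in the comment above), the field names, the bottom/top height fields used for the vertical extent, their units and the uniform-in-height spreading rule are all hypothetical choices made for illustration. The sketch shows how a modelling group could distribute the total mass onto its own vertical levels while conserving it exactly.

```python
from dataclasses import dataclass


@dataclass
class Eruption:
    """One record along a hypothetical "emission_number" dimension."""
    emission_number: int
    time: str             # eruption date, e.g. "YYYY-MM-DD" (format is an assumption)
    lat: float            # degrees north
    lon: float            # degrees east
    height_bottom: float  # bottom of injection, km (assumed units)
    height_top: float     # top of injection, km (assumed units)
    mass_injected: float  # total SO2 mass for the eruption (units assumed, e.g. Tg)


def spread_over_levels(eruption, level_bounds):
    """Split the total mass uniformly in height across model layers.

    level_bounds is a list of (bottom, top) pairs in the same units as the
    eruption heights. The returned masses sum to eruption.mass_injected
    whenever the layers fully cover [height_bottom, height_top].
    """
    depth = eruption.height_top - eruption.height_bottom
    if depth <= 0:
        raise ValueError("height_top must be above height_bottom")
    masses = []
    for bottom, top in level_bounds:
        # Thickness of the part of this layer that lies inside the injection range
        overlap = max(0.0, min(top, eruption.height_top)
                      - max(bottom, eruption.height_bottom))
        masses.append(eruption.mass_injected * overlap / depth)
    return masses


# Purely illustrative values, not taken from any dataset
eruption = Eruption(0, "1900-01-01", 0.0, 0.0, 16.0, 24.0, 10.0)
print(spread_over_levels(eruption, [(14.0, 18.0), (18.0, 22.0), (22.0, 26.0)]))
# -> [2.5, 5.0, 2.5]
```

Keeping the total mass as the stored quantity means any regridding choice can be checked against a single number per eruption.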

Option b) Use dimensions of ("time", "lat", "lon", "height"). Then you just make "emissions" a function of time, lat, lon and height. There'll be lots of zeros, but that's ok and you can drop out any timesteps with no emissions at all to keep the file size sensible. The emissions would then be a flux. If you do it this way, you have to be very careful with the bounds. You'll have to make sure that the bounds for all variables are sensibly specified (so people know how long the flux should last for i.e. what the total mass to be emitted is, the spatial extent of the flux and the height over which it should be injected). Having written this now, it sounds much more complicated and probably not the approach I would go for. However, if it's the only format that ESGF supports, we might have to take this road.
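Why the bounds matter in Option b: the emitted mass is flux × cell area × duration, so the bounds are what let a user recover the mass. A minimal Python sketch, assuming a single grid cell, a flux in kg m-2 s-1 that is constant within the cell and time bounds, and a spherical Earth with an assumed mean radius; the vertical split over height bounds is left out and would be a further factor of the same kind. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius in metres


def cell_area_m2(lat_bnds, lon_bnds):
    """Area of a lat/lon cell on a sphere (bounds in degrees)."""
    lat0, lat1 = (math.radians(v) for v in lat_bnds)
    dlon = math.radians(lon_bnds[1] - lon_bnds[0])
    return EARTH_RADIUS_M ** 2 * dlon * (math.sin(lat1) - math.sin(lat0))


def mass_to_flux(mass_kg, lat_bnds, lon_bnds, time_bnds_s):
    """Constant flux (kg m-2 s-1) that emits mass_kg within the cell and time bounds."""
    duration_s = time_bnds_s[1] - time_bnds_s[0]
    return mass_kg / (cell_area_m2(lat_bnds, lon_bnds) * duration_s)


def flux_to_mass(flux, lat_bnds, lon_bnds, time_bnds_s):
    """Total mass (kg) implied by a flux and the bounds it is assumed to cover."""
    duration_s = time_bnds_s[1] - time_bnds_s[0]
    return flux * cell_area_m2(lat_bnds, lon_bnds) * duration_s


lat_bnds, lon_bnds = (-1.0, 1.0), (0.0, 2.0)
one_day = (0.0, 86400.0)
flux = mass_to_flux(1.0e9, lat_bnds, lon_bnds, one_day)  # illustrative 1e9 kg
print(flux_to_mass(flux, lat_bnds, lon_bnds, one_day))          # ~1e9 kg
print(flux_to_mass(flux, lat_bnds, lon_bnds, (0.0, 172800.0)))  # ~2e9 kg: misread bounds, wrong mass
```

The second line illustrates the risk: a user who assumes the flux lasts two days instead of one injects twice the intended mass, and nothing in the flux values alone would reveal it.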

znichollscr commented 1 month ago

@thomasaubry which folder did you upload to on the FTP server?

durack1 commented 1 month ago

Ok found it ftp://ftp.llnl.gov/incoming/uoe, but it doesn't seem we have any files in there - although we have 3 version subdirs v20240804-v20240806

This latest data file multiple_input4MIPs_aerosolProperties_CMIP_UOE-CMIP-0-1-0_gnz_175001-175112.nc copied across to nimbus /p/cscratch/durack1 mount

durack1 commented 1 month ago

I'd ping @vnaik60 on the below, balancing the ideal of a single variable per file (which makes ESGF publication far easier) against ease of use (multiple variables per file)

  • Do you want one file per variable?

  • Do you want the emission dataset provided as a gridded (time, lon, lat, height) flux dataset

thomasaubry commented 1 month ago

Yes, uoe! The data transfer failed this morning, I think because I had bad wifi. I went to campus to relaunch it, so it's incoming again. Does the number of threads matter if it's a single file?

vnaik60 commented 1 month ago

linking this discussion with https://github.com/PCMDI/input4MIPs_CVs/issues/9. And some more feedback from GFDL postdoc, Shipeng Zhang here.

znichollscr commented 1 month ago

Does the number of threads matter if it's a single file?

Nope, I'm not that clever :)

vnaik60 commented 1 month ago

Do you want one file per variable?

If it plays well with ESGF, we will go with this. Are you also planning to provide optical properties mapped on to each model's spectral bands? If not, would you be providing some guidance on how this should be done so that there is some consistency across the models?

Do you want the emission dataset provided as a gridded (time, lon, lat, height) flux dataset

option a) that Zeb describes plus information on the vertical extent of the emission would be helpful.

thomasaubry commented 1 month ago

Are you also planning to provide optical properties mapped on to each model's spectral bands? If not, would you be providing some guidance on how this should be done so that there is some consistency across the models?

The plan is to provide a single set of files, but also to provide a piece of code that modellers can use to convert to their spectral bands. I expect that to be ready in August, but I will first ensure that v0 of everything is on ESGF.

option a) that Zeb describes plus information on the vertical extent of the emission would be helpful

Sounds good, on it today so we will see quickly if it works or not.

@znichollscr no matter what network I am on, the upload times out (yesterday afternoon I ended up losing the connection when 78% of the file was uploaded). I'll try again today after changing a couple of settings on my laptop. Has anyone else run into this problem? The file is not gigantic, so I'm surprised.

znichollscr commented 1 month ago

Has anyone else run into this problem? the file is not gigantic so I'm surprised.

Strange. It's not something I've hit, but I also have small files. How big is the file? If this keeps failing, we might have to go for plan b.

thomasaubry commented 1 month ago

I managed to upload my extinction file finally! So it should be with you @durack1 :)

thomasaubry commented 1 month ago

@durack1 after some more debugging, the emission file is there too (utsvolcemis_input4MIPs_emissions_CMIP_UOEXETER-CMIP-1-1-0_gn_17551020-20211231.nc). The emission file is the only one needed; for optical properties I've only uploaded one file, with extinction. Once you have checked it and sent feedback as needed, I will use that to make the other aerosol optical property files and upload them.

durack1 commented 1 month ago

@thomasaubry great! I am a little confused as to what I am trying to get, there are a couple of copies of files in that subdir, so if you list the filenames I am chasing, I'll be able to pull these directly.

It also might be easier if just the netcdf files are uploaded, trying to pore through these subdirs seems unnecessary - I can impose a directory structure once I have the file and valid metadata

znichollscr commented 1 month ago

It also might be easier if just the netcdf files are uploaded, trying to pore through these subdirs seems unnecessary - I can impose a directory structure once I have the file and valid metadata

That's probably my bad. input4MIPs validation writes things in the directory structure, so you get the full thing rather than just single files.

I think it'll be fine if we just tweak the uploads so you can always grab the entire directory, rather than needing to look in sub-dirs.

@thomasaubry when you upload, it's best to upload to a new directory every time, so Paul can just grab the entire directory.

For example, for my CR-CMIP-0-3-0 dataset, I did

input4mips-validation --password "zebedee.nicholls@climate-resource.com"  --ftp-dir-rel-to-root "cr-cmip-0-3-0-0"

Then we found the inevitable bug

input4mips-validation --password "zebedee.nicholls@climate-resource.com"  --ftp-dir-rel-to-root "cr-cmip-0-3-0-1"

Then the next bug

input4mips-validation --password "zebedee.nicholls@climate-resource.com"  --ftp-dir-rel-to-root "cr-cmip-0-3-0-2"

etc.

For you, might make sense to do

input4mips-validation --password "thomas.aubry@uoe.ac.at"  --ftp-dir-rel-to-root "uoe-cmip-0-1-0-0"
input4mips-validation --password "thomas.aubry@uoe.ac.at"  --ftp-dir-rel-to-root "uoe-cmip-0-1-0-1"

etc.
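The new-directory-per-upload convention is easy to script. A minimal Python sketch, assuming you keep a list of the directory names already used (how you list them on the FTP server isn't shown). `next_upload_dir` is a hypothetical helper, not part of input4mips-validation; its result is the value you would pass to `--ftp-dir-rel-to-root` in the commands above.

```python
import re


def next_upload_dir(prefix, existing_dirs):
    """Return "<prefix>-<N>" where N is one more than the highest N already used.

    Starts at 0 if no "<prefix>-<N>" directory exists yet; names that do not
    follow the pattern are ignored.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)$")
    used = [int(m.group(1)) for d in existing_dirs
            if (m := pattern.match(d)) is not None]
    return f"{prefix}-{max(used) + 1 if used else 0}"


print(next_upload_dir("uoe-cmip-0-1-0", []))  # uoe-cmip-0-1-0-0
print(next_upload_dir("uoe-cmip-0-1-0", ["uoe-cmip-0-1-0-0", "uoe-cmip-0-1-0-1"]))  # uoe-cmip-0-1-0-2
```

Comparing the suffixes as integers (rather than sorting the strings) keeps `-10` after `-9`.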

thomasaubry commented 3 weeks ago

Thanks @znichollscr ! @durack1 I've put everything in uoexeter-CMIP-1-1-0 yesterday. Two files to look at: ext_input4MIPs_aerosolOpticalProperties_CMIP_UOEXETER-CMIP-0-1-0_gnz_175001-202312.nc utsvolcemis_input4MIPs_emissions_CMIP_UOEXETER-CMIP-1-1-0_gn_17551020-20211231.nc

Is it more clear now/do you have access? Please do not upload these to ESGF after. I might have misunderstood the workflow but I thought this was just to check metadata/formatting. Once I have your green light on this, I will produce the other aerosol optical property files in the same way and add a couple of corrections from feedback already received. This won't take long; I've only been slow getting used to the Python code/formatting practices. I might also reset my version numbers, I think it makes sense that what goes on esgf first should be 0-0-1.

znichollscr commented 3 weeks ago

Is it more clear now/do you have access?

All clear to me (I'll get access once Paul moves these onto the server I have access to, although given you already write these files with input4MIPs-validation, I don't think I'll have much to add...)

Please do not upload these to ESGF after. I might have misunderstood the workflow but I thought this was just to check metadata/formatting

Yep, I think we're all on the same page about this

I might also reset my version numbers, I think it makes sense that what goes on esgf first should be 0-0-1

To be honest, I'd suggest not doing this. The first version of our GHG files on ESGF is 0-3-0. I did this just in case anyone had heard/somehow been sent our 0-1-0 and 0-2-0 versions by other channels, to ensure there was zero chance of confusion. The version number is pretty arbitrary, so I'd just keep incrementing from whatever you're up to.

durack1 commented 3 weeks ago

@thomasaubry @znichollscr yep, we're on the system - see /p/cscratch/durack1/ThomasAubry-volcEmissions/20240819

@thomasaubry the comments about versions etc that @znichollscr notes above is consistent with my thinking. There are many moving parts, and so my recommendation is that with any change (no matter how big, a unit issue, or small, a typo in netcdf global attributes) we just continue to increment upward and onward. That will mean that we can then keep a track of what changed and when, and then capture this, so when we have 100+ users of these datasets (highly likely in a couple of months), there is a clear source of information about what is the latest version and what issues were resolved in the previous versions. We've already hit this with the SOLARIS-HEPPA-CMIP-4-3 data (4.1 and 4.2 have already been released for folks to use), and we know there are some inconsistencies with metadata in 4.3, so will catch these very minor (and insignificant to users) tweaks with the 4.4 version which at the very minimum will extend the coverage temporally.

Does that make sense?
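"Always increment, never reset" can be enforced mechanically. A minimal Python sketch for source_id values of the form used in this thread (e.g. UOEXETER-CMIP-1-1-2), assuming the version is the last three dash-separated integers. Resetting the lower components when a higher one is bumped is a semantic-versioning-style assumption, not an input4MIPs rule, and `bump_source_id` is a hypothetical helper.

```python
import re


def bump_source_id(source_id, part="patch"):
    """Increment the trailing X-Y-Z version of a source_id such as "UOEXETER-CMIP-1-1-2"."""
    match = re.fullmatch(r"(.+)-(\d+)-(\d+)-(\d+)", source_id)
    if match is None:
        raise ValueError(f"no trailing X-Y-Z version in {source_id!r}")
    prefix = match.group(1)
    major, minor, patch = (int(g) for g in match.group(2, 3, 4))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part {part!r}")
    return f"{prefix}-{major}-{minor}-{patch}"


print(bump_source_id("UOEXETER-CMIP-1-1-2"))  # UOEXETER-CMIP-1-1-3
```

Because the function only ever moves the version forward, even a typo fix in a global attribute produces a new, distinguishable source_id.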

thomasaubry commented 3 weeks ago

@durack1 makes perfect sense. Let me know once you've checked and I'll make the other optical properties files and upload the full package :)

znichollscr commented 3 weeks ago

see /p/cscratch/durack1/ThomasAubry-volcEmissions/20240819

Thanks. All look good to me

durack1 commented 3 weeks ago

@vnaik60 did you want to take a peek? Or are we full steam ahead for @thomasaubry to mint the "final" prototype versions and get them in the ESGF publication queue?

vnaik60 commented 3 weeks ago

@durack1 and @thomasaubry march ahead! Issues, if any, can be captured once the files are available on ESGF for wider testing. Thank you so much for getting us to "v0"!

thomasaubry commented 3 weeks ago

all uploading currently, accidentally in uoexeter 🤦‍♂️ will go back to uoexeter-CMIP-x-x-x for next uploads. There are 7 files total, one for emission and 6 for optical properties. I will believe we are at v0 when I see the files on ESGF, despite brilliant help I've been so slow with all the steps after the actual dataset production!

durack1 commented 3 weeks ago

@thomasaubry exciting!

I can see 5 files currently, is that it, or should there be more?

Also we don't currently have an "aerosolOpticalProperties" dataset_category, which currently includes:

[
    "GHGConcentrations",
    "SSTsAndSeaIce",
    "aerosolProperties",
    "atmosphericState",
    "emissions",
    "landState",
    "ozone",
    "radiation",
    "solar",
    "surfaceAir",
    "surfaceFluxes"
]

So not sure this was intended?
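A pre-upload check along these lines would flag the mismatch before files reach the FTP. A minimal Python sketch, assuming the list above is the complete dataset_category CV at the time of writing (it may change) and the filename layout seen in this thread, with no underscores in the variable name. The helper names are hypothetical, and the real validator's checks may differ.

```python
ALLOWED_DATASET_CATEGORIES = {
    "GHGConcentrations", "SSTsAndSeaIce", "aerosolProperties",
    "atmosphericState", "emissions", "landState", "ozone",
    "radiation", "solar", "surfaceAir", "surfaceFluxes",
}


def dataset_category_from_filename(filename):
    """Extract <dataset_category> from <variable>_input4MIPs_<dataset_category>_...

    Assumes the variable name itself contains no underscores.
    """
    parts = filename.split("_")
    if len(parts) < 3 or parts[1] != "input4MIPs":
        raise ValueError(f"unexpected filename layout: {filename!r}")
    return parts[2]


def has_known_category(filename):
    return dataset_category_from_filename(filename) in ALLOWED_DATASET_CATEGORIES


for name in [
    "ext_input4MIPs_aerosolOpticalProperties_CMIP_UOEXETER-CMIP-0-1-0_gnz_175001-202312.nc",
    "utsvolcemis_input4MIPs_emissions_CMIP_UOEXETER-CMIP-1-1-0_gn_17551020-20211231.nc",
]:
    print(dataset_category_from_filename(name), has_known_category(name))
# aerosolOpticalProperties False
# emissions True
```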

durack1 commented 3 weeks ago

Will pull these 5 across to the nimbus (evaluation machine), so that @znichollscr can take a peek directly

durack1 commented 3 weeks ago

Will pull these 5 across to the nimbus (evaluation machine), so that @znichollscr can take a peek directly

@znichollscr these are live now - ~/durack1/ThomasAubry-volcEmissions/20240822

znicholls commented 3 weeks ago

So not sure this was intended?

Should probably be "aerosolProperties", I think?

znichollscr commented 3 weeks ago

@znichollscr these are live now - ~/durack1/ThomasAubry-volcEmissions/20240822

They pass the validator (usual caveats about the validator's current limits).

@thomasaubry I think there's two outstanding questions:

Nearly there!

thomasaubry commented 3 weeks ago

Hi everyone, awesome!

Thanks!

vnaik60 commented 3 weeks ago

totally exciting and looking forward to seeing on ESGF! @thomasaubry at some point after the dataset goes live, let us know how you would like

to also provide a piece of codes that modellers can use to convert to their spectral bands

via github, zenodo, TT website, etc?

durack1 commented 3 weeks ago

@thomasaubry we're making progress, I can now find 6 files, almost to your target!

thomasaubry commented 3 weeks ago

@durack1 yes, I'm on holiday and have no good wifi so it's struggling! It says the 7th file (asy_*.nc) now exists in uoexeter, but it looks like the transfer aborted after the connection was lost, so to be sure I've re-uploaded this file only, in uoexeter_asy. I think it's safer to use this one. Is it all there?

@vnaik60 yes, I think this will go on GitHub. I don't have it yet, but development is ongoing; I hope it'll be ready in early September. I've output an extensive range of wavelengths, so I hope that in the first instance modellers can run with the wavelengths closest to their spectral band midpoints. The code will make it possible to average optical properties over their specific spectral bands, weighting by the solar/terrestrial spectrum. In addition to the code, I will have to link documentation, and I think the best way will be Zenodo (?).
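The conversion code described above isn't released yet, so what follows is not it: it is a minimal Python sketch of the general idea of a spectrum-weighted band average. It assumes the optical property and the spectral weights are sampled at the same wavelengths and uses a simple discrete weighted mean (no integration over bin widths). For properties such as single-scattering albedo or the asymmetry parameter, a physically consistent band average typically also weights by extinction or scattering, which this sketch does not do. `band_average` and all numbers are hypothetical.

```python
def band_average(wavelengths, values, weights, band):
    """Weighted mean of an optical property over one spectral band.

    wavelengths, values and weights are equal-length sequences; weights
    stand in for a solar or terrestrial spectrum sampled at the same
    wavelengths. Only samples with band[0] <= wavelength <= band[1] count.
    """
    lo, hi = band
    num = den = 0.0
    for wl, val, w in zip(wavelengths, values, weights):
        if lo <= wl <= hi:
            num += val * w
            den += w
    if den == 0.0:
        raise ValueError("no samples with non-zero weight inside the band")
    return num / den


# Hypothetical numbers, only to show the mechanics
wavelengths = [400.0, 500.0, 600.0, 700.0]  # nm
extinction = [4.0, 3.0, 2.0, 1.0]
solar_weight = [1.0, 2.0, 1.0, 0.5]
print(band_average(wavelengths, extinction, solar_weight, (450.0, 650.0)))
```

Here only the 500 nm and 600 nm samples fall in the band, so the result is (3.0 × 2.0 + 2.0 × 1.0) / 3.0.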

znichollscr commented 2 weeks ago

Hi @thomasaubry, we can't publish anyway until September 1st at the earliest so don't stress too much. It seems like the upload hasn't worked anyway so, a suggestion. When you're back from holiday (not before):

thomasaubry commented 2 weeks ago

Hi @znicholls , all done under uoxeter-CMIP-1-1-2 :) Let me know if you spot any issue!

durack1 commented 2 weeks ago

@thomasaubry excellent - I can see 7 files!

@znicholls these are on the usual place under 20240827.

znichollscr commented 2 weeks ago

Thanks mate. All look good to me. I did a test re-write to the DRS too and that all seemed happy so I think we're good for queuing for publication.

thomasaubry commented 2 weeks ago

Yay, that's all of them! Almost there. Looking forward to doing some proper science/improvements and writing up manuscripts now.

durack1 commented 2 weeks ago

@thomasaubry wonderful! @sashakames is back from vacation next week, so will drop these as first go in the queue, and hopefully we're live mid-next week!

durack1 commented 2 weeks ago

Noting that #9 can be closed alongside this when these data are live, and our database/HTML pages are updated.

The files are queued for ESGF publication, with version:20240828 - hopefully this happens very early next week, exciting!

thomasaubry commented 2 weeks ago

Corrected a unit problem so I'm already uploading 1-1-3 on the ftp...

durack1 commented 2 weeks ago

@thomasaubry ok great! I'll await clearance from you that it's uploaded and switch out 1.1.3 for the in-place 1.1.2 (which is yet to be published, as we're waiting for @sashakames to be back from vacation next Tuesday)

durack1 commented 2 weeks ago

@thomasaubry it looks like I have 7 new files, but these files are identified as 1-1-2 in the source_id (and filename, I have not checked within file metadata)? Did you want to change that?

Up to you, as we've not published any previous data, so we could run with this, or we could run with new 1-1-3 files - over to you..


thomasaubry commented 2 weeks ago

Oh gosh, I forgot to change the root folder...I clearly need to call it a day! I'll try to reupload this evening or over the weekend but I suspect connection will be too bad. If so will just do Monday morning UK time. You can delete that 1-1-3 folder...sorry for the fuss!

durack1 commented 2 weeks ago

@thomasaubry all good! I have no control over the files on the ftp.llnl.gov, I can only upload/create or download, not delete. No problem with waiting until Monday, it's a public holiday this side, so if we have data in place Monday night/Tues US PDT, then I can pull these down within a minute and have them back in the publication queue

thomasaubry commented 1 week ago

Ok this time all files should be in "uoexeter-CMIP-1-1-3_v3". Note "v3", v2 had an issue because of partial transfer (and then couldn't overwrite).

durack1 commented 1 week ago

@thomasaubry perfect. Those files are now downloaded, and they all check out great. I now have them staged for ESGF publication, so hopefully are live very soon!

durack1 commented 1 week ago

And we're live - here woo hoo!

durack1 commented 1 week ago

As UOEXETER-CMIP-1-1-3 is done, we need to update the database (and webpages) and then close out this issue - @znichollscr you got this?

znichollscr commented 1 week ago

Yep will pick up tomorrow in #120