thomasaubry closed this issue 1 month ago.
1. Do you want one file per variable? This feels a bit messy, but I'm happy to cater to your preference! I have 8 variables in the aerosol optical property dataset; for emissions, it depends.
I'd go one variable per file. It just plays way nicer with the ESGF machinery that way.
2. Do you want the emission dataset provided as a gridded (time, lon, lat, height) flux dataset rather than a list of eruptions with emission parameters?
Trickier. I would do one of two things.
Option a) Make your dimension "emission_number"
and then provide time, lat, lon and height as functions of emission number. Also provide "mass_injected" (or whatever you want to call it) as a function of emission_number. Just provide "mass_injected" as the total mass; the modelling groups can then work out how they want to spread that over time and their grid. I guess this is relatively clear and easy to work with without making mistakes. My concern is that it would break ESGF's standard data model, but I could be wrong about what ESGF requires.
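A minimal sketch of what option a) looks like as a data model. Every field shares the single `emission_number` dimension, so one index picks out one eruption. The field names (`mass_injected_kg`, `height_bottom_km`, `height_top_km`) are hypothetical, and plain Python lists stand in for what would be 1-D netCDF variables.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EruptionList:
    """Option a): every field is a 1-D sequence indexed by emission_number.

    Field names are hypothetical; in a netCDF file each would be a variable
    with the single dimension ("emission_number",).
    """

    time: list[date]
    lat: list[float]  # degrees north
    lon: list[float]  # degrees east
    height_bottom_km: list[float]  # vertical extent of the injection
    height_top_km: list[float]
    mass_injected_kg: list[float]  # total mass per eruption, not a flux

    def __len__(self) -> int:
        return len(self.time)

    def total_mass_between(self, start: date, end: date) -> float:
        """Total mass injected by eruptions with start <= time < end."""
        return sum(
            m for t, m in zip(self.time, self.mass_injected_kg) if start <= t < end
        )
```

Because the mass stays a total per eruption, each modelling group decides how to spread it over time and onto its own grid.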
Option b) Use dimensions of ("time", "lat", "lon", "height"). Then you just make "emissions" a function of time, lat, lon and height. There'll be lots of zeros, but that's OK, and you can drop any timesteps with no emissions at all to keep the file size sensible. The emissions would then be a flux. If you do it this way, you have to be very careful with the bounds. You'll have to make sure that the bounds for all variables are sensibly specified, so people know how long the flux should last (and hence the total mass to be emitted), the spatial extent of the flux, and the height range over which it should be injected. Having written this out, it sounds much more complicated, and it's probably not the approach I would go for. However, if it's the only format that ESGF supports, we might have to take this road.
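Why the bounds matter in option b): a flux only becomes a mass once it is multiplied by the duration and the cell volume, and each of those comes from a bounds variable. A minimal sketch, assuming a volumetric flux in kg m-3 s-1 that is constant within one cell, a spherical Earth of radius 6371 km, and a thin-shell approximation in the vertical; all names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6.371e6  # assumed spherical Earth


def cell_area_m2(lat_bnds, lon_bnds):
    """Area of a lat/lon cell from its bounds (degrees) on a sphere."""
    lat0, lat1 = map(math.radians, lat_bnds)
    dlon = math.radians(lon_bnds[1] - lon_bnds[0])
    return EARTH_RADIUS_M**2 * dlon * (math.sin(lat1) - math.sin(lat0))


def mass_from_flux(flux_kg_m3_s, time_bnds_s, lat_bnds, lon_bnds, height_bnds_m):
    """Total mass (kg) implied by a constant volumetric flux in one grid cell.

    Every factor comes from a bounds variable: without any one of them the
    mass cannot be recovered from the flux.
    """
    duration_s = time_bnds_s[1] - time_bnds_s[0]
    thickness_m = height_bnds_m[1] - height_bnds_m[0]  # thin-shell approximation
    volume_m3 = cell_area_m2(lat_bnds, lon_bnds) * thickness_m
    return flux_kg_m3_s * volume_m3 * duration_s


def flux_from_mass(mass_kg, time_bnds_s, lat_bnds, lon_bnds, height_bnds_m):
    """Inverse: spread a total mass uniformly over one cell and time interval."""
    per_unit_flux = mass_from_flux(1.0, time_bnds_s, lat_bnds, lon_bnds, height_bnds_m)
    return mass_kg / per_unit_flux
```

Going from an eruption mass to a flux and back is only lossless if the same bounds are used in both directions, which is the bookkeeping risk described above.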
@thomasaubry which folder did you upload to on the FTP server?
OK, found it: ftp://ftp.llnl.gov/incoming/uoe. However, there don't seem to be any files in there, although there are 3 version subdirectories, v20240804-v20240806.
This latest data file, multiple_input4MIPs_aerosolProperties_CMIP_UOE-CMIP-0-1-0_gnz_175001-175112.nc, has been copied across to the nimbus /p/cscratch/durack1 mount.
I'd ping @vnaik60 on the questions below, which balance the ideal of a single variable per file (which makes ESGF publication far easier) against ease of use (multiple variables per file):
Do you want one file per variable?
Do you want the emission dataset provided as a gridded (time, lon, lat, height) flux dataset
Yes, uoe! The data transfer failed this morning, I think because I had bad wifi. I went to campus to relaunch it, so it's incoming again. Does the number of threads matter if it's a single file?
Linking this discussion with https://github.com/PCMDI/input4MIPs_CVs/issues/9. And here is some more feedback from GFDL postdoc Shipeng Zhang.
Does the number of threads matter if it's a single file?
Nope, I'm not that clever :)
Do you want one file per variable?
If it plays well with ESGF, we will go with this. Are you also planning to provide optical properties mapped on to each model's spectral bands? If not, would you be providing some guidance on how this should be done so that there is some consistency across the models?
Do you want the emission dataset provided as a gridded (time, lon, lat, height) flux dataset
Option a), as Zeb describes, plus information on the vertical extent of the emission would be helpful.
Are you also planning to provide optical properties mapped on to each model's spectral bands? If not, would you be providing some guidance on how this should be done so that there is some consistency across the models?
The plan is to provide a single set of files, but also to provide some code that modellers can use to convert to their spectral bands. I expect that to be ready in August, but I will first ensure that v0 of everything is on ESGF.
Option a), as Zeb describes, plus information on the vertical extent of the emission would be helpful
Sounds good, on it today so we will see quickly if it works or not.
@znichollscr no matter what network I am on, the upload times out (yesterday afternoon I ended up losing the connection when 78% of the file had uploaded). I'll try again today after changing a couple of settings on my laptop. Has anyone else run into this problem? The file is not gigantic, so I'm surprised.
Has anyone else run into this problem? The file is not gigantic, so I'm surprised.
Strange. It's not something I've hit, but I also have small files. How big is the file? If this keeps failing, we might have to go for plan b.
I managed to upload my extinction file finally! So it should be with you @durack1 :)
@durack1 after some more debugging, the emission file is there too (utsvolcemis_input4MIPs_emissions_CMIP_UOEXETER-CMIP-1-1-0_gn_17551020-20211231.nc). This is the only emission file needed; for optical properties, I've only uploaded one file, with extinction. Once you have checked it and sent feedback as needed, I will use that to make the other aerosol optical property files and upload them.
@thomasaubry great! I am a little confused as to what I am trying to get, there are a couple of copies of files in that subdir, so if you list the filenames I am chasing, I'll be able to pull these directly.
It also might be easier if just the netCDF files are uploaded; trying to pore through these subdirs seems unnecessary. I can impose a directory structure once I have the file and valid metadata.
That's probably my bad. input4MIPs validation writes things in the directory structure, so you get the full thing rather than just single files.
I think it'll be fine if we just tweak the uploads so you can always grab the entire directory, rather than needing to look in sub-dirs.
@thomasaubry when you upload, it's best to upload to a new directory every time, so Paul can just grab the entire directory.
For example, for my CR-CMIP-0-3-0 dataset, I did
input4mips-validation --password "zebedee.nicholls@climate-resource.com" --ftp-dir-rel-to-root "cr-cmip-0-3-0-0"
Then we found the inevitable bug
input4mips-validation --password "zebedee.nicholls@climate-resource.com" --ftp-dir-rel-to-root "cr-cmip-0-3-0-1"
Then the next bug
input4mips-validation --password "zebedee.nicholls@climate-resource.com" --ftp-dir-rel-to-root "cr-cmip-0-3-0-2"
etc.
For you, might make sense to do
input4mips-validation --password "thomas.aubry@uoe.ac.at" --ftp-dir-rel-to-root "uoe-cmip-0-1-0-0"
input4mips-validation --password "thomas.aubry@uoe.ac.at" --ftp-dir-rel-to-root "uoe-cmip-0-1-0-1"
etc.
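The convention here is simply "a fresh FTP directory per upload attempt". The helper below is hypothetical (it is not part of input4mips-validation) and only shows how the next `--ftp-dir-rel-to-root` value could be derived from the directory names already used, assuming the `<base>-<attempt>` pattern shown above.

```python
import re


def next_upload_dir(base: str, existing: list[str]) -> str:
    """Return '<base>-<n>', with n one higher than any existing attempt for base.

    Directories that do not match '<base>-<integer>' are ignored.
    """
    pattern = re.compile(re.escape(base) + r"-(\d+)$")
    attempts = [int(m.group(1)) for d in existing if (m := pattern.match(d))]
    return f"{base}-{max(attempts, default=-1) + 1}"
```

For example, with `uoe-cmip-0-1-0-0` and `uoe-cmip-0-1-0-1` already on the server, `next_upload_dir("uoe-cmip-0-1-0", ...)` gives `uoe-cmip-0-1-0-2`. Comparing attempt numbers as integers keeps `-10` after `-9`.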
Thanks @znichollscr! @durack1, I put everything in uoexeter-CMIP-1-1-0 yesterday. There are two files to look at: ext_input4MIPs_aerosolOpticalProperties_CMIP_UOEXETER-CMIP-0-1-0_gnz_175001-202312.nc and utsvolcemis_input4MIPs_emissions_CMIP_UOEXETER-CMIP-1-1-0_gn_17551020-20211231.nc
Is it more clear now/do you have access? Please do not upload these to ESGF afterwards. I might have misunderstood the workflow, but I thought this was just to check metadata/formatting. Once I have your green light on this, I will produce the other aerosol optical property files in the same way and add a couple of corrections from feedback already received. This won't take long; I've only been slow because I was getting used to the Python code and formatting practices. I might also reset my version numbers; I think it makes sense that what goes on ESGF first should be 0-0-1.
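Both filenames above follow one underscore-separated pattern. A hypothetical parser, assuming the template `<variable_id>_input4MIPs_<dataset_category>_<target_mip>_<source_id>_<grid_label>_<time_range>.nc` as inferred from the filenames in this thread (not taken from the input4MIPs specification). Splitting from the right keeps any underscores inside `variable_id` intact.

```python
from typing import NamedTuple


class Input4MIPsName(NamedTuple):
    variable_id: str
    activity_id: str
    dataset_category: str
    target_mip: str
    source_id: str
    grid_label: str
    time_range: str


def parse_filename(name: str) -> Input4MIPsName:
    """Split a filename into its components (template assumed, see above)."""
    if not name.endswith(".nc"):
        raise ValueError(f"not a netCDF filename: {name}")
    # rsplit from the right so underscores inside variable_id are preserved
    parts = name[: -len(".nc")].rsplit("_", 6)
    if len(parts) != 7:
        raise ValueError(f"unexpected number of components: {name}")
    return Input4MIPsName(*parts)
```

Run on the two filenames above, `source_id` comes out as `UOEXETER-CMIP-0-1-0` for the extinction file and `UOEXETER-CMIP-1-1-0` for the emissions file, so a comparison like this makes it easy to spot when files from one upload carry different source_ids.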
Is it more clear now/do you have access?
All clear to me (I'll get access once Paul moves these onto the server I have access to, although given you already write these files with input4MIPs-validation, I don't think I'll have much to add...)
Please do not upload these to ESGF afterwards. I might have misunderstood the workflow, but I thought this was just to check metadata/formatting
Yep, I think we're all on the same page about this
I might also reset my version numbers; I think it makes sense that what goes on ESGF first should be 0-0-1
To be honest, I'd suggest not doing this. The first version of our GHG files on ESGF is 0-3-0. I did this just in case anyone had heard/somehow been sent our 0-1-0 and 0-2-0 versions by other channels, to ensure there was zero chance of confusion. The version number is pretty arbitrary, so I'd just keep incrementing from whatever you're up to.
@thomasaubry @znichollscr yep, we're on the system - see /p/cscratch/durack1/ThomasAubry-volcEmissions/20240819
@thomasaubry the comments about versions etc. that @znichollscr notes above are consistent with my thinking. There are many moving parts, so my recommendation is that with any change, whether big (a unit issue) or small (a typo in the netCDF global attributes), we just continue to increment upward and onward. That means we can keep track of what changed and when, and capture this, so that when we have 100+ users of these datasets (highly likely in a couple of months), there is a clear source of information about what the latest version is and what issues were resolved in previous versions. We've already hit this with the SOLARIS-HEPPA-CMIP-4-3 data (4.1 and 4.2 have already been released for folks to use): we know there are some metadata inconsistencies in 4.3, so we will catch these very minor tweaks (insignificant to users) in the 4.4 version, which at the very minimum will extend the temporal coverage.
Does that make sense?
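"Always increment" only gives a clear "latest version" if versions are compared numerically rather than as text (as strings, `1-1-10` sorts before `1-1-9`). A minimal sketch, assuming every source_id ends in three hyphen-separated integers, as in `UOEXETER-CMIP-1-1-3`; the helper names are hypothetical.

```python
def source_id_version(source_id: str) -> tuple[int, int, int]:
    """Trailing 'X-Y-Z' of a source_id as an integer tuple."""
    major, minor, patch = source_id.rsplit("-", 3)[1:]
    return int(major), int(minor), int(patch)


def latest(source_ids: list[str]) -> str:
    """Most recent source_id, comparing versions numerically."""
    return max(source_ids, key=source_id_version)
```

Tuples compare element by element, so major, then minor, then patch decide the order without any custom comparison logic.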
@durack1 makes perfect sense. Let me know once you've checked and I'll make the other optical properties files and upload the full package :)
See /p/cscratch/durack1/ThomasAubry-volcEmissions/20240819
Thanks. All look good to me
@vnaik60 did you want to take a peek? Or are we full steam ahead for @thomasaubry to mint the "final" prototype versions and get them in the ESGF publication queue?
@durack1 and @thomasaubry march ahead! Issues, if any, can be captured once the files are available on ESGF for wider testing. Thank you so much for getting us to "v0"!
All uploading currently, accidentally into uoexeter 🤦♂️; I will go back to uoexeter-CMIP-x-x-x for the next uploads. There are 7 files in total: one for emissions and 6 for optical properties. I will believe we are at v0 when I see the files on ESGF. Despite brilliant help, I've been so slow with all the steps after the actual dataset production!
@thomasaubry exciting!
I can see 5 files currently, is that it, or should there be more?
Also, we don't currently have an "aerosolOpticalProperties" dataset_category; the list currently includes:
[
"GHGConcentrations",
"SSTsAndSeaIce",
"aerosolProperties",
"atmosphericState",
"emissions",
"landState",
"ozone",
"radiation",
"solar",
"surfaceAir",
"surfaceFluxes"
]
So not sure this was intended?
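The check being applied here is membership of the file's `dataset_category` in the controlled vocabulary. A minimal sketch, using a snapshot of the list above (the real CVs may change, and the actual check is done by input4mips-validation, not by this code). Suggesting the closest entry with the standard library's `difflib.get_close_matches` is an illustrative addition.

```python
import difflib
from typing import Optional

# Snapshot of the dataset_category values listed above; the CVs may change.
DATASET_CATEGORIES = frozenset({
    "GHGConcentrations", "SSTsAndSeaIce", "aerosolProperties", "atmosphericState",
    "emissions", "landState", "ozone", "radiation", "solar", "surfaceAir",
    "surfaceFluxes",
})


def dataset_category_problem(value: str) -> Optional[str]:
    """None if value is a known dataset_category, else an error message."""
    if value in DATASET_CATEGORIES:
        return None
    close = difflib.get_close_matches(value, sorted(DATASET_CATEGORIES), n=1)
    hint = f"; closest match is {close[0]!r}" if close else ""
    return f"unknown dataset_category {value!r}{hint}"
```

For "aerosolOpticalProperties", the closest match reported is "aerosolProperties".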
Will pull these 5 across to the nimbus (evaluation machine), so that @znichollscr can take a peek directly
@znichollscr, these are live now: ~/durack1/ThomasAubry-volcEmissions/20240822
So not sure this was intended?
It should probably be "aerosolProperties", I think?
@znichollscr, these are live now: ~/durack1/ThomasAubry-volcEmissions/20240822
They pass the validator (with the usual caveats about the validator's current limits).
@thomasaubry I think there are two outstanding questions:
Nearly there!
Hi everyone, awesome!
Thanks!
Totally exciting, and looking forward to seeing this on ESGF! @thomasaubry, at some point after the dataset goes live, let us know how you would like to "also provide a piece of code that modellers can use to convert to their spectral bands": via GitHub, Zenodo, the TT website, etc.?
@thomasaubry we're making progress, I can now find 6 files, almost to your target!
@durack1 yes, I'm on holiday without good wifi, so it's struggling! The server says the 7th file (asy_*.nc) now exists in uoexeter, but it looks like the transfer aborted after the connection was lost, so to be sure I've re-uploaded just this file into uoexeter_asy. I think it's safer to use this one. Is it all there?
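A file that "exists" on the server after a dropped connection can still be truncated. One hedged way to confirm a transfer, assuming the uploader shares the original file size and a checksum alongside the data (nothing in this thread says the FTP server or input4mips-validation does this automatically): compare sizes first, then SHA-256 digests computed on both ends. The helper names are hypothetical; on the server side, Python's `ftplib.FTP.size` can report a remote file's size if the server supports the SIZE command.

```python
import hashlib
import os


def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large netCDF files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_looks_complete(local_path: str, expected_size: int) -> bool:
    """Cheap first check: a partial transfer is usually shorter than the source."""
    return os.path.getsize(local_path) == expected_size
```

The size check catches the common case of an aborted transfer; matching checksums also rule out corruption in a file of the right length.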
@vnaik60 yes, I think this will go on GitHub. I don't have it yet, but development is ongoing; I hope it will be ready in early September. I've output an extensive range of wavelengths, so I hope that, in the first instance, modellers can use the wavelength closest to their spectral band midpoints. The code will allow modellers to average optical properties over their specific spectral bands, weighted by the solar/terrestrial spectrum. In addition to the code, I will need to link documentation, and I think the best way will be Zenodo (?).
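The two approaches described above, sketched under stated assumptions: picking the tabulated wavelength nearest each band midpoint, and a spectrally weighted band mean computed with the trapezoidal rule. This is not the code being developed, and the function names are hypothetical. The weights are whatever spectrum the user supplies (e.g. solar irradiance for shortwave bands). For quantities such as single-scattering albedo or asymmetry parameter, a consistent band average usually also weights by extinction or scattering; this sketch does not do that.

```python
def nearest_wavelength_index(wavelengths, band_lo, band_hi):
    """First-instance approach: index of the tabulated wavelength closest to
    the band midpoint."""
    mid = 0.5 * (band_lo + band_hi)
    return min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - mid))


def band_weighted_mean(wavelengths, values, weights, band_lo, band_hi):
    """Spectrally weighted band mean of `values` (trapezoidal rule).

    wavelengths must be sorted ascending; weights would be e.g. solar spectral
    irradiance (shortwave) or a terrestrial/Planck spectrum (longwave). Only
    tabulated points inside [band_lo, band_hi] are used; there is no
    interpolation to the band edges.
    """
    pts = [
        (w, v, s)
        for w, v, s in zip(wavelengths, values, weights)
        if band_lo <= w <= band_hi
    ]
    if len(pts) < 2:
        raise ValueError("need at least two tabulated wavelengths inside the band")
    num = den = 0.0
    for (w0, v0, s0), (w1, v1, s1) in zip(pts, pts[1:]):
        dw = w1 - w0
        num += 0.5 * (v0 * s0 + v1 * s1) * dw  # integral of value * weight
        den += 0.5 * (s0 + s1) * dw  # integral of weight
    return num / den
```

A denser wavelength table makes the trapezoidal integral more accurate, which is one reason outputting an extensive range of wavelengths helps.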
Hi @thomasaubry, we can't publish until September 1st at the earliest anyway, so don't stress too much. It seems like the upload hasn't worked, so here's a suggestion. When you're back from holiday (not before):
Hi @znicholls, all done under uoxeter-CMIP-1-1-2 :) Let me know if you spot any issues!
@thomasaubry excellent - I can see 7 files!
@znicholls these are in the usual place under 20240827.
Thanks mate. All look good to me. I also did a test rewrite to the DRS, and that all seemed happy, so I think we're good to queue for publication.
Yay, that's all of them! Almost there. Looking forward to doing some proper science/improvements and writing up manuscripts now.
@thomasaubry wonderful! @sashakames is back from vacation next week, so will drop these as first go in the queue, and hopefully we're live mid-next week!
Noting that #9 can be closed alongside this when these data are live, and our database/HTML pages are updated.
The files are queued for ESGF publication, with version 20240828; hopefully this happens very early next week, exciting!
I corrected a unit problem, so I'm already uploading 1-1-3 to the FTP...
@thomasaubry OK, great! I'll await clearance from you that it's uploaded, then swap in 1.1.3 for the in-place 1.1.2 (which is yet to be published, as we're waiting for @sashakames to be back from vacation next Tuesday).
@thomasaubry it looks like I have 7 new files, but these files are identified as 1-1-2 in the source_id (and filename; I have not checked the metadata within the files). Did you want to change that?
It's up to you: as we've not published any previous data, we could run with this, or with new 1-1-3 files. Over to you.
Oh gosh, I forgot to change the root folder... I clearly need to call it a day! I'll try to re-upload this evening or over the weekend, but I suspect the connection will be too poor. If so, I will just do it Monday morning UK time. You can delete that 1-1-3 folder... sorry for the fuss!
@thomasaubry all good! I have no control over the files on ftp.llnl.gov; I can only upload/create or download, not delete. No problem with waiting until Monday. It's a public holiday on this side, so if we have data in place by Monday night/Tuesday US PDT, then I can pull these down within a minute and have them back in the publication queue.
OK, this time all files should be in "uoexeter-CMIP-1-1-3_v3". Note the "v3": v2 had an issue because of a partial transfer (and then couldn't be overwritten).
@thomasaubry perfect. Those files are now downloaded, and they all check out great. I now have them staged for ESGF publication, so hopefully are live very soon!
As UOEXETER-CMIP-1-1-3 is done, we need to update the database (and webpages) and then close out this issue - @znichollscr you got this?
Yep will pick up tomorrow in #120
Fixed by https://github.com/PCMDI/input4MIPs_CVs/pull/120, closing
Tagging @durack1 @znichollscr
Strat aerosol test file uploaded for checking
I'm uploading one of my test files to the input4MIPs FTP for Paul to check, as instructed by Zeb and following the instructions on https://input4mips-validation.readthedocs.io/en/latest/how-to-guides/how-to-upload-to-ftp/. The dry run went well and the file is currently uploading. Let me know if there are any issues.
Outstanding questions
The two main outstanding questions for my datasets are: 1) Do you want one file per variable? This feels a bit messy, but I'm happy to cater to your preference! I have 8 variables in the aerosol optical property dataset; for emissions, it depends.
2) Do you want the emission dataset provided as a gridded (time, lon, lat, height) flux dataset rather than a list of eruptions with emission parameters? There is a limited number of eruptions, so I'm unsure whether this makes sense or how the few modelling groups modelling this would prefer the data (I can poll them! With UKESM we work from an eruption list). One other concern is that the core data is a mass of SO2 for each eruption. If I grid that as a flux and people regrid it to their model grid (lat/lon and height; I'm not sure whether there would be a time concern), they should try to conserve the mass for each eruption. But that information would be a lot harder to track from a gridded flux file than from an eruption list.
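Whatever format is chosen, regridding should preserve each eruption's SO2 mass. A minimal sketch of first-order conservative remapping in the vertical, assuming the input is mass (not flux) per layer and that mass is uniform within each source layer; the same overlap-weighting idea extends to lat/lon cells (using cell areas) and to time. The function name is hypothetical.

```python
def remap_conservative(src_bnds, src_mass, dst_bnds):
    """Redistribute per-layer masses onto new layers by overlap fraction.

    src_bnds/dst_bnds: lists of (bottom, top) per layer, in the same units.
    Mass is assumed uniform within each source layer. The total is preserved
    whenever the destination layers fully cover the source layers; mass in
    uncovered parts of a source layer is dropped.
    """
    dst_mass = [0.0] * len(dst_bnds)
    for (s0, s1), m in zip(src_bnds, src_mass):
        thickness = s1 - s0
        for j, (d0, d1) in enumerate(dst_bnds):
            overlap = min(s1, d1) - max(s0, d0)
            if overlap > 0:
                dst_mass[j] += m * overlap / thickness
    return dst_mass
```

For example, 2 units of mass in 16-18 km and 4 units in 18-20 km, remapped onto 15-17, 17-19 and 19-21 km layers, give [1.0, 3.0, 2.0]: the total of 6 is unchanged. Comparing the sum before and after regridding is a simple per-eruption conservation check.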