PCMDI / input4MIPs_CVs

Controlled Vocabularies (CVs) for use in input4MIPs
https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/latest/
Creative Commons Attribution 4.0 International

Document download scripts for models #118

Open znichollscr opened 2 weeks ago

znichollscr commented 2 weeks ago

@vnaik60 a question for you!

It seems pretty clear that we're going to have more data on input4MIPs than any one modelling center needs (e.g. you don't need 5 different resolutions of greenhouse gas concentrations, you'll only need one). Hence, our advice to modelling centers will be more nuanced than, "Download all the data".

I was thinking that it would be very helpful if we started a collection of example downloads. I was hoping we could start with your model.

I think the requirements would be relatively simple. We would need to know, for your model:

Then, probably it's also helpful to document any post-processing steps which are likely to be used by multiple models. For example, processing Thomas' data onto the wavelengths of interest for your model.

My instinct is to put this documentation in this repository, so we can update it as soon as we have new data landing. It might make more sense to put it elsewhere of course, so open to suggestions!

cc @durack1

vnaik60 commented 1 week ago

Hi @znichollscr, could you please remind me of the motivation for providing this ("collection of example downloads")? I would like to understand the reasons behind this effort before sending you what our model(s) use or getting into a long thread on why this ("example downloads"), despite being well-intended, may not be so useful :-).

znichollscr commented 1 week ago

That's a good point, the ask was too vague.

The intent: for some data sets, we have quite a lot of data, not all of which modelling groups will need. My hope was to build up a resource that shows groups how they can pick just the data they need, to help them navigate the available data, avoid them just downloading it all and hopefully avoid some confusion.

I started thinking about this because of the GHG concentration and emissions data. For the GHG concentrations, we're providing data on 5 different grids and for 43 different species. No group needs all that, so showing them how to filter for just what they're interested in seemed a good idea, and I figured that I may as well use a real use case rather than just making up a pretend group that only uses global-means. For emissions, they're providing data on 2 grids and have this split between 'main' and 'supplementary'. My thinking was basically the same: it's better to use a real use case rather than inventing a group that uses all the main data on a 0.5 degree grid and then a few (but not all) of the files from the supplementary.
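As a rough illustration of the kind of filtering described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `DatasetRecord` class, the `select_records` helper, and the example variable and grid labels are hypothetical and do not reflect the actual input4MIPs catalogue or any download tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical description of one available dataset."""

    variable_id: str  # e.g. a greenhouse gas species (illustrative values only)
    grid_label: str   # e.g. a label identifying the grid (illustrative values only)


def select_records(records, wanted_variables, wanted_grid):
    """Keep only the records for the requested species on the requested grid."""
    wanted = set(wanted_variables)
    return [
        record
        for record in records
        if record.variable_id in wanted and record.grid_label == wanted_grid
    ]


# A made-up catalogue: several species, each offered on more than one grid.
catalogue = [
    DatasetRecord("co2", "gm"),
    DatasetRecord("co2", "gr1"),
    DatasetRecord("ch4", "gm"),
    DatasetRecord("ch4", "gr1"),
    DatasetRecord("cfc11", "gm"),
]

# A group that only needs global-mean CO2 and CH4 would keep two records.
selection = select_records(catalogue, ["co2", "ch4"], "gm")
```

The point is only that a group states its species and grid once and derives the download list from that, rather than fetching every combination.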

Perhaps a better, much narrower set of questions to get us started then:

durack1 commented 1 week ago

@vnaik60 I believe what @znichollscr is pointing out here is that my vague "use esgpull to get input4MIPs data" is a useless comment. Whereas, if you have a little recipe that allows a group (e.g. NOAA-GFDL) to get its target data beginning-to-end, then this is a more tangible and usable example that gets modeling groups moving far more quickly than having to work all this stuff out themselves with little guidance (or examples).

vnaik60 commented 1 week ago

Thanks both!

We do not have a recipe at GFDL for downloading input4MIPs datasets; we download all that is available with the thought that someone in the lab may need the dataset at some point. Of course, I will acknowledge that we have a never-ending archive which facilitates this, so at GFDL we are privileged!

I can see how this may be a useful endeavor, especially for newcomer modeling groups who are just spinning up on running CMIP simulations. However, I would not recommend doing this exercise and would rather focus on documenting each dataset with the data provider's recommendations on dos and don'ts related to their datasets. My reasons are as follows:

I think what is definitely needed is a forcing dataset guide or manual (in addition to the nice table @znichollscr and @durack1 have worked on) that describes in a little bit more detail what is available on ESGF and how it can or cannot be used (separate from a journal paper), just like here, and more specifically here, here, and here.

Short answers to your specific Qs:

yes, monthly.

already mentioned above. For chemistry, we use latitudinally varying CH4 concentrations for lower boundary conditions. And there are other configurations of the model that have different dataset needs - for example, we also run with CH4 emissions in which case we do not specify CH4 concentrations.

CMIP class simulations use 0.5deg but as I mentioned above, 0.1deg is used by our variable-grid resolution model. Here are the species needed by our most comprehensive ESM model:

"NO","CO","H2","NH3","CH2O", \ ; NOx is as NO2 in anthro but as NO in BB emissions
"C2H4","C2H6","C3H6", \
"C3H8","C4H10", \
"CH3OH","C2H5OH", \
"ACETONE", "BC", "OC", "SO2"
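For anyone wanting to reuse the species list above in a script, a minimal Python sketch follows. The list is copied from the comment; the `missing_species` helper and the example `offered` list are hypothetical and only illustrate checking a dataset's contents against a model's needs.

```python
# Species list copied from the comment above.
# Note from the comment: NOx is as NO2 in anthro but as NO in BB emissions.
NEEDED_SPECIES = [
    "NO", "CO", "H2", "NH3", "CH2O",
    "C2H4", "C2H6", "C3H6",
    "C3H8", "C4H10",
    "CH3OH", "C2H5OH",
    "ACETONE", "BC", "OC", "SO2",
]


def missing_species(offered, needed=NEEDED_SPECIES):
    """Return the needed species that are absent from `offered`.

    The comparison ignores case; this is an assumption, since naming
    conventions may differ between datasets.
    """
    offered_upper = {name.upper() for name in offered}
    return [name for name in needed if name.upper() not in offered_upper]


# Hypothetical example: a dataset that offers only a handful of species.
offered = ["co", "nh3", "bc", "oc", "so2", "nmvoc"]
still_needed = missing_species(offered)
```

A check like this makes gaps visible before any download starts, including naming mismatches such as the NO versus NO2 case noted in the comment.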

znichollscr commented 1 week ago

Thanks @vnaik60, super helpful to understand and very well explained!

> I think what is definitely needed is a forcing dataset guide or manual (in addition to the nice table @znichollscr and @durack1 have worked on) that describes in little bit more detail on what is available on ESGF and how it can or not be used (separate from a journal paper), just like here, and more specifically here, here, and here.

Got it, that's a good next step then!