NCAR / ADF

A unified collection of python scripts used to generate standard plots from CAM outputs.
Creative Commons Attribution 4.0 International

Wishlist of obs datasets #105

Open cecilehannay opened 2 years ago

cecilehannay commented 2 years ago

Adding observations to the ADF.

What is this new feature?

Jesse is adding model-to-obs comparisons. In this issue, we are collecting the list of datasets and variables we want to prioritize so that we can evaluate the model simulations.

- SWCF
- LWCF
- FLNT
- FSNT
- PSL
  - Dataset 1
  - Dataset 2
- SST
  - We compare at different periods:
    - pre-industrial
    - present day
- PRECT
  - Global
  - Tropical
- PREH2O
- TGCLDLWP
- SURF STRESS (TAUX and TAUY)

Next Steps

cecilehannay commented 2 years ago

Next Steps

Would love feedback on next steps (@JulioTBacmeister, @swrneale, @andrewgettelman)

andrewgettelman commented 2 years ago

The list looks good.

For cloud 'microphysics' (CLOUD, TGCLDLWP, TAU, REFF) we probably want to be using the COSP MODIS simulator and associated observations, especially for cloud fraction. We do have this data from the existing diagnostics; we just have to compare to different fields.

We could also add U, V, T, and probably compare to ERA-Interim (or ERA5). We should have this data already.

brianpm commented 2 years ago

I think this list is a good starting point. I agree with Andrew about using COSP diagnostics for some of this, including cloud cover (e.g., we should not compare CLDLOW to satellite "low cloud" products). Maybe these can be added incrementally?

While we are in transition to handling a larger and more diverse group of observations with intake-esm (https://github.com/NCAR/ADF/issues/102 , https://github.com/NCAR/ADF/issues/25), a stop-gap solution might be to make a combined, homogenized dataset out of this list. By this I just mean: take this handful of datasets, remap them to the FV1° grid (or whatever grid we want to use for evaluation for the rest of 2022), rename them to the corresponding CAM variables, put a comment or similar into each one's metadata to indicate the source and time span, and put them all into climo files that match the ADF convention. That might make it easy to handle them much the same way we handle CAM cases within the ADF scripts. Maybe @nusbaume would disagree, though?
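The homogenization step described here (remap, rename to the CAM variable name, record provenance, write an ADF-style climo file) might look roughly like this in xarray. All names, attributes, and the obs variable below are illustrative stand-ins, not an actual ADF convention:

```python
# Sketch of the homogenization step: rename an obs variable to its CAM
# counterpart, note the source and time span in the metadata, and write a
# climo-style file. All names here are illustrative, not an ADF convention.
import numpy as np
import xarray as xr

# Stand-in for a remapped 1-degree monthly climatology from some obs product
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(0.5, 360.0, 1.0)
ds = xr.Dataset(
    {
        "toa_sw_cre": (  # hypothetical source variable name
            ("time", "lat", "lon"),
            np.zeros((12, lat.size, lon.size), dtype=np.float32),
        )
    },
    coords={"time": np.arange(1, 13), "lat": lat, "lon": lon},
)

# Rename to the corresponding CAM variable and record provenance
ds = ds.rename({"toa_sw_cre": "SWCF"})
ds["SWCF"].attrs["units"] = "W/m2"
ds["SWCF"].attrs["source"] = "CERES EBAF Ed4.1 (example provenance note)"
ds["SWCF"].attrs["time_span"] = "2000-01 to 2020-12"
# ds.to_netcdf("SWCF_climo.nc")  # write out in the ADF climo-file convention
```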

In the medium-term, I would suggest that we need to update a lot of these datasets. In some cases because there are new (hopefully improved) products (e.g., CERES Ed2.x -> CERES Ed4.1, ERAI -> ERA5). In other cases, just because the observations now cover a longer time period. And I totally agree with next step numbers 2 & 3, finding a permanent place and getting good metadata is crucial.

bitterbark commented 2 years ago

Should one of the highest priorities be making it easy to swap in a new data set? In the short term, having an easy-to-find list of the files to use, one that a user could change and that the ADF reads to determine which file to load, should make that a lot easier. Are we thinking of putting this in the variable attributes information? If so, that would work too.

That unfortunately argues against a combined data set, which would have to be remade every time any one of its components changes. Although I think having an already-regridded version of each file would be valuable.

andrewgettelman commented 2 years ago

I second Dani's comment: better if it is incremental.

I think all that would be needed is variable attributes for `obs_file_name`, `obs_var_name`, and `obs_scl`.

Then as long as there were lat, lon, and pressure (if needed) coordinates plus a reasonable monthly time coordinate, I think the plotting codes could figure out how to load the observations for any variable and get them into the right units. Flexible, easily changed, some manual intervention required, but I think that's fine. Also iterative.
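The three proposed attributes could drive a small generic loader along these lines; `load_obs` and its argument handling are hypothetical, a sketch of the idea rather than real ADF code:

```python
# Hypothetical sketch: each variable carries attributes naming the obs file,
# the variable inside it, and a scale factor to reach model units. The
# attribute names come from the comment above; load_obs is not real ADF code.
import xarray as xr

def load_obs(var_attrs, open_dataset=xr.open_dataset):
    """Load one observational variable from its per-variable attributes.

    open_dataset is injectable so the sketch can be exercised without a
    real file on disk.
    """
    ds = open_dataset(var_attrs["obs_file_name"])
    da = ds[var_attrs["obs_var_name"]]
    # Rescale to the model's units (default: no scaling)
    return da * var_attrs.get("obs_scl", 1.0)
```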

swrneale commented 2 years ago

@cecilehannay I presume this is for ANN, DJF→SON, and JAN→DEC climo datasets? I think I agree with Brian that we should grab updates to the datasets and put them mostly in the existing format, so that Cecile could reasonably transition to the ADF for CAM5 dev. simulations.

Do we want to make any attempt to overlap observational periods, or should we just grab the longest periods we can each time?

In terms of observational fields, we should consult https://climatedataguide.ucar.edu/ before we do some heavy lifting ourselves. A few things are missing: surface latent and sensible heat fluxes (not analysis-based) and precipitable water (NVAP, available 1988-2009: https://asdc.larc.nasa.gov/project/NVAP-M/NVAP_CLIMATE_Total-Precipitable-Water_1)

Rich

nusbaume commented 2 years ago

In case this helps, my general long-term plan for dealing with observations currently looks something like this:

  1. There is an "official" Intake-ESM catalog that contains all of the (gridded) observational datasets that we want to compare the model against by default. I believe that, if set up properly, this catalog can manage any sort of temporal resolution and spatial grid, but there will likely need to be fairly strict meta-data requirements for those observational files in order for the ADF to properly search for them in the catalog. In general, too, I imagine a dataset wouldn't be added to this catalog until it was "blessed" by someone at AMP.

  2. The ADF will also support the ability for the user to specify their own, non-official observational datasets, which can be done via the current variable meta-data YAML file. I am happy if we want to require certain observational file features and meta-data, but I was planning to have the ADF basically accept almost anything and try its best to match the model data to that observational file. This would allow a user to easily add their own observations while protecting the "official" observational data most users will want.

For the short-term I am basically going to implement option 2. Then once that is working I can bring in the infrastructure needed for Intake-ESM, while at the same time we collectively agree on and update the observational datasets we want. At that point I can create the "official" catalog, and we should then have both options available.

andrewgettelman commented 2 years ago

Good plan. I like starting with option 2 and keeping it simple for now, so we can get going with existing data sets, and allow easy extensibility and minimal overhead. When option 1 is online, we can start to migrate obs over. But that should happen later. Thanks!

nusbaume commented 2 years ago

Hi All,

I just wanted to notify everyone possibly watching this thread that I have recently implemented model vs obs comparisons using the variable defaults file to specify what observational data set to use. You can currently specify the observational file to use (either as just a file name or as a full path if it is located somewhere unique), the name of the observational data set (which will eventually be plugged into plot titles, webpages, etc.), and the name of the variable on the observational file that you want to use (so multiple observational variables can be located in a single file).

Currently the observations can be on any structured lat/lon grid you want, and the only new meta-data requirement is that the observations variable must have a units attribute (which is probably a good idea in general).

In terms of missing features, there is currently no way to deal with 3-D observational data, and the ADF assumes that the observations themselves are monthly climatologies with a time dimension of length 12 (one for each month). Example files can be found on Cheyenne/Casper here (with credit going to @brianpm for the data files themselves):

/glade/work/nusbaume/SE_projects/model_diagnostics/ADF_obs

Of course I am hoping to remove all of these restrictions eventually, so if you have an observational data set you want to use that has a vertical dimension, or that has a different time dimension (e.g. seasonal or daily values) please let me know and I'll help with adding the necessary ADF functionality.
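The two requirements stated above (a `units` attribute, and a 12-month time dimension) can be checked in a few lines of xarray; `check_obs_climo` is a hypothetical helper for illustration, not part of the ADF:

```python
# Minimal checks for the two stated requirements: the obs variable carries a
# "units" attribute, and the data form a 12-month climatology. The helper
# check_obs_climo is hypothetical, not part of the ADF.
import numpy as np
import xarray as xr

def check_obs_climo(da):
    """Return a list of problems with an obs climatology variable."""
    problems = []
    if "units" not in da.attrs:
        problems.append("missing 'units' attribute")
    if "time" not in da.dims or da.sizes["time"] != 12:
        problems.append("expected a 'time' dimension of length 12")
    return problems

# A conforming variable and a non-conforming one, for demonstration
good = xr.DataArray(
    np.zeros((12, 4, 8)), dims=("time", "lat", "lon"), attrs={"units": "K"}
)
bad = xr.DataArray(np.zeros((6, 4, 8)), dims=("time", "lat", "lon"))
```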

Thanks, and have a good weekend!

cecilehannay commented 2 years ago

@nusbaume: I am a bit confused how to run versus obs.

cecilehannay commented 2 years ago

From @nusbaume: set `compare_obs` to `true` in your config file.
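As a minimal sketch, the change in the ADF run-time config YAML would be the single flag below; only `compare_obs` itself comes from this thread, and the rest of your config file stays as it is:

```yaml
# ADF run-time config: switch from model-vs-model to model-vs-obs comparisons.
compare_obs: true
```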

andrewgettelman commented 2 years ago

I've done the radiation fields from CERES and some from ERAI. It's pretty simple to do this using `lib/adf_variable_defaults.yaml`.

I have also added the ability to scale and change units for the data sets (observations and variable independently).

Should be easy to finish this off, and would be a good easy hackathon project....
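The exact keys in `lib/adf_variable_defaults.yaml` are not quoted in this thread, so the entry below is illustrative only; it sketches the kind of per-variable obs information discussed here (obs file, obs variable name, and independent scaling for obs and model), with hypothetical key and file names:

```yaml
# Illustrative entry; the real key names in lib/adf_variable_defaults.yaml
# may differ. This only sketches the obs file / obs variable / scaling idea.
SWCF:
  obs_file: "CERES_EBAF_Ed4.1_climo.nc"  # hypothetical obs file name
  obs_name: "toa_cre_sw_mon"             # variable name inside the obs file
  obs_scale_factor: 1.0                  # rescale obs to the model's units
  scale_factor: 1.0                      # model-side scaling, if any
```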

andrewgettelman commented 2 years ago

Adding some notes from @chengzhuzhang (Jill) at LLNL on the E3SM diagnostics and observations:

Thank you for your feedback! I looked a bit into the provenance of the AODVIS dataset: this observational composite was derived from the MACv1 (Max-Planck-Institute Aerosol Climatology) dataset from MPI. https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/jame.20035

In a recent effort to bring in more aerosol diagnostics, I found that a more recent version, MACv2, is available (https://www.tandfonline.com/doi/full/10.1080/16000889.2019.1623639), but I haven't had a chance to integrate the new dataset yet.

All of our processed datasets are publicly available at https://web.lcrc.anl.gov/public/e3sm/diagnostics/observations/Atm/, which contains four subdirectories that supply data to E3SM diags:

/climatology # seasonal and annual mean climatologies

/time-series # time-series datasets, where available

/arm-diags-data # time series derived from ARM facilities

/tc-analysis # tropical cyclone track datasets

The AOD dataset is here: https://web.lcrc.anl.gov/public/e3sm/diagnostics/observations/Atm/climatology/AOD_550/

And as you pointed out, some datasets were grabbed from AMWG, including the CRU, SSMI, and COSP-OBS datasets. I only assembled the original time series from the individual simulators, so it would be great if we could get some help or work together on the COSP-OBS!

Thanks,

Jill

brianpm commented 2 years ago

FWIW, I have processed the latest MODIS data into a first draft of a climo file for use in ADF. This is based on initial processing by @jshaw35. The file is here: /glade/work/brianpm/observations/MODIS/climo/MCD06COSP_M3_MODIS.climo.200301-202012.nc

The variables are: CLMODIS (the histogram), CLTMODIS, CLHMODIS, CLMMODIS, CLLMODIS, CLDTHCK_MODIS (high, optically thick clouds), and cloud_mask (which won't be used in general).

Averaging interval is 2003-2020.

UNTESTED, so if you see anything weird, just let me know and I can re-process.

chengzhuzhang commented 2 years ago

@brianpm This is great! You beat me to it. I'd be happy to test the datasets. My NCAR computer account expired about two years ago, so I'm submitting a new account request to get the data from GLADE.

brianpm commented 2 years ago

Okay, another first draft climatology, this time from CALIPSO GOCCP.

Original data from https://climserv.ipsl.polytechnique.fr/cfmip-obs/Calipso_goccp.html

I took the monthly data for cloud cover maps, cloud phase maps, and cloud fraction profiles and processed them to the monthly climatology. Simple averaging with nothing else going on. It is worth noting that no correction/adjustment is made for the South Atlantic Anomaly, so caution is advised. Global averages should probably mask out the affected region.

Files on glade:

/glade/work/brianpm/observations/clcalipso/climo/3D_CloudFraction330m_200606-202012_climo_CFMIP2_sat_3.1.2.nc
/glade/work/brianpm/observations/clcalipso/climo/MapLowMidHigh330m_200606-202012_climo_CFMIP2_sat_3.1.2.nc
/glade/work/brianpm/observations/clcalipso/climo/MapLowMidHigh_Phase330m_200606-202012_climo_CFMIP2_sat_3.1.2.nc
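The "simple averaging with nothing else going on" step can be sketched in xarray; the data below are a synthetic stand-in for the monthly GOCCP files, and no South Atlantic Anomaly masking is applied:

```python
# Sketch of building a monthly climatology from a monthly time series by
# simple averaging, as described above. The data are synthetic stand-ins.
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly time series covering June 2006 through May 2008
time = pd.date_range("2006-06-01", "2008-05-01", freq="MS")  # 24 months
da = xr.DataArray(
    np.arange(time.size * 4, dtype=float).reshape(time.size, 2, 2),
    coords={"time": time},
    dims=("time", "lat", "lon"),
)

# Mean over all years, month by month, giving a 12-entry climatology
climo = da.groupby("time.month").mean("time")
```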

chengzhuzhang commented 2 years ago

MACv2 (Max-Planck-Institute Aerosol Climatology) is now processed. ref: https://www.tandfonline.com/doi/full/10.1080/16000889.2019.1623639

/glade/work/chengzhu/analysis_data_e3sm_diags/MACv2/climatology

I extracted AOD at 550 nm; more aerosol optical properties are available if anyone is interested, under MACv2/original_full_set. Limited testing has been done; for AOD at 550 nm, the global mean matches the values in the paper.

I am not sure if the AMWG-diags-style monthly and seasonal mean climo files are still useful for the ADF, but I think they should be easy to convert to the 12-month climo format. My processing script is under MACv2/scripts.

brianpm commented 2 years ago

ISCCP climatology. This is updated to use the ISCCP-H series data. The CTP-TAU histograms are not available in the monthly "basic" files. @Isaaciwd produced monthly mean files from the 3-hourly files. We validated that the values are close to the available monthly means; there is a slight discrepancy that appears to be attributable to a difference in the order of operations, i.e., when the remapping from the equal-area to the equal-angle grid occurs. I took the derived monthly files and made this climo file:

/glade/work/brianpm/observations/isccp/climo/ISCCP-Basic.HGG.GLOBAL.10KM.climo.198307-201706.nc

This file contains:

These were renamed to match CAM's COSP outputs, but not much else was checked for consistency with CAM.

chengzhuzhang commented 2 years ago

@brianpm thank you for the update. It is great that the coordinates/variable names are reformatted to match CAM. I'm wondering if you could share the script; I'm thinking of generating processing scripts that can write out two versions of the data (the new ADF version and the AMWG seasonal-mean version). Thank you!

brianpm commented 2 years ago

@chengzhuzhang -- Yes, I can definitely share the script. I noticed that the CLDTOT_ISCCP values are higher than I expected. I'm going to try to check on that today (mainly I can't remember if I am supposed to skip the lowest optical depth bin). I'll confirm and remake the file if needed. I can put my script somewhere (maybe just in one of my github repos). I will try to get Isaac's script for doing the 3hr-to-monthly calculation, too.