AusClimateService / indices

Code for calculating climate indices
4 stars 6 forks source link

ERA5 data inputs #3

Open chunhsusu opened 1 year ago

chunhsusu commented 1 year ago

Request implementation to allow use with ERA5 data in rt52:

I assume icclim does not handle hourly dat, so an additional argument can be passed to run_icclim.py, which tells xr.dataarray how to preprocess the hourly data first such as resample('1D').mean() etc.

DamienIrving commented 1 year ago

Happy to implement this.

I guess for temperature we'll need to implement the option of picking resample('1D').mean() for indices that require daily mean temperature, resample('1D').max() for indices that require daily maximum temperature and resample('1D').min() for indices that require daily minimum temperature?

chunhsusu commented 1 year ago

That's right. Note that for era5 data, they have flipped the order of latitude, and the longitude goes from -180 to 180. I don't think it affects your implementation, but something to worry about when we compare between data sets.

DamienIrving commented 1 year ago

@chunhsusu - One more question. Do you want to calculate indices for the entire globe using the ERA5 data or just the Australian region? (If the latter I can add spatial subsetting functionality.)

chunhsusu commented 1 year ago

@DamienIrving For efficiency, yes I think the functionality should be added. Our evaluation work will only be over the Australian region.

DamienIrving commented 1 year ago

Subsetting option added: https://github.com/AusClimateService/indices/commit/e9a6985d2bacdbffc0db00e8074aeb2340ca9b2a

ngben commented 1 year ago

Hi @DamienIrving, thanks for implementing this, I have some questions.

  1. How does time_agg apply for bivariate indices?
  2. Does resample take into account the the required shift in the hourly data? e.g. 2021-01-01 00:00 is actually for 2020-12-31, as mentioned in the ECMWF guide in point 3
  3. Also what's the difference between tp and mtpr? ECMWF provides this guide which I have used before to calculate daily rainfall using tp.
chunhsusu commented 1 year ago

Hi @ngben , On tp v mtpr. We are using the mtpr data archive in /g/data/rt52/era5/single-levels/reanalysis/mtpr/, and there the mtpr is defined as "Mean total precipitation rate". For computing daily total, it makes sense.

How is "tp" defined? rt52 does not seem to have it.

ngben commented 1 year ago

Hi @chunhsusu

How is "tp" defined? rt52 does not seem to have it.

tp is defined as total precipitation (https://apps.ecmwf.int/codes/grib/param-db/?id=228)

I guess if using mtpr the time_agg in run_icclim.py would be "mean"? For tp when using cdo to calculate the daily amount I used cdo daysum

I downloaded hourly tp last year, I'm not sure why rt52 doesn't have it

DamienIrving commented 1 year ago

@ngben:

  1. For the bivariate indices you'll need to use the time_agg flag twice. For example, diurnal temperature range for ERA5 data would look like the following with the --input_files, --variable and --time_agg flags used twice, the first time with the information needed to create the tmax data and the second time to create the tmin data:
/g/data/xv83/dbi599/miniconda3/envs/icclim/bin/python run_icclim.py dtr /g/data/ia39/australian-climate-service/test-data/CORDEX-CMIP6/indices/AUS-gn/none/ECMWF-ERA5/evaluation/r1i1p1f1/none/none/climdex/dtr/dtr_AUS-gn_ECMWF-ERA5_evaluation_r1i1p1f1_year_195901-202112.nc --input_files /g/data/rt52/era5/single-levels/reanalysis/2t/*/*.nc --variable t2m --time_agg max --input_files /g/data/rt52/era5/single-levels/reanalysis/2t/*/*.nc --variable t2m --time_agg min --start_date 1959-01-01 --end_date 2021-12-31 --lon_bnds 111.975 156.275 --lat_bnds -44.525 -9.975 --verbose
  1. The resampling doesn't take into account any required shift in the time values. We could implement an --hour_shift option? What shift needs to be applied?
ngben commented 1 year ago

Thanks @DamienIrving

  1. An hourly shift would need to be applied, e.g. when using cdo the command is as follows: cdo daysum -shifttime,-1hour

Also I'm not sure if it's an issue but ERA5 is in short data format which can cause problems when concatenating as the offsets and scale factors may be different between files. For cdo this requires the use of -b F64

ngben commented 1 year ago

A quick glance through the ECMWF ERA5 wiki suggests that 2t/t2m is instantaneous but mx2t and mn2t are the maximum/minimum "since previous post-processing" https://confluence.ecmwf.int/display/CKB/ERA5%3A+2+metre+temperature

I think mx2t and mn2t would need to be shifted along with tp but I don't know if mtpr also needs it.

DamienIrving commented 1 year ago

I've added a --hshift option that moves the time axis back 1 hour: https://github.com/AusClimateService/indices/commit/692436233bb50ff44046b7c0384096cc0f499a0a

In terms of different scale and offset factors for different files, I'm pretty sure the mask_and_scale=True argument that is passed toxr.open_mfdataset and xr.open_dataset in the script handles that.

ngben commented 1 year ago

Thanks Damien!

For mtpr it should also be shifted as it's for the previous hour

reanalysis: accumulations are over the hour (the accumulation/processing period) ending at the validity date/time

https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Meanrates/fluxesandaccumulations

chunhsusu commented 1 year ago

@ngben @DamienIrving thank you for raising this. I have been using mtpr and 2t in the icclim calculation. It appears that I should use mx2t and mn2t instead. I can re-run icclim on ERA5, i.e., TX based (TN-based) indicators with mx2t (mn2t) and with --hshift. And precip indicator with mtpr with --hshift.

Please advise if that sounds correct?

ngben commented 1 year ago

@ngben @DamienIrving thank you for raising this. I have been using mtpr and 2t in the icclim calculation. It appears that I should use mx2t and mn2t instead. I can re-run icclim on ERA5, i.e., TX based (TN-based) indicators with mx2t (mn2t) and with --hshift. And precip indicator with mtpr with --hshift.

Please advise if that sounds correct?

Perhaps it might be best to use 2t for tx/tn? ERA5 documentation states:

Given this inconsistency in these three parameters on the CDS, we recommend, in general, that the hourly (analysed) "2 metre temperature" be used to construct the minimum and maximum over longer periods, such as a day.

I think --hshift should be used for mtpr

chunhsusu commented 1 year ago

@ngben @DamienIrving thank you for raising this. I have been using mtpr and 2t in the icclim calculation. It appears that I should use mx2t and mn2t instead. I can re-run icclim on ERA5, i.e., TX based (TN-based) indicators with mx2t (mn2t) and with --hshift. And precip indicator with mtpr with --hshift. Please advise if that sounds correct?

Perhaps it might be best to use 2t for tx/tn? ERA5 documentation states:

Given this inconsistency in these three parameters on the CDS, we recommend, in general, that the hourly (analysed) "2 metre temperature" be used to construct the minimum and maximum over longer periods, such as a day.

I think --hshift should be used for mtpr

Thank you Ben, I will follow this.