WCRP-CORDEX / cordex-cmip6-cmor-tables

JSON Tables for CMOR3 to create CORDEX-CMIP6 datasets
https://wcrp-cordex.github.io/cordex-cmip6-cmor-tables
BSD 3-Clause "New" or "Revised" License

cell_methods for prhmax (monthly mean) #96

Open gnikulin opened 3 weeks ago

gnikulin commented 3 weeks ago

Currently, we have "cell_methods": "area: mean time: mean within hours time: maximum over hours" for prhmax in CORDEX-CMIP6_mon.json. I think we also need to add "time: mean over days" to be consistent with monthly mean tasmax, tasmin, sfcWindmax and sund.

"cell_methods": "area: mean time: mean within hours time: maximum over hours time: mean over days"

larsbuntemeyer commented 3 weeks ago

Thanks for checking this! The CMIP6 CMOR table defines it without "time: mean over days":

https://github.com/PCMDI/cmip6-cmor-tables/blob/11312205dd53c504a51276af5292fb367ab88f2f/Tables/CMIP6_Emon.json#L3887-L3894

I am not sure about this. The long name "Maximum Hourly Precipitation Rate" and the cell methods indicate that this is indeed supposed to be the maximum over all hours within the time bounds. For the monthly frequency, this would be the maximum over all hours in the month, not the mean of the daily maximum hourly precipitation. Otherwise, the long name should be "Daily Maximum Hourly Precipitation Rate". Maybe @larsbarring could quickly check this? Thanks a lot!

Check, e.g.,
https://vesg.ipsl.upmc.fr/thredds/fileServer/cmip6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Emon/prhmax/gr/v20180803/prhmax_Emon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc
https://vesg.ipsl.upmc.fr/thredds/fileServer/cmip6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Eday/prhmax/gr/v20180803/prhmax_Eday_IPSL-CM6A-LR_historical_r1i1p1f1_gr_18500101-20141231.nc

gnikulin commented 3 weeks ago

I assumed that for the mon frequency we always calculate monthly means, as for tasmax, tasmin, sfcWindmax and sund, but indeed they are all defined as "Daily ...". I just checked, and in CORDEX-CMIP5 (http://is-enes-data.github.io/CORDEX_variables_requirement_table.pdf and https://github.com/PCMDI/cordex-cmor-tables/blob/master/Tables/CORDEX_day) prhmax was defined as "Daily Maximum Hourly Precipitation Rate". There was no prhmax in CMIP5; it appeared only in CMIP6, as "Maximum Hourly Precipitation Rate".

It would be logical to apply only "mean" for all variables at the mon frequency to avoid confusion, although monthly mean prhmax is not very useful and will be inconsistent with CMIP6.

@ljoakim FYI

larsbarring commented 3 weeks ago

I am not sure that I have much to add. My interpretation of the CMIP table (linked above) is the same as @larsbuntemeyer's, i.e. that it is the highest hourly value in the entire month. By adding "time: mean over days" at the end of the cell_methods, the meaning changes to the monthly mean of the daily maximum hourly mean precipitation. So I think it is basically up to the CORDEX community to decide which of the two you want to have.
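To make the difference between the two readings concrete, here is a minimal sketch on invented hourly data for one grid point (assumptions: a 30-day month starting at 00 UTC, rates in mm/h for readability; the function names are hypothetical). The first function follows the CMIP6 cell_methods, the maximum over all hours; the second takes the daily maximum first and then averages over the days, which is what appending "time: mean over days" would mean.

```python
import numpy as np

def prhmax_max_over_hours(hourly):
    """Maximum of the hourly mean rates over the whole period, i.e.
    "time: mean within hours time: maximum over hours"."""
    return float(np.max(hourly))

def prhmax_mean_of_daily_max(hourly):
    """Maximum hourly rate per day, then averaged over the days, i.e. the
    same cell_methods with "time: mean over days" appended.
    Assumes the series starts at 00 UTC and covers whole days."""
    daily_max = np.asarray(hourly).reshape(-1, 24).max(axis=1)
    return float(daily_max.mean())

# Hypothetical 30-day month of hourly mean rates (mm/h): dry except two hours
hourly = np.zeros(30 * 24)
hourly[5 * 24 + 14] = 12.0  # intense shower on day 6
hourly[20 * 24 + 3] = 2.0   # light shower on day 21

print(prhmax_max_over_hours(hourly))     # largest hourly rate in the month
print(prhmax_mean_of_daily_max(hourly))  # heavily diluted by the 28 dry days
```

The two numbers differ by more than an order of magnitude here, which is why the extra "time: mean over days" is not a cosmetic change.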

larsbuntemeyer commented 3 weeks ago

Thanks @larsbarring for clarifying this! Very much appreciated!

jesusff commented 3 weeks ago

@gnikulin wrote: "It would be logical to apply only "mean" for all variables at the mon frequency to avoid confusion, although monthly mean prhmax is not very useful and will be inconsistent with CMIP6."

Agree. The monthly mean of prhmax would be quite misleading, since it is not even interpretable as an average value of the extremes (as, e.g., monthly mean tasmax is). In many places it does not rain on most days, so heavy hourly precipitation would be averaged with a lot of zeros. To make some sense of it, we would have to compute a wet-day monthly average, which would also be inconsistent with the rest of the monthly variables.
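A minimal sketch of the dilution effect for one grid point, comparing the all-day mean of daily prhmax with a wet-day mean. Assumptions not taken from the thread: wet days are flagged from daily precipitation with a 1 mm/day threshold (a common convention), values are in mm/h and mm/day for readability, and the function name is hypothetical.

```python
import numpy as np

# Assumed wet-day threshold of 1 mm/day (a common convention, not from the thread)
WET_DAY_THRESHOLD = 1.0

def monthly_prhmax_means(prhmax_daily, pr_daily, threshold=WET_DAY_THRESHOLD):
    """Return (all-day mean, wet-day mean) of daily prhmax for one month.

    prhmax_daily: daily maximum hourly precipitation rate (mm/h)
    pr_daily:     daily precipitation (mm/day), used only to flag wet days
    """
    prhmax_daily = np.asarray(prhmax_daily, dtype=float)
    wet = np.asarray(pr_daily, dtype=float) >= threshold
    all_day = float(prhmax_daily.mean())
    # A month without wet days has no defined wet-day mean
    wet_day = float(prhmax_daily[wet].mean()) if wet.any() else float("nan")
    return all_day, wet_day

# Hypothetical 30-day month with three wet days
prhmax_daily = np.zeros(30)
pr_daily = np.zeros(30)
prhmax_daily[[3, 12, 25]] = [10.0, 6.0, 8.0]
pr_daily[[3, 12, 25]] = [20.0, 5.0, 12.0]

all_day, wet_day = monthly_prhmax_means(prhmax_daily, pr_daily)
print(all_day, wet_day)  # all-day mean is a tenth of the wet-day mean here
```

The wet-day mean depends on an extra threshold and on a second variable, which illustrates why it would not fit the plain "mean" convention used for the other monthly variables.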

Maybe the simplest way to avoid confusion and inconsistencies is to skip monthly prhmax...

larsbarring commented 3 weeks ago

Maybe you could check with the CMIP community to (1) make sure that you have not hit a more or less singular typo/error in their tables, and (2) what their rationale might be for including it in their specs.

jesusff commented 3 weeks ago

For CMIP, this is well defined as the maximum over the month. In this very brief issue, https://github.com/cmip6dr/CMIP6_DataRequest_VariableDefinitions/issues/125, @matthew-mizielinski and @martinjuckes imply that prhmax is expected to be the maximum hourly precipitation regardless of the aggregation period (6h, day, mon), dropping "Daily" from the definition.

gnikulin commented 3 weeks ago

I think prhmax originated in regional modeling; at least I can already find it in ENSEMBLES, where prhmax was defined as "Max hourly precipitation rate" with cell_methods "time: mean within hours time: maximum over hours". This variable was never requested at the mon frequency before CMIP6, where prhmax is defined as in ENSEMBLES. The problem arose simply because there was an agreement to provide all variables at the mon frequency (as means) for consistency, even where this is not meaningful for some variables. Perhaps we need to reconsider this approach. I would suggest 2 options:

Requesting meaningless output should be avoided.

One question is how to communicate such changes with all details. Using https://github.com/WCRP-CORDEX/archive-specifications may be an option to document all changes/corrections and can be linked from the CORDEX website (e.g. https://cordex.org/experiment-guidelines/cordex-cmip6/how-to-provide-cordex-cmip6-data/)

larsbarring commented 3 weeks ago

Thanks for the historic background @gnikulin. In particular I agree with:

Requesting meaningless output should be avoided.

There could be some merit in having a monthly maximum prhmax, e.g. for users interested in extremes but with limited data-crunching capacity. But for a monthly mean prhmax I see very little (= no) use, even if it were calculated over rainy days only.

Regarding the agreement to provide monthly data as means, my experience is that often the term "means" should be interpreted more as "a suitable statistical summary operation", which in the overwhelming majority of cases actually is mean. But then there are the odd ones ....

larsbuntemeyer commented 3 weeks ago

I could not find many datasets that provide prhmax at both daily and monthly frequency, but checking, e.g., with

curl -J -O https://vesg.ipsl.upmc.fr/thredds/fileServer/cmip6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Eday/prhmax/gr/v20180803/prhmax_Eday_IPSL-CM6A-LR_historical_r1i1p1f1_gr_18500101-20141231.nc
curl -J -O https://vesg.ipsl.upmc.fr/thredds/fileServer/cmip6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Emon/prhmax/gr/v20180803/prhmax_Emon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc

and

import xarray as xr
import numpy as np

# Files downloaded with the curl commands above
monthly = xr.open_dataset("prhmax_Emon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc")
daily = xr.open_dataset("prhmax_Eday_IPSL-CM6A-LR_historical_r1i1p1f1_gr_18500101-20141231.nc")

# January 2000: is the stored monthly value the mean of the daily maxima...
np.allclose(daily.sel(time="2000-01").prhmax.mean("time"), monthly.sel(time="2000-01").prhmax) # gives True
# ...or the maximum of the daily maxima?
np.allclose(daily.sel(time="2000-01").prhmax.max("time"), monthly.sel(time="2000-01").prhmax) # gives False

So, although the cell methods indicate otherwise, in this example the monthly value is indeed the mean of the daily maxima and not the maximum of all daily maxima. I agree that prhmax at monthly frequency is not really a useful request; it should probably be derived by users from the daily data, depending on their needs...
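The check above covers a single month. A minimal sketch for repeating it month by month at one grid point, assuming the daily values and dates have already been extracted as NumPy arrays with a datetime64 time axis (the function name compare_monthly is hypothetical). With xarray, the corresponding monthly fields could presumably be built with daily.prhmax.resample(time="1MS").mean() and .max() and compared against the Emon file directly.

```python
import numpy as np

def compare_monthly(daily_values, daily_times, monthly_values):
    """For each calendar month, report whether the stored monthly value
    matches the mean and/or the maximum of the daily values in that month.

    daily_values:   1-D array of daily prhmax at one grid point
    daily_times:    matching numpy datetime64[D] array
    monthly_values: 1-D array, one value per month in chronological order
    """
    daily_values = np.asarray(daily_values, dtype=float)
    months = daily_times.astype("datetime64[M]")
    report = []
    for month, stored in zip(np.unique(months), monthly_values):
        in_month = daily_values[months == month]
        report.append((str(month),
                       bool(np.isclose(in_month.mean(), stored)),  # matches mean?
                       bool(np.isclose(in_month.max(), stored))))  # matches max?
    return report

# Synthetic data mimicking the IPSL behaviour: monthly value = mean of daily values
days = np.arange("2000-01-01", "2000-03-01", dtype="datetime64[D]")
daily = np.arange(days.size, dtype=float)
monthly = np.array([daily[:31].mean(), daily[31:].mean()])
print(compare_monthly(daily, days, monthly))
```

A file consistent with the CMIP6 definition would show False/True (maximum) for every month instead of True/False (mean).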

matthew-mizielinski commented 3 weeks ago

Hi all,

I'd expect prhmax to be the maximum hourly mean precipitation within the time range specified, i.e. no reference to "daily" at all. We had planned to produce this in CMIP6 from HadGEM3 & UKESM, but modelling choices (too much IO = very slow model) led to no data being produced. The way we would have produced this is to output hourly mean precipitation and then post-process it to get the maximum in the required output period (6-hourly, daily and monthly maximums were specified in various MIP tables in CMIP6).
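A minimal sketch of that post-processing step for one grid point: take the hourly mean precipitation and keep the maximum within each output period. Assumptions: hourly values come with a NumPy datetime64[h] time axis, 6-hourly periods start at 00/06/12/18 UTC, and the function name period_max is hypothetical; this is not the Met Office processing chain.

```python
import numpy as np

def period_max(hourly, times, period):
    """Maximum of hourly mean precipitation within each output period
    at one grid point. Returns (period labels, maxima).

    hourly: 1-D array of hourly mean precipitation rates
    times:  matching numpy datetime64[h] array (start of each hour)
    period: "6hr", "day" or "mon"
    """
    hourly = np.asarray(hourly, dtype=float)
    if period == "6hr":
        # Hours since 1970-01-01T00, floored to 00/06/12/18 UTC blocks
        hours = times.astype("datetime64[h]").astype(np.int64)
        keys = ((hours // 6) * 6).astype("datetime64[h]")
    elif period == "day":
        keys = times.astype("datetime64[D]")
    elif period == "mon":
        keys = times.astype("datetime64[M]")
    else:
        raise ValueError(f"unsupported period: {period!r}")
    labels = np.unique(keys)
    return labels, np.array([hourly[keys == k].max() for k in labels])

# Hypothetical two months of hourly data with two rain events
times = np.arange("2000-01-01T00", "2000-03-01T00", dtype="datetime64[h]")
hourly = np.zeros(times.size)
hourly[10] = 3.0            # 2000-01-01 10:00
hourly[40 * 24 + 7] = 5.0   # 2000-02-10 07:00
for p in ("6hr", "day", "mon"):
    labels, maxima = period_max(hourly, times, p)
    print(p, labels.size, maxima.max())
```

The same hourly input serves all three frequencies; only the grouping key changes, which matches the idea that prhmax needs no "daily" qualifier.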

It would have been straightforward to aggregate the daily prhmax into monthly by extracting the maximum value within each month for each grid point.
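A minimal sketch of that aggregation for one grid point, assuming the daily prhmax values and their dates are available as NumPy arrays (the function name is hypothetical). The synthetic check at the end, using invented gamma-distributed hourly rates, confirms that the maximum of the daily maxima equals the maximum over all hours of the month, so no hourly data is needed for the monthly value.

```python
import numpy as np

def monthly_max_from_daily(daily_prhmax, days):
    """Aggregate daily prhmax to monthly prhmax by taking the maximum
    of the daily values within each calendar month (one grid point).

    days: numpy datetime64[D] array matching daily_prhmax
    """
    months = days.astype("datetime64[M]")
    labels = np.unique(months)
    return labels, np.array([daily_prhmax[months == m].max() for m in labels])

# Synthetic hourly rates with shape (day, hour); values are invented
rng = np.random.default_rng(0)
days = np.arange("2000-01-01", "2000-04-01", dtype="datetime64[D]")
hourly = rng.gamma(0.3, 2.0, size=(days.size, 24))
daily_prhmax = hourly.max(axis=1)

labels, monthly = monthly_max_from_daily(daily_prhmax, days)

# Direct computation: maximum over all hours of each month
months = days.astype("datetime64[M]")
direct = np.array([hourly[months == m].max() for m in labels])
print(np.array_equal(monthly, direct))  # True
```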

My reading of your results for the IPSL data above is that there is a bug in the processing that they used -- I'd recommend contacting them so that they can raise an errata (or proposing one yourself!)

larsbuntemeyer commented 3 weeks ago

Thanks @matthew-mizielinski

It would have been straightforward to aggregate the daily prhmax into monthly by extracting the maximum value within each month for each grid point.

Yes, that's what I thought as well, so I actually expected

np.allclose(daily.sel(time="2000-01").prhmax.max("time"), monthly.sel(time="2000-01").prhmax)

to be True!