Closed almarbo closed 7 months ago
I'm leaving here an updated notebook run for the period 1971-1975. It's a pretty early version, built upon the progress in #131, and it'll need more work, especially concerning the RX5day index. Further iterations will be necessary once #131 is closed.
Hi @almarbo,
Things have changed quite a bit since you opened this issue. Can I start working on this or do you need to revise your comments/draft notebooks?
Actually, it would be nice to have the exact same structure that we have for maximum temperature. If you want, I can adapt the draft with the names of the variables and so on, and then upload it here.
I think I can do it, I'll give it a go in the afternoon or tomorrow. The most important thing is that the descriptions of the changes in your first comment are correct and complete (I haven't looked at them yet).
Hi @malmans2
I have been working on it a little.
I leave here a notebook that includes the three sub-notebooks (historical, future, and global warming levels) for CMIP6:
There are still some things that do not work for me: (1) I am not able to invert the colorbars of the figures (blue tones should represent wetter conditions), and (2) I am having problems with the RX5day index for the models (not for ERA5). I am sure it has to do with units and the way icclim interprets them. I have tried to include a factor of 86400 (to convert from kg/(m^2 s) to mm) but it does not change anything.
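For reference, a sketch of that kind of conversion with synthetic values; it assumes (not confirmed here) that icclim determines units from the DataArray's `units` attribute, in which case scaling the values without also updating the metadata would change nothing:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic precipitation flux in kg m-2 s-1 (the CMIP6 convention for "pr").
time = pd.date_range("1971-01-01", periods=10, freq="D")
pr = xr.DataArray(
    np.full(len(time), 2.0e-5),
    coords={"time": time},
    name="pr",
    attrs={"units": "kg m-2 s-1"},
)

# 1 kg/m^2 of water is 1 mm of depth, so kg m-2 s-1 * 86400 s/day = mm/day.
pr_mm = pr * 86400.0
# Arithmetic drops attrs in xarray by default, so the units metadata must be
# set explicitly, otherwise downstream tools may still read the old units.
pr_mm.attrs["units"] = "mm/day"

print(float(pr_mm.max()))  # ~1.728 mm/day
```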
I am having problems with the RX5day index for the models (not for ERA5). I am sure it has to do with units and the way icclim interprets them. I have tried to include a factor of 86400 (to convert from kg/(m^2 s) to mm) but it does not change anything.
I need more info about the issues you are having in order to help. I'd make an MRE (minimal reproducible example) so we can easily debug it. For example, download a small sample without even applying any transform function, then send me a snippet that shows the issue. Something like this:
ds = download.download_and_transform(collection_id, request, chunks={"year": 1})
ds_index = icclim.index("RX5day", ...)
...
Hi, when doing a minimal reproducible example I am not experiencing any issue:
import numpy as np

# Time period
year_start = 1971
year_stop = 1972

model = "access_cm2"
request = [request | {model_key: model} for request in request_sim[1]]
ds = download.download_and_transform(request_sim[0], request, chunks={"year": 1})
ds_index = icclim.index(
    index_name="RX5day",
    in_files=ds,
    slice_mode="JJA",
).drop_dims("bounds")
ds_index_mean = ds_index.mean("time")
print(np.max(ds_index_mean["RX5day"].values))
113.3097110723611
However, if I follow the workflow (which is the same as the one for the temperature case but adapted for precipitation - as you can see in the last version of the notebook that I shared) I get a very different value:
print(np.max(model_datasets['access_cm2']["RX5day"].values))
8182.29363571736
If I test another index (e.g. RX1day), I get the same results in both cases: my MRE and the result obtained using the whole code.
I can also provide the request parameter:
[{'area': [72, -22, 27, 45],
  'day': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11',
          '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22',
          '23', '24', '25', '26', '27', '28', '29', '30', '31'],
  'experiment': 'historical',
  'format': 'zip',
  'month': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12'],
  'temporal_resolution': 'daily',
  'variable': 'precipitation',
  'year': '1971',
  'model': 'access_cm2'},
 {'area': [72, -22, 27, 45],
  'day': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11',
          '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22',
          '23', '24', '25', '26', '27', '28', '29', '30', '31'],
  'experiment': 'historical',
  'format': 'zip',
  'month': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12'],
  'temporal_resolution': 'daily',
  'variable': 'precipitation',
  'year': '1972',
  'model': 'access_cm2'}]
and the request_sim[0], which is :
'projections-cmip6'
mmm, maybe it's because you are still using persist in your NB? Changing it to compute fixed some issues in the previous notebooks.
I started caching the raw data on the VM.
The last version provided in my comment does not have persist. I will also update the notebook description so there's no confusion.
The difference between your snippet and what we are doing is that we reduce the amount of data at the very beginning of the computation (see the function select_timeseries).
You can get the same value if you do this:
ds_index = icclim.index(
    index_name="RX5day",
    in_files=ds.where(ds["time"].dt.season == "JJA", drop=True),
    slice_mode="JJA",
)
ds_index["RX5day"].mean("time").max().values
I don't know what RX5day does. If you expect the results to be identical, there's probably a bug in icclim.
Otherwise, if you always need the whole timeseries, we should change select_timeseries.
Well spotted! But I do not really understand why the results should be different. In theory, when you select slice mode JJA it should take the JJA season, so it should not matter whether you pass the function annual data or data with the JJA season already selected. In fact, for the other indices, the results are exactly the same in both cases: whether you first use select_timeseries or not.
RX5day selects, within a period (JJA), the maximum 5-day total precipitation (i.e., it should select the 5 consecutive days with the highest accumulated precipitation within the period). RX1day selects the day within the period with the maximum accumulated rain. If we have, for a particular year, a maximum value of RX1day = 71.31009, RX5day should never exceed five times this value. Values of 8182.29363571736 are definitely too high, whilst values of 113.3097110723611 (obtained when we are not using select_timeseries) are reasonable. What really surprises me is that, for ERA5, the obtained values are reasonable. For CORDEX the problem persists, though.
Perhaps we should consider not using select_timeseries for these notebooks?
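The bound mentioned above (RX5day should never exceed five times RX1day) can be checked directly with a synthetic series; this is just a plain-xarray sketch of the index definitions, not icclim's implementation:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily precipitation (mm) for one JJA season.
rng = np.random.default_rng(42)
time = pd.date_range("1971-06-01", "1971-08-31", freq="D")
pr = xr.DataArray(rng.gamma(2.0, 3.0, len(time)), coords={"time": time})

# RX1day: wettest single day; RX5day: wettest 5-day running total.
rx1day = float(pr.max("time"))
rx5day = float(pr.rolling(time=5).sum().max("time"))

# Sanity bounds: rx1day <= rx5day <= 5 * rx1day, so values like 8182
# against an RX1day of ~71 are implausible.
assert rx1day <= rx5day <= 5 * rx1day
```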
Not sure, you should probably open an issue in icclim to better understand what's going on. I'm changing that function to this:
def select_timeseries(ds, timeseries, year_start, year_stop):
    if timeseries == "annual":
        return ds.sel(time=slice(str(year_start), str(year_stop)))
    return ds.sel(time=slice(f"{year_start-1}-12", f"{year_stop}-11"))
It will affect performance, but I was able to run it on the whole timeseries of ERA5 (ERA5 is the most affected because we do the resampling on more data).
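A quick check of the seasonal branch above with synthetic dates: starting from December of `year_start - 1` keeps the first DJF season complete, and stopping in November of `year_stop` drops the December that belongs to the following season.

```python
import numpy as np
import pandas as pd
import xarray as xr

def select_timeseries(ds, timeseries, year_start, year_stop):
    if timeseries == "annual":
        return ds.sel(time=slice(str(year_start), str(year_stop)))
    return ds.sel(time=slice(f"{year_start-1}-12", f"{year_stop}-11"))

# Synthetic daily dataset spanning more than the analysis period.
time = pd.date_range("1969-01-01", "1976-12-31", freq="D")
ds = xr.Dataset({"pr": ("time", np.zeros(len(time)))}, coords={"time": time})

seasonal = select_timeseries(ds, "DJF", 1971, 1975)
print(seasonal["time"].values[0])   # first kept day: 1970-12-01
print(seasonal["time"].values[-1])  # last kept day: 1975-11-30
```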
I'll send you the new template as soon as I'm done caching (I merged temperature/precipitation so you can select the variable at the very beginning)
Hi @almarbo,
Here is the first draft of the template (temperature + precipitation): https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/extreme_indices.ipynb Here are the results for CMIP6 precipitation: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb
I didn't have time to check the results, let me know if they look OK.
I'll cache CORDEX in the meantime.
Little change I forgot to implement: colormaps are now flipped for precipitation
Hi @malmans2. Great! The results look okay to me. Could we proceed with CORDEX from now on?
Yup it's running, but it's gonna take some time. Let's catch up after Easter.
Happy Easter!
CORDEX is also cached now: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb#file-cordex-ipynb
Hi @malmans2
thank you so much! the results look fine to me. We would like to also cache for CORDEX DJF if possible. We can do it on Tuesday, no worries.
Happy Easter!
I launched the scripts to cache DJF over the break, but unfortunately I need to make some changes to the template.
The forms to download smhi_rca4 and uhoh_wrf361h are different compared to the other models.
For example, we need to use start and end year 1970 to get that year rather than a 5-year window.
@almarbo precipitation historical is cached. Please take a look and let me know if it looks OK or there's anything to change: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb
I'll work on future next.
hi @malmans2
Thank you so much. Do you think the current workflow would support annual data? If yes, could we switch to annual?
Apologies for so many changes, but sometimes we need to change the initial idea depending on the results.
I mean the timeseries parameter.
I'll try to cache that overnight
Hi @almarbo,
Unfortunately, the VM is not able to handle the current setup using CORDEX and annual timeseries. The kernel dies when computing the indexes. It might take some time to find a solution that works well on the VM. Let me know if you want me to invest time on it.
(I'm going to run a test overnight. The only simple solution I can think about is to cache all indexes separately)
Hi @malmans2, thanks for testing it. Let me know about the test you are doing. If there is not an easy solution, no worries at all, I would suggest going on with DJF. In that case we would comment on the results within the final notebook.
Unfortunately, it didn't work. Maybe you can try on another machine if you have more resources?
Oops. It's okay, no worries. I think, for now, it is okay to proceed with DJF. If needed, I would eventually run the resulting notebook on another machine with more resources.
OK, the next step is to make the future notebook compatible with precipitation, right?
Exactly!
Hi @malmans2 any news on the future for precipitation? Do you need me to work on it?
caching CMIP6 right now, should be done by the end of the day. I'll cache CORDEX over the weekend. I'll be in touch.
Actually, good timing. CMIP6 is ready. Could you please take a look?
Over the weekend I will cache CMIP6/CORDEX + DJF/JJA + temperature/precipitation + historical/future
I'm caching again temperature as the new templates are more general. If everything is in good shape, we can deprecate the temperature-only templates.
Here is the template: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/extreme_indices_future.ipynb Here is the notebook executed: https://gist.github.com/malmans2/da3767653635d24cf76e7b6e8968d896
hi @malmans2 the results look fine, thanks a lot. Have a nice weekend
Hi @almarbo,
I have good and bad news.
Good news first: I found the optimal chunking to cache annual as well. I've successfully cached CMIP6 historical, and I'll try to cache CORDEX tonight. Here is CMIP6: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb
Bad news: I found some problems with CORDEX future. The data of a couple of models are corrupted. It looks like a CDS problem, but I need some time to debug it. If it is a CDS problem, we need to open a ticket. Also, I can't cache DJF from 2015 because 2014 is not available.
Hi @almarbo,
I found the corrupted request. You can reproduce it on your machine with this:
import cdsapi

collection_id = "projections-cordex-domains-single-levels"
request = {
    "area": [72, -22, 27, 45],
    "domain": "europe",
    "end_year": 2015,
    "ensemble_member": "r1i1p1",
    "experiment": "rcp_8_5",
    "format": "zip",
    "gcm_model": "mpi_m_mpi_esm_lr",
    "horizontal_resolution": "0_11_degree_x_0_11_degree",
    "rcm_model": "knmi_racmo22e",
    "start_year": 2011,
    "temporal_resolution": "daily_mean",
    "variable": "mean_precipitation_flux",
}
client = cdsapi.Client(debug=True)
client.retrieve(collection_id, request, "download.zip")
If you unzip and ncdump the downloaded file, you'll see that the file is corrupted. Do you think it's an issue with the CDS? If yes, we'll open a ticket.
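As a side note, a quick programmatic check of the extracted file might look like this (the filename is hypothetical; a corrupted NetCDF typically fails to open or decode):

```python
import xarray as xr

def looks_corrupted(path):
    """Return True if the NetCDF file at `path` cannot be opened and decoded."""
    try:
        xr.open_dataset(path).close()
        return False
    except Exception:
        return True

# e.g. on the file extracted from download.zip (hypothetical name):
print(looks_corrupted("extracted_from_download.nc"))
```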
I'm pretty sure something went wrong in the CDS, I'm opening the ticket.
mmm, actually, looks like they've fixed it now. I'm going to try again.
Hi @malmans2, thanks a lot for checking it all. I have ncdumped the downloaded file and I am not able to see where it is corrupted. I think I missed something; what made you think it is corrupted?
top news! thanks a lot, let's see with CORDEX
Good news first: I found the optimal chunking to cache annual as well. I've successfully cached CMIP6 historical, and I'll try to cache CORDEX tonight. Here is CMIP6: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb
Unfortunately we are a little bit late in delivering the notebooks and we should provide something ASAP. If it will take a while, I think the easiest option would be just to replace the corrupted models with a couple that are not corrupted.
Bad news: I found some problems with CORDEX future. The data of a couple of models are corrupted. It looks like a CDS problem, but I need some time to debug it. If it is a CDS problem, we need to open a ticket.
This is for CMIP6, right? Or is it also for CORDEX? I think we can focus on CORDEX and forget about CMIP6 for precipitation. If it affects both, no worries, let's skip 2015.
Also, I can't cache DJF from 2015 because 2014 is not available.
I missed this message while I was writing mine. Thanks!
mmm, actually, looks like they've fixed it now. I'm going to try again.
Actually, sorry! The data IS corrupted. I pasted the wrong snippet. See:
import cdsapi

collection_id = "projections-cordex-domains-single-levels"
request = {
    "area": [72, -22, 27, 45],
    "domain": "europe",
    "end_year": 2060,
    "ensemble_member": "r1i1p1",
    "experiment": "rcp_8_5",
    "format": "zip",
    "gcm_model": "mpi_m_mpi_esm_lr",
    "horizontal_resolution": "0_11_degree_x_0_11_degree",
    "rcm_model": "knmi_racmo22e",
    "start_year": 2056,
    "temporal_resolution": "daily_mean",
    "variable": "mean_precipitation_flux",
}
client = cdsapi.Client(debug=True)
client.retrieve(collection_id, request, "download.zip")
I'll open a ticket and cc you (if I can).
Can you please recap here what you want me to cache?
Yes. We would only need CORDEX to be cached (historical and future periods). The preferred temporal aggregation is annual. If that is not possible, we can perfectly well go with DJF only. If the problem with the corrupted files will take a while, we would prefer to just replace the two corrupted models with two that have no corrupted data.
CORDEX historical precipitation DJF is already cached. See: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb
I'm trying annual right now, although during the day it's hard to cache CORDEX because the VM is crowded.
no worries, let's wait for it.
By the way, could you specify which models gave problems? "knmi_racmo22e" and which was the other?
mohc_hadrem3_ga7_05
Right now I'm caching the other models. It looks like something went wrong with the cutout in the CDS, and corrupted files have been cached. I opened the ticket, so we might be able to use those models if the CDS team clears the cache.
Okay, perfect!
Hi @almarbo,
I think I know how to avoid the files that are corrupted. I'll run everything again overnight.
However, talking with user support I realised that the area argument in CORDEX requests is useless.
Are you aware of it? I.e., adding the area parameter to the request does not make any difference; it just downloads the whole domain (europe in your case).
Notebook description
In this notebook, data from a subset of 9 CMIP6 Global Climate Models (GCMs), as well as the ERA5 reanalysis, are considered. Five precipitation-based ECA&D indices are calculated using the icclim Python package:
These calculations are performed over the historical period from 1971 to 2000 for the temporal aggregation of JJA.
After calculating these indices for the historical period (resulting in index values per year), temporal means and trends are calculated. Following this, the bias of the temporal mean values and the trend bias are calculated, using ERA5 as the reference dataset. These biases are displayed for each model and for the ensemble median. Additionally, maps of the ensemble spread (derived as the standard deviation of the ensemble members' distribution) are calculated and displayed for the mean values and for the trends. Finally, boxplots representing the statistical distribution (PDF) of the historical trends from the considered models are shown (here too, ERA5 is the reference product).
The size and location of the subdomain considered are customizable, as well as the temporal aggregation (annual or seasonal).
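The bias and spread computations described above can be sketched roughly as follows (synthetic arrays stand in for per-model index climatologies; the names are illustrative, not the notebook's actual variables):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# Per-model temporal means of an index on a tiny 2x2 grid, plus an ERA5 reference.
models = xr.DataArray(rng.normal(10.0, 2.0, (9, 2, 2)), dims=("model", "y", "x"))
era5 = xr.DataArray(np.full((2, 2), 10.0), dims=("y", "x"))

bias = models - era5                         # bias of each model vs ERA5
ensemble_median_bias = bias.median("model")  # bias of the ensemble median
spread = models.std("model")                 # ensemble spread (std across members)

print(bias.shape, ensemble_median_bias.shape, spread.shape)
```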
Analyses
This notebook performs the following analyses:
Notebook link or upload
historical_cmip6_extreme_pr_indices.ipynb.zip
Anything else we need to know?
This notebook keeps the same structure as the one in #131. 5 notebooks will be needed:
*Note: the future notebooks will not need the historical period (as we are not dealing with statistical thresholds). Main changes compared to #131:
to:
models_cmip6 = [
    "access_cm2",
    "bcc_csm2_mr",
    "cmcc_esm2",
    "cnrm_cm6_1_hr",
    "ec_earth3_cc",
    "gfdl_esm4",
    "inm_cm5_0",
    "miroc6",
    "mpi_esm1_2_lr",
]