euro-cordex / py-cordex

cordex python tools
https://py-cordex.readthedocs.io
MIT License
19 stars 4 forks source link

Slicing EURO-CORDEX model outputs #195

Open Murk89 opened 9 months ago

Murk89 commented 9 months ago

Is there a way to slice/subset EURO-CORDEX model outputs using py-cordex? I am working with REMO2015 model output and have tried using xarrays to slice the output for my area of study. It doesn't work and I am guessing it is due to the rotated lat lon grid. The code I have is as follows.

A solution could be if py-cordex offers a function to clip the EURO-CORDEX outputs based on shapefiles? And would the shapefiles need to be in the rotated lat lon projection?

import xarray as xr

#Sample file of the REMO2015 files
ds=xr.open_dataset("/home/remo_temp_sample.nc")

#quick look at the ds
<xarray.Dataset>
Dimensions:                     (time: 1826, bnds: 2, rlat: 412, rlon: 424,
                                 vertices: 4)
Coordinates:
  * time                        (time) datetime64[ns] 2006-01-01T12:00:00 ......
  * rlat                        (rlat) float64 -23.38 -23.27 ... 21.72 21.84
  * rlon                        (rlon) float64 -28.38 -28.27 ... 18.04 18.16
    lat                         (rlat, rlon) float32 ...
    lon                         (rlat, rlon) float32 ...
    height                      float64 ...
Dimensions without coordinates: bnds, vertices
Data variables:
    time_bnds                   (time, bnds) datetime64[ns] ...
    rotated_latitude_longitude  int32 ...
    lat_vertices                (rlat, rlon, vertices) float32 ...
    lon_vertices                (rlat, rlon, vertices) float32 ...
    tasmax                      (time, rlat, rlon) float32 ...
Attributes: (12/34)
    institution:                    Helmholtz-Zentrum Geesthacht, Climate Ser...
    institute_id:                   GERICS
    experiment_id:                  rcp26
    source:                         GERICS-REMO2015
    model_id:                       GERICS-REMO2015
    forcing:                        N/A
    ...                             ...
    title:                          GERICS-REMO2015 model output prepared for...
    parent_experiment:              N/A
    modeling_realm:                 atmos
    realization:                    1
    cmor_version:                   2.9.1
    tracking_id:                    hdl:21.14103/e7ee14a3-6fb4-4f3e-8e10-fe4b...

#I need to slice the EURO-CORDEX domain to my study area with the following lat and lon
min_lon = 24.344
min_lat = 43.609
max_lon = 25.739
max_lat = 45.672

sliced_ds = ds.sel (rlat=slice(min_lat,max_lat), rlon=slice(min_lon,max_lon)) 

#this results in an empty dataset
<xarray.Dataset>
Dimensions:                     (time: 1826, bnds: 2, rlat: 0, rlon: 0,vertices: 4)
larsbuntemeyer commented 9 months ago

Hi @Murk89, thanks for the question. Since you are referring to shapefiles, i guess you want to mask a certain region for analysis, right? That's not really the scope of py-cordex since there are other great packages do achieve this. However, since this a popular task with regional climate model datasets, there is actually an example in the docs here.

The example is based on regionmask. If you want to use shapefiles, i can recommend the excellent documentation on how to work with geopandas and regionmask.

I hope that helps!

Murk89 commented 9 months ago

Hi @larsbuntemeyer Thankyou for the suggestions. I want to subset/slice the EURO-CORDEX datasets to my study regions. As in the code provided above, I have tried it with xarrays but it doesn't work due to the rotated grid. I was hoping the py-cordex would have some function to perform the slicing?

I have previously tried to clip the REMO2015 dataset to my study area using a shapefile and using geopandas. For that to work I need to change the projection of the shapefile to the EURO-CORDEX rotated latitude longitude. That didn't work because cartopy doesn't recognize the wkt of EURO-CORDEX grid. I think that might be a separate issue on it's own.

larsbuntemeyer commented 9 months ago

I want to subset/slice the EURO-CORDEX datasets to my study regions. As in the code provided above, I have tried it with xarrays but it doesn't work due to the rotated grid.

Yes, the coordinates of the dataset (rlon/rlat) are defined in the rotated coordinate reference system (CRS), but since you have the projected lon/lat coordinates in the dataset, you can use them for masking with where, e.g., you could do something like:

ds.where((ds.lon > min_lon) & (ds.lon < max_lon) & (ds.lat > min_lat) & (ds.lat < max_lat))

and using an example:

import cordex as cx

min_lon = 24.344
min_lat = 43.609
max_lon = 25.739
max_lat = 45.672

ds = cx.tutorial.open_dataset("tas_EUR-11_ECMWF-ERAINT_evaluation_r1i1p1_GERICS-REMO2015_v1_mon_197902-198012")
subset = ds.where((ds.lon > min_lon) & (ds.lon < max_lon) & (ds.lat > min_lat) & (ds.lat < max_lat), drop=True)
subset.tas.isel(time=0).plot()

grafik or without drop:

subset = ds.where((ds.lon > min_lon) & (ds.lon < max_lon) & (ds.lat > min_lat) & (ds.lat < max_lat))
subset.tas.isel(time=0).plot()

grafik That should look like the region you are interested...

larsbuntemeyer commented 9 months ago

If you want to plot CORDEX datasets in their native rotated coordinates, i did some example for this in a notebook some time ago. You should also check out the cartopy documentation on that topic which really helped me to understand it!

Murk89 commented 9 months ago

Thanks for the code. The .where method is a good idea. I was also able to do something similar with python-cdo.

cdo.sellonlatbox (24.344,25.739,43.609,45.672, input='path_to_file', output=p'path_to_file')

So looking at the notebook you mention, it is not changing the crs of the actual dataset ? It is for plotting purposes only, right?

larsbuntemeyer commented 9 months ago

So looking at the notebook you mention, it is not changing the crs of the actual dataset ? It is for plotting purposes only, right?

Exactly! cartopy only needs to know the CRS of your data (the transform keyword). It then does the projection itself using pyproj. For xarray and cdo, you need the projected coordinates in the dataset, e.g, the lon/lat coordinates if you want to work with those coordinates (e.g. masking, etc...). The grid_mapping, e.g., the rotated_latitude_longitude attributes tell you how the transformation from rlon/rlat to lon/lat is done. You can also check the CF conventions on this topic.

kzarate commented 4 months ago

Hi, I’m starting to work with regional climate models, and I would like to join this thread because my problem is related to this topic. I was trying to follow the steps in this tutorial to obtain a weighted mean from a euro cordex NetCDF file by using a shapefile. I understand the concept, but I’m stuck at the moment of calculating it. https://py-cordex.readthedocs.io/en/stable/prudence.html

This is the xarray Dataset:

<xarray.Dataset> Size: 1GB
Dimensions:       (time: 1826, rlon: 424, rlat: 412, bnds: 2)
Coordinates:
  * time          (time) datetime64[ns] 15kB 2001-01-01T12:00:00 ... 2005-12-...
  * rlon          (rlon) float64 3kB -28.38 -28.26 -28.16 ... 17.93 18.05 18.16
  * rlat          (rlat) float64 3kB -23.38 -23.26 -23.16 ... 21.61 21.73 21.83
    lat           (rlat, rlon) float64 1MB dask.array<chunksize=(412, 424), meta=np.ndarray>
    lon           (rlat, rlon) float64 1MB dask.array<chunksize=(412, 424), meta=np.ndarray>
    height        float64 8B ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  |S1 1B ...
    huss          (time, rlat, rlon) float32 1GB dask.array<chunksize=(1, 412, 424), meta=np.ndarray>
    time_bnds     (time, bnds) datetime64[ns] 29kB dask.array<chunksize=(1, 2), meta=np.ndarray>
Attributes: (12/26)
    Conventions:                    CF-1.4
    conventionsURL:                 http://www.cfconventions.org
    title:                          CLMcom-ETH-COSMO-crCLIM-v1-1 model output...
    project_id:                     CORDEX
    driving_model_id:               MPI-M-MPI-ESM-LR
    driving_experiment_name:        historical
    ...                             ...
    source:                         Climate Limited-area Modelling Community ...
    references:                     http://cordex.clm-community.eu/
    product:                        output
    frequency:                      day
    creation_date:                  2019-10-01 20:53:16
    tracking_id:                    hdl:21.14103/46dbcfe6-3068-4329-a0c3-4784...

and these are my lines:

Import geopandas as gp
Import xarray as xr
Import numpy as np
DATADIR = './'
xr_database = xr.open_mfdataset(f'{DATADIR}*huss_EUR*.nc')
shapefile = gp.read_file("rur_boundary.shp")  
shapefile_regions = regionmask.from_geopandas(shapefile)  

mask = shapefile_regions.mask_3D(xr_database.lon, xr_database.lat)
weights = np.cos(np.deg2rad(xr_database.rlat))
regional = xr_database.weighted(mask_cordex * weights).mean(dim={'rlat', 'rlon'})

The problem is on the very last line. This is the message that I receive:

in einsum  return c_einsum(*operands, **kwargs)
TypeError: invalid data type for einsum

Any comment or suggestion would be very much appreciated. Many thanks in advance.

Karla

larsbuntemeyer commented 4 months ago

hi @kzarate and thanks for joining! Could you please give some more details about your dataset (maybe one example file that you downloaded from ESGF?) and maybe where to find your shapefile. A first guess might be that it fails because you make a mean on the whole dataset and that might fail for variables like the rotated_pole or time_bnds because they have no spatial dimentions. You coud probably try:

xr_database.huss.weighted(mask_cordex * weights).mean(dim={'rlat', 'rlon'}) # weighted mean only on huss
kzarate commented 4 months ago

Hi @larsbuntemeyer thanks for your reply! That was the missing part for getting the numbers, I could get the plot and everything is fine. I read on a regionmask tutorial that for climate models it is recommended to derive weighted averages from the original cell area. I checked that that's the new feature of the py-cordex package and I wanted to try it and compare numbers, so I started again with the Working with Prudence regions tutorial:

import cartopy.crs as ccrs
import intake
import matplotlib.pyplot as plt
import numpy as np
import regionmask
import xarray as xr
from tqdm.autonotebook import tqdm

import cartopy.feature as cf

import cordex as cx

xr.set_options(keep_attrs=True)

def plot(
    da,
    transform=ccrs.PlateCarree(),
    projection=ccrs.PlateCarree(),
    vmin=None,
    vmax=None,
    borders=True,
    xlocs=range(-180, 180, 2),
    ylocs=range(-90, 90, 2),
    extent=None,
    figsize=(15, 10),
    title="",
):
    """plot a domain using the right projections and transformations with cartopy"""

    plt.figure(figsize=figsize)
    ax = plt.axes(projection=projection)
    if extent:
        # ax.set_extent([ds_sub.rlon.min(), ds_sub.rlon.max(), ds_sub.rlat.min(), ds_sub.rlat.max()], crs=transform)
        ax.set_extent(extent, crs=projection)
    ax.gridlines(
        draw_labels=True, linewidth=0.5, color="gray", xlocs=xlocs, ylocs=ylocs
    )
    da.plot(ax=ax, cmap="terrain", transform=transform, vmin=vmin, vmax=vmax)
    ax.coastlines(resolution="50m", color="black", linewidth=1)
    if borders:
        ax.add_feature(cf.BORDERS)
    if title:
        ax.set_title("")

prudence = regionmask.defined_regions.prudence
print(prudence)

plt.figure(figsize=(8, 8))
proj = ccrs.LambertConformal(central_longitude=15)
ax = prudence.plot(add_ocean=True, projection=proj, resolution="50m", label="abbrev")
# plt.show()

eur11 = cx.cordex_domain("EUR-11", dummy="topo")
pole = (
    eur11.rotated_latitude_longitude.grid_north_pole_longitude,
    eur11.rotated_latitude_longitude.grid_north_pole_latitude,
)

mask = prudence.mask_3D(eur11.lon, eur11.lat)
print(mask.region)

and I got this message:

from .utils import cell_area, get_tempfile
ImportError: cannot import name 'cell_area' from 'cordex.utils'

My attempt to solve this was to reinstall it by conda but I still got the same result. Do you have any idea or suggestion?

larsbuntemeyer commented 4 months ago

@kzarate glad to hear you could do the plotting! I don't really see the usage of cell_area in your code snippet! Could you check if something like this works?

import cordex as cx
cx.cordex_domain("EUR-11", cell_area=True)

Please also ensure you have the latest version installed:

import cordex as cx
cx.__version__