DRAFT: Wind data from COSMO REA6 reanalysis

Change proposed in this Pull Request

Once ready, this pull requests would add the possibility to include wind data from the COSMO REA6 reanalysis into atlite. This is a first version which I made for a specific use-case. It is already working for my needs (i.e. on a small spatial domain in central Europe) but is neither robust nor tested exhaustively. If you think this adds value to atlite and would be good to be included, I can try to continue working on it and make it more robust, make the regridding faster, add tests etc.

Description

A new module cosmo_rea6 is added to atlite.datasets. The module can download data from the opendata.dwd server and restructures the data in order to fit into the atlite framework and conventions. Only the wind feature is supported.

A dedicated directory to store the downloaded raw netcdfs was chosen instead of tempdir approach as in the era5 module. This allows users to download the data however they want and then reuse it in atlite without letting atlite handle the download.

Issues, Questions and Todos

As there are many height levels available from the dwd server, it would be nice if the user could select only needed height levels and not load the complete wind feature with all levels. I tried to intruduce a creation_paramter cosmo_rea6_height_levels but this is not working due to the way missing (sub)features are searched in data.cutout_prepare(). Is there an easy way of selecting subfeatures for a Cutout? Another way would be to preselect avaliable height levels in the features list of the comso_rea6 module which would then not be available to the users (what I did now for testing purposes).
The COSMO data is provided on a curvilinear grid with rotated pole. For the wind speed on https://opendata.dwd.de/climate_environment/REA/COSMO_REA6/converted data, 2D coordinate fields are already provided in WGS84. In order to work with atlite (compatibility with other datasets, layout capacity mapping), regridding to a rectilinear grid is necessary. As far as I understood, this is something different as done in atlite.gis.regrid and unfortunately on I didn't find an easy way for doing it on a windows platform (see https://github.com/pydata/xarray/issues/2281) while it would have been easily possible on linux (e.g. with xESMF https://xesmf.readthedocs.io/en/latest/notebooks/Curvilinear_grid.html). Therefore, I implemented a hacky solution with scipy.griddata which is terribly slow for now. I guess this should be sorted out before merging, atleast xr.apply_ufunc should be used somehow.
Right now, no checks are done whether the cutout extent is within the COSMO REA6 domain. I need to check if the regridding is working when the domain is larger than the COSMO domain.
I did not test whether it is easily possible to combine the cosmo_rea6 module with other data modules such as era5 and sarah.

Motivation and Context

The COSMO REA6 reanalysis is a regional reanalysis product for Europe. It has a spatial resolution of ~6km and is able to represent local wind speeds significantly better than e.g. ERA5 (see Kaspar et al. 2020, https://doi.org/10.5194/asr-17-115-2020 for a review). However, Urraca et al. (2018, https://doi.org/10.1016/j.solener.2018.02.059) conclude that satellite based products such as SARAH are performing better than COSMO REA6 for radiation. Therefore, it probably only makes sense to include wind from COSMO REA6 into atlite which is what this PR does.

On the opendata.dwd.de server, wind speed at 10, 40, 60, 80, 100, 125, 150, 175 and 200 m height above ground is available at daily and hourly resolution (e.g. https://opendata.dwd.de/climate_environment/REA/COSMO_REA6/converted/hourly/2D/).

How Has This Been Tested?

I only created a dedicated wind cutout for 2012 in Germany and reproduced this example from the documenation. Below you can see the same plots as in the example that compare the different data sources. time_series_DE_2012 time_series_DE_2012-04 scatter_DE_2012 duration_curves_DE_2012

Type of change

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist

[ ] I tested my contribution locally and it seems to work fine.
[ ] I locally ran pytest inside the repository and no unexpected problems came up.
[x] I have adjusted the docstrings in the code appropriately.
[ ] I have documented the effects of my code changes in the documentation doc/.
[ ] I have added newly introduced dependencies to environment.yaml file.
[ ] I have added a note to release notes doc/release_notes.rst.
[x] I have used pre-commit run --all to lint/format/check my contribution

Cool, I like it. Someone from the team will have a more detailed look soon.

Very interesting PR, thank you for the suggestions!

From Kaspar et al. adding figures here to help with the discussion:

coverage

wind speed correlation with measured data at 100m

"Wind data from COSMO-REA6 is also used in the context of the “Site Development Plan 2019 for the German North Sea and Baltic Sea” (BSH, 2020a (German), b (English); Tinz et al., 2019). The Site Development Plan determines ar- eas for new installations of offshore wind turbines within the German Exclusive Economic Zone of the North and Baltic Sea (see Fig. 3) for the period 2026 until at least 2030 (on the basis of the German Wind Energy at Sea Act)."
They report on a number of studies which have already used the dataset for synthetic wind feed-in evaluations

Some thoughts:

Limited regional scope, but better data compared to ERA5
Potential alternative to considered GWA-renormed ERA5 approach we discussed earlier for better quality wind data
Better wind data for offshore / near on- & off-shore
Potentially very interesting for PyPSA-EUR

My gut feeling is it would take 1-2 weeks for including it properly into atlite (compatibility with others modules, performance, tests).

Hi @euronion, thanks for sharing your thoughts. What do you think the next step should be?

Should I start trying to improve the implementation or should we wait for more comments from the team?

Hi @joAschauer ,

We had another chat internally yesterday. We think COSMO REA6 would be a great addition to atlite.

One last question on the impacts of adapting the dataset Since you already have the data, could you create a capacity factor map showing the differences in offshore capacity factor between ERA5 and COSMO REA6? E.g. for Germany, can also be onland + offshore; or whichever coverage you have already prepared as cutout and ready at hands.

Otherwise, big YES: You're very much welcome to improve the implementation. If there is something we can help you with do let us know :).

Regarding your earlier questions/TODOs

As there are many height levels available from the dwd server, it would be nice if the user could select only needed height levels and not load the complete wind feature with all levels.

I don't think there is much added value from this. We should focus on getting the wind speeds at enough heights to have usable interpolation to any hub height the user might request. Bascially once created, the user should not have to worry about redownloading the wind speeds at yet another height just because they didn't download them earlier.

This gives rise to another question: How to handle interpolation to a specific hub height? For ERA5 we (for now) use the roughness also retrieved. Does COSMO REA6 provide a similar indicator? Else we would either have to retrieve the roughness from ERA5 or consider implementing a different interpolation method (which might be reasonable, given the high number of steps at which wind speeds are available with COSMO REA6).

The COSMO data is provided on a curvilinear grid with rotated pole. [...] In order to work with atlite (compatibility with other datasets, layout capacity mapping), regridding to a rectilinear grid is necessary

Maybe @FabianHofmann has an idea how to do this quick and practically?

Right now, no checks are done whether the cutout extent is within the COSMO REA6 domain. I need to check if the regridding is working when the domain is larger than the COSMO domain.

Requesting a cutout with bounds exceeding the COSMO REA6 coverage should simply throw an error and not be allowed.

I did not test whether it is easily possible to combine the cosmo_rea6 module with other data modules such as era5 and sarah.

We can figure that out together. As long as the features the data module provides do not overlap with the other modules, it should be quite easy to integrate them with each other.

Looking at the COSMO-REA6 website I see a number of other data also available, not only wind data. Would there be some merit in making the data module directly compatible with those as well? Or are those values only available with restrictions and a not-so-straightforward-implementation?

Hi @euronion,

thanks for your reply.

As suggested, I have made capacity factor maps for Germany with the Vestas_V90_3MW turbine in 2012. I had already prepared a Cutout with the 125m and 175m wind height levels in that area. You can see the maps below and you can get the notebook I used to compare the two datasets from this gist (please note that this notebook may not run with the current branch HEAD 4d089bd45bc0c7012faa10a2ba2fa66627c4af16 since there, the 175m height level is not available in the wind feature). Interesting to note is the reduced capacity factor in the North Sea further offshore for the COSMO data. This could be due to missing data in that area for the assimilation of the reanalysis.

capacity_factor_map_DE_2012

capacity_factor_map_northern_DE_2012

To address your points from above:

We should focus on getting the wind speeds at enough heights to have usable interpolation to any hub height the user might request.

Then we need to decide which heights to include by default.

How to handle interpolation to a specific hub height?

Roughness is also available for COSMO REA6. I have already included it in the "wind" feature of this PR. So the interpolation between height levels could remain the same. In any case, if we decide to use a different interpolation method, I think this should be done in a separate PR.

Requesting a cutout with bounds exceeding the COSMO REA6 coverage should simply throw an error and not be allowed.

Okay, this sounds reasonable.

As long as the features the data module provides do not overlap with the other modules, it should be quite easy to integrate them with each other.

This is not clear to me. What do you mean by "do not overlap"?

Looking at the COSMO-REA6 website I see a number of other data also available, not only wind data.

You are right, there are other variables available. See https://opendata.dwd.de/climate_environment/REA/Readme_intro_REA-OD.pdf for a general description of what is available on the opendata.dwd.de server. A detailed list of parameters can be found at https://opendata.dwd.de/climate_environment/REA/ParameterTabellen.pdf. The variables air density, u- and v-component, wind direction and wind speed are available in a converted netcdf format at https://opendata.dwd.de/climate_environment/REA/COSMO_REA6/converted/ in daily, hourly and monthly resolution:

air density (DENS_* with * = 040, 060, 080, 100, 125, 150, 175, 200)
u component wind (U_100)
v component wind (V_100)
wind direction (WD_* with * = 010, 040, 060, 080, 100, 125, 150, 175, 200)
wind speed (WS_* with * = 010, 040, 060, 080, 100, 125, 150, 175, 200)

This converted format is what I have been using so far. The other variables are available in grib format and I didn't read them in and don't know if the regridding gets more complicated, but I guess so. On top of the curvilinear grid present in the "converted" netcdfs, there will be a rotated pole coordinated system in the grib files. However, if you think this would be a big advantage, I can take a look at it. I think it should be possible without relying on non-Python tools like cdo.

What variables do you think are of particular interest for atlite?

I'm amazed by the resolution improvement, thanks @joAschauer ! :)

Differences in CFs Is this a known problem with the COSMO data? Else it could also be linked to interpolation and different roughness values: ERA5 is interpolating from 100m to 90m for the Vestas model you choose, while you are interpolating from 125m to 90m in an area in proximity to Heligoland. But certainly something to look out for.

We also see the higher CFs with COSMO in coastal and mountainous regions, a limitation ERA5 is known for which we could eliminate with the new data. Sweet.

Heights to include by default Let's go with:

90m
125m
150m
200m

These will capture most current and future turbine heights offshore and onshore.

Having roughness from COSMO is even better, compatibility with the existing method makes me happy. Other interpolation methods are still something we need to look into, but that's a different issue and yes, would be a separate PR.

As long as the features the data module provides do not overlap with the other modules, it should be quite easy to integrate them with each other.

This is not clear to me. What do you mean by "do not overlap"?

Each dataset provides one or more features, for COSMO REA6 those are at the moment

features = {
    "wind": [
        # "wnd10m",
        # "wnd40m",
        # "wnd60m",
        # "wnd80m",
        # "wnd100m",
        "wnd125m",
        # "wnd150m",
        # "wnd175m",
        # "wnd200m",
        "roughness",
    ],
}
static_features = {}

The same feature wind is also provided by the ERA5 dataset. ERA5 (and other datasets) provide other additional features like height or influx. As long as the wind feature implementation of COSMO REA6 doesn't contain any variables, which the features of other datasets implement, they should all be compatible with each other.

Speaking of features and other variables: Can you also implement wnd_azimuth for the wind feature as done in the ERA5 module? That would make them fully compatible replacements for each other: https://github.com/PyPSA/atlite/blob/549d0fdf22925e06ac3f726e801f215d993671dc/atlite/datasets/era5.py#L125-L127

You are right, there are other variables available. See https://opendata.dwd.de/climate_environment/REA/Readme_intro_REA-OD.pdf for a general description of what is available on the opendata.dwd.de server. A detailed list of parameters can be found at https://opendata.dwd.de/climate_environment

Thank you for those tables. I think in general all the variables which would substitute ERA5 for Europe (-> higher resolution) would be beneficial, i.e. these features from the ERA5 dataset: https://github.com/PyPSA/atlite/blob/549d0fdf22925e06ac3f726e801f215d993671dc/atlite/datasets/era5.py#L45-L60

I would have to dig a bit into the COSMO dataset to understand how they relate to the variables offered. T_2M and T_SOIL for the temperature feature in ERA5 should be directly transferrable.

But I don't consider the other features as important as wind for now. They are less dependent on local variations to roughness and for solar radiation we also support SARAH datasets which have generally a better quality. If you want to have a look feel free, but I would suggest we go with the wind feature only first.

Hi @joAschuaer, fwiw, a couple of plots made with home-made code --

6dec-cap-yearday-49-11-150m-2014-E-141-4200 31% cap factor looks waaay high. Are there any spot checks on COSMO_REA6 windspeeds at 150m ❓

13aug-cosmorea6-avwind-BY-WS_150m 2D 2014

13aug-wind-cap-days-150m-2014

I don't think there's much overlap with your project; this is version 0 (1 user), sprawling, quite a ways from a package. Fwiw 1) start small: code a bit, doc a bit, plot a bit, what do you really want ? 2) stick to numpy. File trees of .npz files < 1 GB are simple, fast, visible. Clip big .nc -> .npz to a smaller area soon as you can. For example data/DE-grid05/WS_150m.2D.2014.npz is 970M, np.load takes 1 sec on my old imac. 3) regrid lat lon with Kdtrees but don't interpolate the timeseries

Anyway some notes are under https://gist.github.com/denis-bz

Added 16 Sept, a couple of nasties in opendata.dwd.de WS files -> xarray that you probably know about: 1) they're compressed a bit, 2 GB to 1.5, but that that slows down xarray writing by a lot, gzip; Uncompress them right after download, and set encoding complevel = 0. 2) transpose so that timeseries are contiguous in memory in C order 3) times are 1 off, should be e.g. 2018-01-01T00 .. 2018-01-31T23

Also, to compare cosmo and era5, a good testcase is January 2018 with 100kmh storms across Europe.

comments welcome cheers -- denis

PyPSA / atlite