bopen / c3s-eqc-toolbox-template

CADS Toolbox template application

CORDEX notebooks #54

Closed lsangelantoni closed 1 year ago

lsangelantoni commented 1 year ago

Notebook description

This directory includes notebooks for an ensemble of regional climate models from Euro-CORDEX.

cordex_notebooks.zip

Notebook link or upload

No response

Anything else we need to know?

Notwithstanding some structural differences between CMIP6 and CORDEX outputs, it would be useful to homogenise the CMIP6 and CORDEX notebooks as much as possible in terms of functions and workflow.

Environment

malmans2 commented 1 year ago

Hi @lsangelantoni,

At first glance some of them look very similar to the original notebooks submitted by @paololanteri. We already reviewed those notebooks, adding ad-hoc optimised functions to our software and a few wp4 templates (see https://github.com/bopen/c3s-eqc-toolbox-template/tree/main/notebooks/wp4).

Are you aware of those templates? If yes, have you tried aligning your notebook with the wp4 templates available?

Notwithstanding some structural differences between CMIP6 and CORDEX outputs, it would be useful to homogenise the CMIP6 and CORDEX notebooks as much as possible in terms of functions and workflow.

Do you mean that the wp4 templates/functions currently available do not work well with CORDEX?

malmans2 commented 1 year ago

The notebooks also import custom functions that are not included in the zip file (functions_tmp).

(Those also look very similar to Paolo's original functions. None of the templates use functions from functions_tmp as we implemented them in c3s_eqc_automatic_quality_control whenever possible).

malmans2 commented 1 year ago

Another comment after reading some of the notes in the notebooks.

It looks like you'd like to show the same statistics as in the CMIP6 templates, but without applying weights. At the moment you should be able to explicitly pass weights to all of our functions starting with spatial_weighted:

diagnostics.spatial_weighted_statistics(obj, weights=xr.DataArray(1))

This is probably a little confusing and not optimal, but it allows using the same functions for CMIP6 (weights=None) and CORDEX (weights=xr.DataArray(1)).

We can also easily add unweighted functions if that's useful, something like diagnostics.spatial_statistics(obj). Or we can add the option to pass a bool: diagnostics.spatial_weighted_statistics(obj, weights=False)

lsangelantoni commented 1 year ago

Yes, I am aware of the CMIP6-based templates. It would in any case be useful to have an "unweighted" version of the diagnostics.py functions.

malmans2 commented 1 year ago

OK, I'll add the unweighted functions tomorrow.

lsangelantoni commented 1 year ago

Thanks

malmans2 commented 1 year ago

Hi @lsangelantoni, it's now possible to compute spatial unweighted statistics/errors. Rather than adding more diagnostics, I changed the available ones a little (fewer but more general functions are easier to maintain).

All spatial_weighted functions now accept a weights argument, which can be a boolean or a DataArray. The docstrings should be self-explanatory; I'm pasting just one function here as an example:

from c3s_eqc_automatic_quality_control import diagnostics
help(diagnostics.spatial_weighted_errors)
spatial_weighted_errors(obj1: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset, obj2: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset, lon_name: collections.abc.Hashable | None = None, lat_name: collections.abc.Hashable | None = None, weights: xarray.core.dataarray.DataArray | bool = True, **kwargs: Any) -> xarray.core.dataset.Dataset | xarray.core.dataarray.DataArray
    Calculate rmse, crmse, and correlation with latitude weighting.

    Parameters
    ----------
    obj1, obj2: Dataset or DataArray
        Input data
    lon_name, lat_name: str, optional
        Name of longitude/latitude coordinate
    weights: DataArray, bool, default: True
        Weights to apply:
        - True: weights are the cosine of the latitude
        - False: unweighted
        - DataArray: custom weights

    Returns
    -------
    DataArray or Dataset
        Object with rmse, crmse, and correlation

When weights=False, the computation is more efficient than before because no weight is actually applied (previously we had to apply dummy weights, such as weights=xr.DataArray(1)).
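
For illustration, a minimal usage sketch based on the docstring above (obj1, obj2 and my_weights are placeholders rather than objects defined in the templates):

from c3s_eqc_automatic_quality_control import diagnostics

# obj1, obj2 and my_weights are placeholders (e.g. xarray Datasets/DataArrays from a notebook)

# Default: latitude (cosine) weighting, as used for CMIP6
errors_weighted = diagnostics.spatial_weighted_errors(obj1, obj2)

# Unweighted, as requested for CORDEX
errors_unweighted = diagnostics.spatial_weighted_errors(obj1, obj2, weights=False)

# Custom weights passed as a DataArray
errors_custom = diagnostics.spatial_weighted_errors(obj1, obj2, weights=my_weights)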

I've already installed the latest version on WP4.

malmans2 commented 1 year ago

@lsangelantoni, I've uploaded another notebook to test the cmip6 precipitation notebook with cordex. I've made it more general, so the same notebook can be used for both CMIP6 and CORDEX. Please test it and let me know if everything is working as expected (I only tested the small time range in the template). If everything works OK, we can deprecate the CMIP6-only notebook (cc: @paololanteri)

Here is the notebook: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/clima_and_bias_pr_cordex_cmip6_regionalised.ipynb

malmans2 commented 1 year ago

Also, I've seen a comment in the notebook about the Taylor diagram colours. I discussed it with Paolo in another thread; I believe it's a bug in the library used. I've opened a PR to fix it upstream, but no one has got back to me. The discussion with Paolo is here: https://github.com/bopen/c3s-eqc-toolbox-template/issues/23#issuecomment-1497513623

lsangelantoni commented 1 year ago

Okay, then I will consider this last one. Thanks.

For the Taylor diagram, I fixed it by manually defining the parameters for each model we want to include in the plot (see attached screenshot) and then passing this variable as an argument to the skill_metrics taylor_diagram function (see attached screenshot).

malmans2 commented 1 year ago

Cool, we should be able to use our own color logic with that parameter.

FYI, I've pushed a couple of fixes to the notebook in main:

  1. We need to parametrise the periodic parameter for the interpolation (True for cmip6, False for cordex); see the sketch after this list
  2. I'm no longer using the CDS area parameter, but cutting out the region after downloading, so we can re-use the same raw data without downloading it every time
  3. A few minor fixes to the plots; they're a bit more generic as cmip6/cordex results can be quite different
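
A minimal sketch of the kind of switch meant in point 1 (the names here are illustrative, not necessarily the ones used in the notebook):

# Illustrative only: pick the collection at the top of the notebook
collection_id = "CORDEX"  # or "CMIP6"

# CMIP6 global grids wrap around in longitude, CORDEX regional grids do not,
# so the interpolation only needs periodic=True for CMIP6
interp_kwargs = {"periodic": collection_id == "CMIP6"}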

Have a good weekend!

lsangelantoni commented 1 year ago

Okay, super. I am running the new clima_and_bias_pr_cordex_cmip6_regionalised.ipynb and I can confirm that it is working nicely over the shorter period. I am now running it over the entire historical period and I wouldn't expect differences... I have just added the @paololanteri function to adjust the colorbar range when we plot biases. I will send you the updated version including this very minor change.

Have a nice weekend

lsangelantoni commented 1 year ago

I confirm that everything is working over the entire historical period as well. Below is the same script including the function to adjust the colorbar range. clima_and_bias_pr_cordex_cmip6_regionalised_cbar_adj.zip We could also leverage this notebook structure for the trend computation.

malmans2 commented 1 year ago

Although it's not meant to be public, we can use one of xarray's plotting utils for that: xr.plot.utils._determine_cmap_params. It accepts all plotting kwargs and returns a dict that can be reused to make consistent plots.

I introduced it in the notebook: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/clima_and_bias_pr_cordex_cmip6_regionalised.ipynb
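
For reference, a minimal sketch of how it can be used (it is a private helper, so the signature may change between xarray versions):

import numpy as np
import xarray as xr
from xarray.plot.utils import _determine_cmap_params  # private helper

# Compute shared colormap settings once from a reference field,
# then reuse them so that several plots share the same vmin/vmax/cmap
bias = xr.DataArray(np.random.randn(10, 10), dims=("latitude", "longitude"))
cmap_params = _determine_cmap_params(bias.values, robust=True, center=0)
bias.plot(vmin=cmap_params["vmin"], vmax=cmap_params["vmax"], cmap=cmap_params["cmap"])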

lsangelantoni commented 1 year ago

To define a common CMIP6-CORDEX trend computation notebook, does it make sense to download the datasets as in clima_and_bias_pr_cordex_cmip6_regionalised but output the ds_sim variable with four dimensions (model, year, latitude, longitude) instead of three (model, latitude, longitude)? This would just require changing ds = diagnostics.time_weighted_mean(ds) to ds = diagnostics.annual_weighted_mean(ds) in the resample_and_regrid_and_rescale function. Then we could run the spatial_weighted_trends function using the above-mentioned ds_sim as input(?). Alternatively, we could start from the trend_pr_cmip6 template and adjust it for use with CORDEX simulations.

malmans2 commented 1 year ago

Sure, I think it's actually better because we can transform each chunk separately and re-use cached transformed chunks when the time period changes (i.e., we don't need to use transform_chunks=False). The only caveat is that we cannot use chunks smaller than 1 year (I've added an assert to prevent this from happening).

I've updated the notebook: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/clima_and_bias_pr_cordex_cmip6_regionalised.ipynb

lsangelantoni commented 1 year ago

Great, thanks. "The only caveat is that we cannot use chunks smaller than 1 year" No worries at all, we'll never run analyses requiring such short time chunks. The idea behind this question is to have a consistent CMIP6/CORDEX workflow, at least for the climatology-bias and trend (historical and future) use cases, in order to ease the comparison between GCM and RCM ensembles over specific regions of the globe. This aims to make end users aware of the "strengths and weaknesses" of both GCMs and RCMs in relation to the particular domain/case/dynamic at hand.

malmans2 commented 1 year ago

However, if you want to re-use the transform_func cached in clima_and_bias_pr_cordex_cmip6_regionalised in trend_pr_cmip6.ipynb, that's not so easy for a couple of reasons:

  1. The trends are computed using the native grid, so the regridding is performed after computing the trends
  2. The trend computation can take quite a long time for high-res grids. We need to cache that computation in trend_pr_cmip6

lsangelantoni commented 1 year ago

"The trends are computed using the native grid, so the regridding is performed after computing the trends" --> Yes this is actually the approach I would prefer in order to compute trend statistics in RCM native grid. "The trend computation can take quite long for high-res grids." Yes definitely, this is really time-consuming considering the 12 km resolution runs over the entire Euro-CORDEX domain. This was at least with the pymannkendall original_test function. I have not tried the xarray Mann Kendall function yet.

In any case, it is totally fine for me to cache the trend computation as performed in trend_pr_cmip6, and in general I am open to whatever path you deem more efficient for the trend computation.

malmans2 commented 1 year ago

OK, I think at the moment we have the most efficient solution.

I realised we have an open issue regarding the statistics in the KDE cell in clima_and_bias_pr_cordex_cmip6_regionalised:

  1. I forgot to add the weights argument to the statistics computed and shown in the table on top of the KDE plot. Do you want to show unweighted statistics for CORDEX?
  2. In the original notebook, there was a comment by @paololanteri:
    weights=weights_flatten to seaborn.kdeplot should give the weighted version of kde. 
    But it does not work!! It worked with the previous version of seaborn

    Because of that, at the moment we are not even using seaborn, as we can just use pandas/matplotlib. Let me know if this is a technical issue and you need help from me to fix it, or if what is done at the moment in the notebook is correct and we can remove the TODO comment in the KDE cell.

lsangelantoni commented 1 year ago

"1. Do you want to show unweighted statistics for CORDEX?" Yes, thanks. Regarding seaborn.kdeplot issue I am not aware of its status, but I believe we can stick with pandas/matplotlib procedure. The only important aspect is to have weighted statistics for the CMIP6 GCMs and unweighted statistics for the CORDEX RCMs.

malmans2 commented 1 year ago

"1. Do you want to show unweighted statistics for CORDEX?" Yes, thanks.

Done!

@paololanteri, when you get a chance please test the CMIP6 version of clima_and_bias_pr_cordex_cmip6_regionalised and let me know if we can get rid of clima_and_bias_pr_cmip6_regionalised

paololanteri commented 1 year ago

Hi @malmans2, I tried clima_and_bias_pr_cordex_cmip6_regionalised. Everything works except that the function utils.regionalize() does not select the correct region. For example, with the default boundaries (around the Mediterranean), the precipitation field I get is the one that should be located around longitude 180, both for ERA5 and the models (see attached screenshot).

Concerning the KDE plot for the CMIP6 models' bias, I confirm that it is still constructed using an unweighted distribution of the data, so it is not coherent with the weighted statistics that are calculated and shown near the plot. I think the comment I made is still valid.

weights=weights_flatten to seaborn.kdeplot should give the weighted version of kde. 
But it does not work!! It worked with the previous version of seaborn

Do you have any ideas on how to solve this problem? Thank you! Paolo

malmans2 commented 1 year ago

Everything works except that the function utils.regionalize() does not select the correct region.

Good catch, thanks! There was a bug in utils.regionalize() (the longitude was getting sorted, but not the variables).
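
For illustration, the kind of bug described above (a hedged sketch, not the actual library code):

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"precipitation": ("longitude", [1.0, 2.0, 3.0])},
    coords={"longitude": [200, 100, 300]},
)
# Buggy: the coordinate is sorted but the data variables keep the old order
ds_wrong = ds.assign_coords(longitude=np.sort(ds["longitude"].values))
# Fixed: sortby reorders the coordinate and the data variables together
ds_right = ds.sortby("longitude")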

Do you have any ideas on how to solve this problem?

I'm on it, I'll let you know if I can figure it out!

malmans2 commented 1 year ago

Hi @paololanteri, Could you please check whether this is working?

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

from c3s_eqc_automatic_quality_control import diagnostics

# ds_bias and weights are defined earlier in the notebook

# Create dataframe
da = ds_bias["precipitation"]
df_stats = diagnostics.spatial_weighted_statistics(da, weights=weights).to_pandas()

# Plot
fig, ax = plt.subplots(1, 1)
x = np.linspace(
    df_stats["ensemble"]["mean"] - 3 * df_stats["ensemble"]["std"],
    df_stats["ensemble"]["mean"] + 3 * df_stats["ensemble"]["std"],
    1_000,
)
for model, values in da.groupby("model"):
    values = values.stack(dim=values.dims).dropna("dim")
    y = scipy.stats.gaussian_kde(
        values,
        weights=np.cos(np.deg2rad(values["latitude"])) if weights else None,
    ).evaluate(x)
    plot_kwargs = {"color": "k", "ls": "--"} if model == "ensemble" else {}
    ax.plot(x, y, label=model, **plot_kwargs)
ax.grid()
ax.set_xlim(x[[0, -1]])
ax.set_xlabel(f"{da.attrs['long_name']} [{da.attrs['units']}]")
ax.legend()

# Add stats
table = plt.table(
    cellText=df_stats.round(5).T.values.tolist(),
    colLabels=df_stats.T.columns.values.tolist(),
    rowLabels=df_stats.T.index.values.tolist(),
    loc="top",
)
paololanteri commented 1 year ago

Using global coverage

lon_slice = slice(-180, 180) 
lat_slice = slice(-90, 90)

I get an error. Does it also happen for you?

In the meantime, I have tried to make a single notebook for CMIP6 and CORDEX for the trend notebook as well. I called it trend_pr_cordex_cmip6.ipynb. It seems to work well for the timeseries, but I'm not able to run it for the CORDEX trend maps. Can you take a look at this, please?

trend_pr_cordex_cmip6.zip

Thank you

malmans2 commented 1 year ago

Interesting issue due to floating-point precision. Here is an MRE:

import numpy as np
np.cos(np.deg2rad(np.asarray(90, float)))  # 6.123233995736766e-17
np.cos(np.deg2rad(np.asarray(90, "float32")))  # -4.371139e-08

I'll give it some thought.

malmans2 commented 1 year ago

I think it's OK to just use np.abs. I've updated both the notebook and, for consistency, the weights used by c3s_eqc_automatic_quality_control (we didn't catch it before because xarray allows negative weights).
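
A minimal sketch of the np.abs fix described above (illustrative, not the actual library code):

import numpy as np

# float32 rounding can make cos(deg2rad(90)) slightly negative,
# so take the absolute value before using it as a weight
latitude = np.asarray([90, 45, 0, -45, -90], dtype="float32")
weights = np.abs(np.cos(np.deg2rad(latitude)))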

I'll take a look at the trend notebook tomorrow.

malmans2 commented 1 year ago

Hi @paololanteri, it's still very much a work in progress, but here is the notebook for the cordex+cmip6 trends: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/WIP_trend_pr_cordex_cmip6.ipynb

It does something, but I'm not sure it's correct, and I need to tune the plots a bit. It's just a starting point; I'll work on this in the next couple of days.

BTW, I realised it was needed for trends, so utils.regionalise should now also work with curvilinear grids.

lsangelantoni commented 1 year ago

Hi @malmans2,

the script seems to work well. To compute trends on a seasonal basis, could we just switch from diagnostics.annual_weighted_mean to diagnostics.seasonal_weighted_mean in the get_annual_precipitation function?

All the rest refers, as you mentioned, to plot tuning. If possible, we would need a sort of switch for the map hatching between precipitation and temperature: hatches over significant (non-significant) trend values for precipitation (temperature). Then there are very minor aspects such as the faceted-plot line spacing and the colorbar colour discretisation.

In the Taylor diagram we get an error, but I believe it is connected to the already discussed limited number of points, and I expect to be able to fix it as in https://github.com/bopen/c3s-eqc-toolbox-template/issues/54#issuecomment-1545774893

Thanks

malmans2 commented 1 year ago

Great, good news!

A couple of comments:

  1. I found a small bug (I was not selecting the last year for the climatology); I'll push the updated version ASAP
  2. CORDEX uses/shows unweighted fields. ERA5 is also unweighted when CORDEX is selected. Is that OK?
  3. At the moment we compute and cache the trends on the whole domain, and we perform the regionalisation on the cached data. It means that the very first run takes quite a long time, but then you can easily change the region. We could also do the opposite (regionalise first, which speeds up the computation, but the cache is invalidated each time the domain changes). I guess the latter is better if you plan to change the time period more often than the domain. Let me know which option you prefer.

lsangelantoni commented 1 year ago
  1. No worries, one year out of at least thirty is not that dramatic.
  2. CORDEX and ERA5 have different grid types (rotated vs. regular). I believe ERA5 should be weighted like CMIP6. However, the latitudinal range of the CORDEX domain is not that large, so it should not, at least in principle, be a relevant issue.
  3. I would stick with the first and current approach, since we are still defining the subdomains on which to perform the analyses. Even though the first run is time-demanding, once it is done we can freely move within the domain. Concerning the time period, we will always consider the full time segment available for the historical and future periods when computing statistics. We will frequently shift between annual and seasonal analyses, but this should not affect the choice of regionalising before or after the trend computation.

malmans2 commented 1 year ago

To compute trends on a seasonal basis, could we just switch from diagnostics.annual_weighted_mean to diagnostics.seasonal_weighted_mean in the get_annual_precipitation function?

I need a clarification on this point. Should the time dimension used to compute the trend have size 4 (i.e., DJF, MAM, JJA, SON) or size 4 * years (i.e., DJF2008, MAM2008, JJA2008, SON2008, DJF2009, MAM2009, JJA2009, SON2009, ...)?

lsangelantoni commented 1 year ago

Let's say that, starting from a time series of monthly mean data (e.g., from 1971 to 2000 at an i,j grid node), we have a one-dimensional array of 12 * 30 values. We then take annual means (30 values) and compute the trend on these 30 values. Similarly, for seasonal trends we would like to first select a season, obtaining a time series of 3 months * 30 years, derive the seasonal means and, as for the annual time series, compute the trend. If we want to think in matrices, including all four seasons in one variable, we would have 4 columns (seasons) and 30 rows (number of years), so I would say years * 4, e.g. [DJF2008, MAM2008, JJA2008, SON2008 (first row); DJF2009, MAM2009, JJA2009, SON2009 (second row); and so forth].

malmans2 commented 1 year ago

Got it. In xarray, I believe this is what we need to do:

ds.groupby("time.year").map(diagnostics.seasonal_weighted_mean).stack(year_season=("year", "season"))
malmans2 commented 1 year ago

Last question, I think: seasonal_weighted_mean creates the usual DJF for each year. It means that the D and the J of DJF(year) are 11 months apart. Is that what you are looking for?

If not, is it OK to split into JFM, AMJ, JAS, OND? Or do you want D(year-1)J(year)F(year)?

lsangelantoni commented 1 year ago

Ah! We would need D(year-1)J(year)F(year), since we consider a single winter as three consecutive months (DJF). For instance, the winter of 2000 is December 1999 plus January and February 2000. I hope this does not complicate things too much. We cannot consider an F and a D separated by 11 months as two months belonging to the same season in terms of driving atmospheric dynamics (even though they belong to the same year). Thanks for pointing this out.

malmans2 commented 1 year ago

No big deal, you'll see that I used a little trick to shift years.
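
For reference, one common xarray way to do this kind of shift (a hedged sketch; the notebook's actual trick may differ, and ds/diagnostics are assumed to come from the notebook as in the earlier snippet):

# ds: dataset of monthly means; diagnostics: c3s_eqc_automatic_quality_control.diagnostics
# Bump December into the following year, so that DJF(year) groups
# Dec(year-1), Jan(year) and Feb(year) together
shifted_year = ds["time"].dt.year + (ds["time"].dt.month == 12)
ds = ds.assign_coords(djf_year=shifted_year)
seasonal_means = ds.groupby("djf_year").map(diagnostics.seasonal_weighted_mean)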

Here is the latest version: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/WIP_trend_pr_cordex_cmip6.ipynb

Weights are always applied to ERA5 (we'll need to do the same in the other notebook as well), the maps should always be in good shape regardless of the data requested, and we now use seasonal timeseries for the computation.

I'll take another look tomorrow with fresh eyes, and I will implement a switch for precipitation/temperature.

Re the Taylor diagram: I don't think it's just the same issue we had the other day. You might have to tune the plotting parameters a bit based on the results.

BTW, while developing/testing you might want to invalidate the cache. If you set the following global variable, it will invalidate the cache (only transformed data are invalidated, so you don't have to re-download data).

from c3s_eqc_automatic_quality_control import download

download.INVALIDATE_CACHE = True
malmans2 commented 1 year ago

Hi @lsangelantoni and @paololanteri,

The trend notebook is ready. You can now select precipitation or temperature at the beginning of the notebook. Temperature is converted to Celsius (I assumed you wanted to convert from K to C as in the other template notebook).

I noticed that there are cordex models with a different height (1.5 instead of 2, I think), so I left the height coordinate in the datasets in case you need it.

I removed some of the hard-coded parameters in the Taylor Diagram, so it works fine even with small test data.

Let me know if everything works OK. If it's working OK, I'll adapt clima_and_bias_pr_cordex_cmip6_regionalised as well (e.g., parametrise the input variable and always weight ERA5 data).

Here is the notebook (renamed to trend_cordex_cmip6): https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/trend_cordex_cmip6.ipynb

lsangelantoni commented 1 year ago

Thank you very much. We are going to work on it and we will let you know.

lsangelantoni commented 1 year ago

Hi @malmans2,

Here follow a few points about the latest version of the notebook:

  • Possibility of preliminarily selecting annual or seasonal trend computation. For example, for JJA we would like to have the following timeseries [JJA1 JJA2 JJA3 ... JJA30]. In order to avoid the systematic trend computation over all four seasons + annual (5 trend computations per dataset), we propose to preliminarily select between ['annual', 'JJA', 'DJF', 'MAM', 'SON']. This could make calculations much faster (we noticed, for example, that for global ERA5 the RAM saturates during the trend-map computation).
  • In case for some reason you need to stick with merging all the seasons in one variable, they should be along different columns (or rows), since we need to study and plot seasonal trends separately. This is because we want to highlight how the trend can vary as a function of the season considered.
  • For temperature, the trend should be derived in physical values (°C/decade) and not as a percentage.
  • A switch of the map hatching as a function of the variable considered, i.e., dashed lines over the significant (non-significant) trend for precipitation (temperature).

Let us know if anything is unclear.

malmans2 commented 1 year ago

Possibility of preliminarily selecting annual or seasonal trend computation. For example, for JJA we would like to have the following timeseries [JJA1 JJA2 JJA3 ... JJA30]. In order to avoid the systematic trend computation over all four seasons + annual (5 trend computations per dataset), we propose to preliminarily select between ['annual', 'JJA', 'DJF', 'MAM', 'SON']. This could make calculations much faster (we noticed, for example, that for global ERA5 the RAM saturates during the trend-map computation).

For the annual computation, should the trend be computed from:

  1. timeseries = (Y0, Y1, Y2, Y3, ...) or
  2. timeseries = (DJF0, MAM0, JJA0, SON0, DJF1, MAM1, JJA1, SON1, ...)
lsangelantoni commented 1 year ago

timeseries = (Y0, Y1, Y2, Y3, ...)
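
For illustration, one way the annual/seasonal switch could look in xarray (a hedged sketch, not necessarily how the notebook implements it; ds comes from the notebook):

timeseries = "JJA"  # one of "annual", "DJF", "MAM", "JJA", "SON"

if timeseries != "annual":
    # Keep only the requested season (note: DJF would still need the
    # December year-shift discussed above)
    ds = ds.where(ds["time"].dt.season == timeseries, drop=True)
# One value per year per grid node: (Y0, Y1, Y2, ...)
yearly = ds.groupby("time.year").mean("time")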

malmans2 commented 1 year ago

Hi @malmans2,

Here follow a few points about the latest version of the notebook:

  • [x] Possibility of preliminarily selecting annual or seasonal trend computation. For example, for JJA we would like to have the following timeseries [JJA1 JJA2 JJA3 ... JJA30]. In order to avoid the systematic trend computation over all four seasons + annual (5 trend computations per dataset), we propose to preliminarily select between ['annual', 'JJA', 'DJF', 'MAM', 'SON']. This could make calculations much faster (we noticed, for example, that for global ERA5 the RAM saturates during the trend-map computation).
  • [x] In case for some reason you need to stick with merging all the seasons in one variable, they should be along different columns (or rows), since we need to study and plot seasonal trends separately. This is because we want to highlight how the trend can vary as a function of the season considered.
  • [x] For temperature, the trend should be derived in physical values (°C/decade) and not as a percentage.
  • [x] A switch of the map hatching as a function of the variable considered, i.e., dashed lines over the significant (non-significant) trend for precipitation (temperature).

Let us know if anything is unclear.

I've implemented the first 3 (please make sure I got all units right). There's a small inconsistency between annual and seasonal. Do you want the annual means to start from December of the year before or not?

I didn't get what you mean in the last point.

malmans2 commented 1 year ago

Oh sorry, I think I got what you mean. I'll push the update in a minute.

malmans2 commented 1 year ago

I think we're all set: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/trend_cordex_cmip6.ipynb

lsangelantoni commented 1 year ago

Hi @malmans2,

thanks for your work. Just a few remarks:

  • We would need to have a control to reverse the colorbar colours.
  • Significant-trend dashed lines in the maps: whereas the ERA5 and single-model maps look fine, I believe that for the statistical significance of the model-ensemble trends we have improper line dashing.
  • Finally, for the Taylor diagram we have some graphical glitches regarding the decimal format of the RMSE.

The figures in the attachment consider the entire historical period (1971-2005) for Euro-CORDEX.

Let me know if anything is unclear.

Cheers, Lorenzo

lsangelantoni commented 1 year ago

P.S. The figure below perhaps helps to clarify what I mean when I say "I expect no dashed lines over very low values of trend slope" (see attached screenshot). As you can see, the dashed lines overlay the shading only over grid nodes with a "high" trend slope (the colorbar range is, of course, the same as the ensemble-mean one).

malmans2 commented 1 year ago

Hi,

We would need to have a control to reverse the colorbar colours

Done

Significant-trend dashed lines in the maps: whereas the ERA5 and single-model maps look fine, I believe that for the statistical significance of the model-ensemble trends we have improper line dashing.

Last week we flipped all the hatches for temperature. Don't you want to flip the hatches for the is_signif_ratio variable? Or do you want to change the way is_signif_ratio is computed? I have implemented the former for the moment.

Finally, for the Taylor diagram we have some graphical glitches regarding the decimal format of the RMSE.

The library used is not very robust; you might have to tune the parameters a bit depending on the input values. I've hardcoded a linspace call; let me know if you have better ideas.

lsangelantoni commented 1 year ago

Hi @malmans2,

thanks, we have made a minor change to the script (line 32 in the Define plotting kwargs section) to coherently hatch significant trends for both the p_value-based (single model and ERA5) and the inter-model agreement (ensemble) approaches. For temperature (precipitation) we have hatching over non-significant (significant) trends. At the moment the notebook is okay. I will let you know if we need to adjust any further details.

trend_cordex_cmip6.ipynb.gz