Closed: @lsangelantoni closed this issue 1 year ago.
Hi @lsangelantoni,
At first glance some of them look very similar to the original notebooks submitted by @paololanteri. We already reviewed those notebooks, adding ad-hoc optimised functions to our software and a few wp4 templates (see https://github.com/bopen/c3s-eqc-toolbox-template/tree/main/notebooks/wp4).
Are you aware of those templates? If yes, have you tried aligning your notebook with the wp4 templates available?
Notwithstanding some structural differences between CMIP6 and CORDEX outputs, it would be useful to homogenize CMIP6 and CORDEX notebooks as much as possible in terms of functions and workflow.
Do you mean that the wp4 templates/functions currently available do not work well with CORDEX?
The notebooks also import custom functions that are not included in the zip file (functions_tmp).
(Those also look very similar to Paolo's original functions. None of the templates use functions from functions_tmp, as we implemented them in c3s_eqc_automatic_quality_control whenever possible.)
Another note after reading some of the comments in the notebooks: it looks like you'd like to show the same statistics as in the CMIP6 templates, but without applying weights.
At the moment you should be able to explicitly pass weights to all our functions starting with spatial_weighted:
diagnostics.spatial_weighted_statistics(obj, weights=xr.DataArray(1))
This is probably a little confusing and not optimal, but it allows using the same functions for CMIP6 (weights=None) and CORDEX (weights=xr.DataArray(1)).
We can also easily add unweighted functions if that's useful, something like diagnostics.spatial_statistics(obj). Or we can add the option to pass a bool: diagnostics.spatial_weighted_statistics(obj, weights=False)
Yes, I am aware of the CMIP6-based templates. It would in any case be useful to have the "unweighted" version of the diagnostics.py functions.
OK, I'll add the unweighted functions tomorrow.
Thanks
Hi @lsangelantoni, It's now possible to compute spatial unweighted statistics/errors. Rather than adding more diagnostics, I slightly changed the diagnostics available (easier to maintain, as we can have fewer but more general functions).
All spatial_weighted functions now accept a weights argument, which can be a boolean or a DataArray. The docstring should be self-explanatory; I'm pasting just one function here as an example:
from c3s_eqc_automatic_quality_control import diagnostics
help(diagnostics.spatial_weighted_errors)
spatial_weighted_errors(
    obj1: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset,
    obj2: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset,
    lon_name: collections.abc.Hashable | None = None,
    lat_name: collections.abc.Hashable | None = None,
    weights: xarray.core.dataarray.DataArray | bool = True,
    **kwargs: Any,
) -> xarray.core.dataset.Dataset | xarray.core.dataarray.DataArray
Calculate rmse, crmse, and correlation with latitude weighting.
Parameters
----------
obj1, obj2: Dataset or DataArray
Input data
lon_name, lat_name: str, optional
Name of longitude/latitude coordinate
weights: DataArray, bool, default: True
Weights to apply:
- True: weights are the cosine of the latitude
- False: unweighted
- DataArray: custom weights
Returns
-------
DataArray or Dataset
Object with rmse, crmse, and correlation
When weights=False, the computation is more efficient than before because no weights are actually applied (previously we had to apply dummy weights, such as weights=xr.DataArray(1)).
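To illustrate the three weights modes (True for cosine-of-latitude, False for unweighted, or a custom DataArray), here is a minimal plain-xarray sketch of the same dispatch logic. The spatial_mean function and the toy grid are illustrative only, not the actual c3s_eqc_automatic_quality_control implementation:

```python
import numpy as np
import xarray as xr

def spatial_mean(obj, weights=True, lat_name="latitude"):
    """Toy weights dispatch: True -> cos(lat), False -> unweighted, DataArray -> custom."""
    dims = ["latitude", "longitude"]
    if weights is False:
        return obj.mean(dims)  # truly unweighted: no dummy weights needed
    if weights is True:
        # abs() guards against tiny negative cos values at the poles
        weights = np.abs(np.cos(np.deg2rad(obj[lat_name])))
    return obj.weighted(weights).mean(dims)

da = xr.DataArray(
    [[0.0, 1.0], [2.0, 3.0]],
    coords={"latitude": [0.0, 60.0], "longitude": [0.0, 180.0]},
    dims=("latitude", "longitude"),
)
print(float(spatial_mean(da, weights=False)))  # 1.5
print(float(spatial_mean(da, weights=True)))   # cos(lat)-weighted mean
```

Note that the old workaround weights=xr.DataArray(1) reduces to the unweighted result, since a constant weight cancels out in the weighted mean.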
I've already installed the latest version on WP4.
@lsangelantoni, I've uploaded another notebook to test the cmip6 precipitation notebook with cordex. I've made it more general, so the same notebook can be used both for CMIP6 and CORDEX. Please test it and let me know if everything is working as expected (I only tested the small time range in the template). If everything works OK, we can deprecate the CMIP6-only notebook (cc: @paololanteri)
Here is the notebook: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/clima_and_bias_pr_cordex_cmip6_regionalised.ipynb
Also, I've seen in the notebook a comment about taylor diagram colors. I've discussed it with Paolo in another thread, I believe it's a bug in the library used. I've opened a PR to fix it upstream, but no one got back to me. The discussion with Paolo is here: https://github.com/bopen/c3s-eqc-toolbox-template/issues/23#issuecomment-1497513623
Okay, then I will consider this last one. Thanks.
For the Taylor diagram, I fixed it by manually defining parameters for each model we want to include in the plot, and then using this variable as an argument of the skill_metrics taylor_diagram function.
Cool, we should be able to use our own color logic with that parameter.
FYI, I've pushed a couple of fixes to the notebook in main: periodic for the interpolation (True for cmip6, False for cordex). Have a good weekend!
Okay super, I am running the new clima_and_bias_pr_cordex_cmip6_regionalised.ipynb and I can confirm that it is working nicely over the shorter period. I am now running it over the entire historical period and I wouldn't expect differences... I have just added @paololanteri's function to adjust the colorbar range when we plot biases. I will send you the updated version including this very minor change.
Have a nice weekend
I confirm that everything is working over the entire historical period as well. Below is the same script including the function to adjust the colorbar range: clima_and_bias_pr_cordex_cmip6_regionalised_cbar_adj.zip. We could leverage this notebook structure for the trends computation as well.
Although it's not meant to be public, we can use one of xarray's plotting utils for that: xr.plot.utils._determine_cmap_params. It accepts all plotting kwargs and returns a dict to make consistent plots.
I introduced it in the notebook: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/clima_and_bias_pr_cordex_cmip6_regionalised.ipynb
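For reference, a minimal example of how that private helper can be used; being private, its exact signature may vary across xarray versions, so treat this as a sketch:

```python
import numpy as np
import xarray as xr

data = np.random.default_rng(0).normal(size=(4, 5))
# Returns a dict (vmin, vmax, cmap, levels, norm, extend) that can be passed
# to several plot calls to keep their color scales consistent.
cmap_params = xr.plot.utils._determine_cmap_params(data, robust=True)
print(sorted(cmap_params))
```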
For defining a common CMIP6-CORDEX trend computation notebook, does it make sense to download datasets as performed in clima_and_bias_pr_cordex_cmip6_regionalised, but outputting the ds_sim variable with four dimensions (model, year, latitude, longitude) instead of three (model, latitude, longitude)? This would just require modifying ds = diagnostics.time_weighted_mean(ds) to ds = diagnostics.annual_weighted_mean(ds) in the resample_and_regrid_and_rescale function. Then we could run the spatial_weighted_trends function taking the above-mentioned ds_sim as input(?). Alternatively, we could start from the trend_pr_cmip6 template and adjust it to be used with CORDEX simulations.
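As a toy illustration of the difference between a full-period mean and an annual mean that keeps a year dimension (plain xarray; the actual diagnostics.annual_weighted_mean presumably also applies month-length weights, so this unweighted version is only a sketch):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three years of monthly data on a tiny 2x2 grid.
time = pd.date_range("2000-01-01", "2002-12-01", freq="MS")
ds = xr.Dataset(
    {"pr": (("time", "latitude", "longitude"), np.ones((len(time), 2, 2)))},
    coords={"time": time, "latitude": [40.0, 41.0], "longitude": [10.0, 11.0]},
)

ds_clim = ds.mean("time")                   # time collapsed: (latitude, longitude)
ds_annual = ds.groupby("time.year").mean()  # time -> year: (year, latitude, longitude)
print(ds_annual["pr"].sizes)
```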
Sure, I think it's actually better because we can transform each chunk separately, and we can re-use cached transformed chunks when the time period changes (i.e., we don't need to use transform_chunks=False).
The only caveat is that we cannot use chunks smaller than 1 year (I've added an assert to prevent this from happening).
I've updated the notebook: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/clima_and_bias_pr_cordex_cmip6_regionalised.ipynb
Great, thanks. "The only caveat is that we can not use chunks smaller than 1 year" No worries at all, we'll never run analyses requiring such short time chunks. The idea behind this question is to have, at least for the climatology-biases and trends (historical and future) use cases, a consistent CMIP6/CORDEX workflow in order to ease comparison between GCM and RCM ensembles over specific regions of the globe. This aims at making end users aware of the "strengths and weaknesses" of both GCMs and RCMs in relation to the particular domain/case/dynamic at hand.
However, if you want to re-use in trend_pr_cmip6.ipynb the transform_func cached in clima_and_bias_pr_cordex_cmip6_regionalised, that's not too easy for a couple of reasons:
From trend_pr_cmip6: "The trends are computed using the native grid, so the regridding is performed after computing the trends" --> Yes, this is actually the approach I would prefer in order to compute trend statistics on the RCM native grid. "The trend computation can take quite long for high-res grids." Yes definitely, this is really time-consuming considering the 12 km resolution runs over the entire Euro-CORDEX domain. This was at least the case with the pymannkendall original_test function; I have not tried the xarray Mann-Kendall function yet.
In any case, it is totally fine for me to cache the trend computation as performed in trend_pr_cmip6, and in general I am open to whichever path you deem more efficient for the trend computation.
OK, I think at the moment we have the most efficient solution.
I realised we have an open issue regarding the statistics in the KDE cell in clima_and_bias_pr_cordex_cmip6_regionalised:
1. We are not passing weights arguments to the statistics computed and shown in the table on top of the KDE plot. Do you want to show unweighted statistics for CORDEX?
2. "weights=weights_flatten to seaborn.kdeplot should give the weighted version of kde. But it does not work!! It worked with the previous version of seaborn"
Because of that, at the moment we are not even using seaborn, as we can just use pandas/matplotlib. Let me know if this is a technical issue and you need help from me to fix it, and whether what is done at the moment in the notebook is correct so we can remove the TODO comment in the KDE cell.
"1. Do you want to show unweighted statistics for CORDEX?" Yes, thanks. Regarding the seaborn.kdeplot issue, I am not aware of its status, but I believe we can stick with the pandas/matplotlib procedure. The only important aspect is to have weighted statistics for the CMIP6 GCMs and unweighted statistics for the CORDEX RCMs.
"1. Do you want to show unweighted statistics for CORDEX?" Yes, thanks.
Done!
@paololanteri, when you get a chance please test the CMIP6 version of clima_and_bias_pr_cordex_cmip6_regionalised and let me know if we can get rid of clima_and_bias_pr_cmip6_regionalised
Hi @malmans2, I tried clima_and_bias_pr_cordex_cmip6_regionalised. Everything works except that the function utils.regionalize() does not select the correct region. For example, with the default boundaries (around the Mediterranean), the precipitation field I get is the one that should be located around longitude 180, both for ERA5 and the models.
As far as the kde plot for the CMIP6 models' bias is concerned, I confirm that it is still constructed using an unweighted distribution of data, so it is not coherent with the weighted statistics that are calculated and shown near the plot. I think the comment I made is still valid.
weights=weights_flatten to seaborn.kdeplot should give the weighted version of kde.
But it does not work!! It worked with the previous version of seaborn
Do you have any ideas on how to solve this problem? Thank you! Paolo
Everything works except that the function utils.regionalize() does not select the correct region.
Good catch, thanks! There was a bug in utils.regionalize() (the longitude was getting sorted, but not the variables).
Do you have any ideas on how to solve this problem?
I'm on it, I'll let you know if I can figure it out!
Hi @paololanteri, Could you please check whether this is working?
# Create dataframe
da = ds_bias["precipitation"]
df_stats = diagnostics.spatial_weighted_statistics(da, weights=weights).to_pandas()

# Plot
fig, ax = plt.subplots(1, 1)
x = np.linspace(
    df_stats["ensemble"]["mean"] - 3 * df_stats["ensemble"]["std"],
    df_stats["ensemble"]["mean"] + 3 * df_stats["ensemble"]["std"],
    1_000,
)
for model, values in da.groupby("model"):
    values = values.stack(dim=values.dims).dropna("dim")
    y = scipy.stats.gaussian_kde(
        values,
        weights=np.cos(np.deg2rad(values["latitude"])) if weights else None,
    ).evaluate(x)
    plot_kwargs = {"color": "k", "ls": "--"} if model == "ensemble" else {}
    ax.plot(x, y, label=model, **plot_kwargs)
ax.grid()
ax.set_xlim(x[[0, -1]])
ax.set_xlabel(f"{da.attrs['long_name']} [{da.attrs['units']}]")
ax.legend()

# Add stats
table = plt.table(
    cellText=df_stats.round(5).T.values.tolist(),
    colLabels=df_stats.T.columns.values.tolist(),
    rowLabels=df_stats.T.index.values.tolist(),
    loc="top",
)
Using global coverage (lon_slice = slice(-180, 180), lat_slice = slice(-90, 90)) I get an error. Does it occur for you as well?
In the meanwhile, I tried to make a single notebook for CMIP and CORDEX for the trend notebook as well. I called it trend_pr_cordex_cmip6.ipynb. It seems to work well for timeseries, but I'm not able to run it for CORDEX trend maps. Can you take a look at this, please?
Thank you
Interesting issue due to precision. Here is an MRE:
import numpy as np
np.cos(np.deg2rad(np.asarray(90, float))) # 6.123233995736766e-17
np.cos(np.deg2rad(np.asarray(90, "float32"))) # -4.371139e-08
I'll give it some thought.
I think it's OK to just use np.abs. I've updated both the notebook and, for consistency, the weights used by c3s_eqc_automatic_quality_control (we didn't catch it before because xarray allows negative weights).
I'll take a look at the trend notebook tomorrow.
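The fix in miniature, using the values from the MRE (the weights variable here is just illustrative):

```python
import numpy as np

# float32 rounds pi/2 up, so cos comes out slightly negative instead of 0.
raw = np.cos(np.deg2rad(np.float32(90.0)))
# np.abs clamps the sign so the latitude weights are never negative.
weights = np.abs(np.cos(np.deg2rad(np.array([-90.0, 0.0, 90.0], dtype="float32"))))
print(raw)      # tiny negative number
print(weights)  # all >= 0
```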
Hi @paololanteri, It's still very much work in progress, but here is the notebook for trends cordex+cmip6: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/WIP_trend_pr_cordex_cmip6.ipynb
It does something, but I'm not sure it's correct, plus I need to tune the plots a bit. It's just a starting point; I'll work on this in the next couple of days.
BTW, I realized it was needed for trends, so now utils.regionalise should also work with curvilinear grids.
Hi @malmans2,
the script seems to work well. To compute trends on a seasonal basis, could we just switch from diagnostics.annual_weighted_mean to diagnostics.seasonal_weighted_mean in the get_annual_precipitation function?
All the rest refers, as you mentioned, to plot tuning. If possible, we would need a sort of switch for map hatching between precipitation and temperature: hatches over significant (non-significant) trend values for precipitation (temperature). Then very minor aspects like faceted-plot line spacing and colorbar colour discretization.
In the Taylor diagram we get an error, but I believe it is connected to the already discussed limited number of points, and I expect to be able to fix it as in https://github.com/bopen/c3s-eqc-toolbox-template/issues/54#issuecomment-1545774893
Thanks
Great, good news!
A couple of comments:
To compute trends on seasonal basis, could we just shift from diagnostics.annual_weighted_mean to diagnostics.seasonal_weighted_mean set in get_annual_precipitation function?
I need a clarification on this point. The time dimension used to compute the trend should have size 4 (i.e., DJF, MAM, JJA, SON) or size 4 * year (i.e., DJF2008, MAM2008, JJA2008, SON2008, DJF2009, MAM2009, JJA2009, SON2009, ...)?
Let's say, starting from a time series of monthly mean data (e.g., from 1971 to 2000 for an i,j grid node), we have a one-dimensional array of 12 × 30 values. We then make annual means (30 values) and compute the trend on these 30 values. Similarly, for seasonal trends we would preliminarily select a season, obtaining a time series of 3 months × 30 years, derive the seasonal means and, as for the annual time series, compute the trend. If we want to think in matrices, including all four seasons in one variable, we would have 4 columns (seasons) and 30 rows (number of years), so years × 4, e.g., [DJF2008, MAM2008, JJA2008, SON2008 (first row); DJF2009, MAM2009, JJA2009, SON2009 (second row); and so forth].
Got it, in xarray I believe this is what we need to do:
ds.groupby("time.year").map(diagnostics.seasonal_weighted_mean).stack(year_season=("year", "season"))
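A toy version of that pattern, with a plain unweighted seasonal mean standing in for diagnostics.seasonal_weighted_mean:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly values at a single point.
time = pd.date_range("2008-01-01", "2009-12-01", freq="MS")
da = xr.DataArray(np.arange(len(time), dtype=float), coords={"time": time}, dims="time")

def seasonal_mean(obj):
    # stand-in for diagnostics.seasonal_weighted_mean (no month-length weighting here)
    return obj.groupby("time.season").mean()

# map applies the function per year, then the (year, season) pairs are stacked.
stacked = da.groupby("time.year").map(seasonal_mean).stack(year_season=("year", "season"))
print(stacked.sizes)  # 2 years x 4 seasons along year_season
```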
Last question, I think. seasonal_weighted_mean is creating the usual DJF for each year.
It means that D and J for DJF(year) are 11 months apart. Is that what you are looking for?
If not, is it OK to split into JFM, AMJ, JAS, OND? Or do you want D(year-1)J(year)F(year)?
Ah! We would need D(year-1)J(year)F(year), since we consider a single winter as three consecutive winter months (DJF). For instance, the winter of 2000 is December 1999 plus January and February 2000. I hope this does not complicate things too much. We cannot consider F and D, separated by 11 months, as two months belonging to the same season in terms of driving atmospheric dynamics (though they belong to the same year). Thanks for pointing this out.
No big deal, you'll see that I used a little trick to shift years.
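One possible version of such a shift (an assumption on my part; the notebook's actual trick may differ): count each December with the following year before grouping, so DJF(2000) = Dec 1999 + Jan/Feb 2000.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("1999-12-01", "2001-11-01", freq="MS")
da = xr.DataArray(np.ones(len(time)), coords={"time": time}, dims="time")

# December belongs to the *next* winter: add 1 to the year for month == 12.
djf_year = da["time"].dt.year + (da["time"].dt.month == 12)
da = da.assign_coords(djf_year=djf_year)
winters = da.where(da["time"].dt.season == "DJF", drop=True).groupby("djf_year").mean()
print(winters["djf_year"].values)  # [2000 2001]
```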
Here is the latest version: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/WIP_trend_pr_cordex_cmip6.ipynb
Weights are always applied to ERA5 (we'll need to do the same in the other notebook as well), maps should always be in good shape regardless of the data requested, and we now use seasonal timeseries for the computation.
I'll take another look tomorrow with fresh eyes, and I will implement a switch for precipitation/temperature.
Re Taylor Diagram: I don't think it's just the same issue we had the other day. You might have to tune the plotting parameters a bit based on the results.
BTW, while developing/testing you might want to invalidate the cache. If you set the following global variable, it will invalidate the cache (only transformed data are invalidated, so you don't have to re-download data).
from c3s_eqc_automatic_quality_control import download
download.INVALIDATE_CACHE = True
Hi @lsangelantoni and @paololanteri,
The trend notebook is ready. You can now select precipitation or temperature at the beginning of the notebook. Temperature is converted to Celsius (I assumed you wanted to convert from K to °C as in the other template notebook).
I noticed that there are cordex models with a different height (1.5 instead of 2, I think), so I left the height coordinate in the datasets in case you need it.
I removed some of the hard-coded parameters in the Taylor Diagram, so it works fine even with small test data.
Let me know if everything works OK. If it does, I'll adapt clima_and_bias_pr_cordex_cmip6_regionalised as well (e.g., parametrise the input variable and always weight the ERA5 data).
Here is the notebook (renamed to trend_cordex_cmip6): https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/trend_cordex_cmip6.ipynb
Thank you very much. We are going to work on it and we will let you know.
For the annual computation, should the trend be computed from timeseries = (Y0, Y1, Y2, Y3, ...)?
Hi @malmans2,
Here are a few points about the latest version of the notebook.
- [x] Possibility of preliminarily selecting annual or seasonal trend computation. For example, for JJA we would like to have the following timeseries: [JJA1 JJA2 JJA3 ... JJA30]. In order to avoid systematically computing trends over all four seasons + annual (5 trend computations per dataset), we propose to preliminarily select between ['annual','JJA','DJF','MAM','SON']. This could make calculations much faster (we noticed, for example, that for global ERA5 the RAM saturates during trend map computation).
- [x] In case for some reason you need to stick with merging all the seasons in one variable, they should be along different columns (or rows), since we need to study and plot seasonal trends separately. This is because we want to highlight how the trend can vary as a function of the season considered.
- [x] For the temperature the trend should be derived in physical values (°C/decade) and not in percentage.
- [x] Switch of map dashing as a function of the variable considered, i.e., dashed lines over the significant (non-significant) trend for precipitation (temperature).
Let us know if anything is unclear.
I've implemented the first three (please make sure I got all the units right). There's a small inconsistency between annual and seasonal: do you want the annual means to start from December of the previous year or not?
I didn't get what you mean in the last point.
Oh sorry, I think I got what you mean. I'll push the update in a minute.
Hi @malmans2,
thanks for your work. Just a few remarks.
We would need a control to reverse the colorbar colour order, i.e., BlueToRed (RedToBlue) for temperature (precipitation). This is because, e.g., higher temperatures (or positive temperature biases) should be in reddish hues, and vice versa for precipitation.
Significant-trend dashed lines in the maps: whereas the ERA5 and single-model maps look fine, I believe that for the statistical significance of the model-ensemble trends we have improper line dashing. As you can see in the attachment, we have significant trends over grid nodes displaying very low slope values and an uncertain trend sign (both positive and negative trend slopes). Over these latter grid nodes I expect no significant trends (slope close to 0) and no overlying dashed lines.
Finally, for the Taylor diagram we have some graphical glitches regarding the decimal format of the RMSE (see the attachment).
The figures in the attachment consider the entire historical period (1971-2005) for Euro-CORDEX.
Let me know if anything is unclear.
Cheers, Lorenzo
P.S. The figure below perhaps helps clarify what I mean by "I expect no dashed lines over very low values of trend slope": as you can see, dashed lines overlay the shading only over grid nodes with a "high" trend slope (the colorbar range is, of course, the same as the ensemble-mean one).
Hi,
We would need to have a control to reverse the colorbar colours
Done
Significant trends dashed lines in the maps. Whereas ERA5 and single-model maps look fine I believe that for the statistical significance of the model-ensemble trends we have an improper line dashing.
Last week we flipped all hatches for temperature. You don't want to flip the hatches for the is_signif_ratio variable? Or do you want to change the way is_signif_ratio is computed? I implemented the former for the moment.
Finally, for the Taylor diagram we have some graphical glitches regarding the decimals format for the RMSE
The library used is not very robust; you might have to tune the parameters a bit depending on the input values. I've hardcoded a linspace function, let me know if you have better ideas.
Hi @malmans2,
thanks, we have made a minor change to the script (line 32 in the Define plotting kwargs section) to coherently hatch significant trends for both the p_value-based approach (single model and ERA5) and the inter-model agreement (ensemble) approach. For temperature (precipitation) we have hatching over non-significant (significant) trends. At the moment the notebook is okay; I will let you know if we need to adjust any further details.
Notebook description
This directory includes:
- temperature and precipitation historical-period climatology and mean bias;
- temperature historical-period trend;
for an ensemble of regional climate models from Euro-CORDEX.
cordex_notebooks.zip
Notebook link or upload
No response
Anything else we need to know?
Notwithstanding some structural differences between CMIP6 and CORDEX outputs, it would be useful to homogenize CMIP6 and CORDEX notebooks as much as possible in terms of functions and workflow.
Environment