Closed almarbo closed 7 months ago
I'm leaving here an updated notebook run for the period 1971-1975. It's a pretty early version, built upon the progress in #131, and it'll need more work, especially concerning the RX5day index. Further iterations will be necessary once #131 is closed.
Hi @almarbo,
Things have changed quite a bit since you opened this issue. Can I start working on this or do you need to revise your comments/draft notebooks?
Actually, it would be nice to have the exact same structure that we have for maximum temperature. If you want, I can adapt the draft with the names of the variables and so on, and then upload it here.
I think I can do it, I'll give it a go in the afternoon or tomorrow. The most important thing is that the descriptions of the changes in your first comment are correct and complete (I haven't looked at them yet).
Hi @malmans2
I have been working on it a little.
I leave here a notebook that includes the three sub-notebooks (historical, future, and global warming levels) for CMIP6:
There are still some things that do not work for me: (1) I am not able to invert the colorbars of the figures (blue tones should represent wetter conditions), and (2) I am having problems with the RX5day index for the models (not for ERA5). I am sure it has to do with units and the way icclim interprets them. I have tried to include a factor of 86400 (to convert from kg/(m^2 s) to mm) but it does not change anything.
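For reference, a sketch of that kind of conversion with synthetic values; it assumes (not confirmed here) that icclim determines units from the DataArray's `units` attribute, in which case scaling the values without also updating the metadata would change nothing:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic precipitation flux in kg m-2 s-1 (the CMIP6 convention for "pr").
time = pd.date_range("1971-01-01", periods=10, freq="D")
pr = xr.DataArray(
    np.full(len(time), 2.0e-5),
    coords={"time": time},
    name="pr",
    attrs={"units": "kg m-2 s-1"},
)

# 1 kg/m^2 of water is 1 mm of depth, so kg m-2 s-1 * 86400 s/day = mm/day.
pr_mm = pr * 86400.0
# Arithmetic drops attrs in xarray by default, so the units metadata must be
# set explicitly, otherwise downstream tools may still read the old units.
pr_mm.attrs["units"] = "mm/day"

print(float(pr_mm.max()))  # ~1.728 mm/day
```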
I am having problems with the RX5day index for the models (not for ERA5). I am sure it has to do with units and the way icclim interprets them. I have tried to include a factor of 86400 (to convert from kg/(m^2 s) to mm) but it does not change anything.
I need more info about the issues you are having in order to help. I'd make an MRE (minimal reproducible example) so we can easily debug it. For example, download a small sample without even applying any transform function, then send me a snippet that shows the issue. Something like this:
ds = download.download_and_transform(collection_id, request, chunks={"year": 1})
ds_index = icclim.index("RX5day", ...)
...
Hi, when doing a minimal reproducible example I am not experiencing any issue:
import numpy as np

# Time period
year_start = 1971
year_stop = 1972

model = "access_cm2"
request = [request | {model_key: model} for request in request_sim[1]]
ds = download.download_and_transform(request_sim[0], request, chunks={"year": 1})
ds_index = icclim.index(
    index_name="RX5day",
    in_files=ds,
    slice_mode="JJA",
).drop_dims("bounds")
ds_index_mean = ds_index.mean("time")
print(np.max(ds_index_mean["RX5day"].values))
113.3097110723611
However, if I follow the workflow (which is the same as the one for the temperature case but adapted for precipitation - as you can see in the last version of the notebook that I shared) I get a very different value:
print(np.max(model_datasets['access_cm2']["RX5day"].values))
8182.29363571736
If I test another index (e.g. RX1day), I get the same results in both cases: my MRE and the result obtained using the whole code.
I can also provide the request parameter:
[{'area': [72, -22, 27, 45],
  'day': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11',
          '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22',
          '23', '24', '25', '26', '27', '28', '29', '30', '31'],
  'experiment': 'historical',
  'format': 'zip',
  'month': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12'],
  'temporal_resolution': 'daily',
  'variable': 'precipitation',
  'year': '1971',
  'model': 'access_cm2'},
 {'area': [72, -22, 27, 45],
  'day': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11',
          '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22',
          '23', '24', '25', '26', '27', '28', '29', '30', '31'],
  'experiment': 'historical',
  'format': 'zip',
  'month': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12'],
  'temporal_resolution': 'daily',
  'variable': 'precipitation',
  'year': '1972',
  'model': 'access_cm2'}]
and the request_sim[0], which is :
'projections-cmip6'
mmm, maybe it's because you are still using persist in your NB? Changing it to compute fixed some issues in the previous notebooks.
I started caching the raw data on the VM.
The last version provided in my comment does not have persist. I will also update the notebook description so there's no confusion.
The difference between your snippet and what we are doing is that we reduce the amount of data at the very beginning of the computation (see the function select_timeseries).
You can get the same value if you do this:
ds_index = icclim.index(
    index_name="RX5day",
    in_files=ds.where(ds["time"].dt.season == "JJA", drop=True),
    slice_mode="JJA",
)
ds_index["RX5day"].mean("time").max().values
I don't know what RX5day does. If you expect the results to be identical, there's probably a bug in icclim.
Otherwise, if you always need the whole timeseries, we should change select_timeseries.
Well spotted! But I do not really understand why the results should be different. In theory, when you select slice mode JJA it should take the JJA season, so it should not matter whether you pass the function annual data or data with the JJA season already selected. In fact, for the other indices, the results are exactly the same in both cases: whether you first use select_timeseries or not.
RX5day selects, within a period (JJA), the maximum 5-day total precipitation (i.e., it should select the 5 consecutive days with the highest accumulated precipitation within the period). RX1day selects the day within the period with the maximum accumulated rain. If we have, for a particular year, a maximum value of RX1day = 71.31009, RX5day should never exceed five times this value. Values of 8182.29363571736 are definitely too high, whilst values of 113.3097110723611 (obtained when we are not using select_timeseries) are reasonable. What really surprises me is that, for ERA5, the obtained values are reasonable. For CORDEX the problem persists, though.
Perhaps we should consider not using select_timeseries for these notebooks?
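The bound mentioned above (RX5day should never exceed five times RX1day) can be checked directly with a synthetic series; this is just a plain-xarray sketch of the index definitions, not icclim's implementation:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily precipitation (mm) for one JJA season.
rng = np.random.default_rng(42)
time = pd.date_range("1971-06-01", "1971-08-31", freq="D")
pr = xr.DataArray(rng.gamma(2.0, 3.0, len(time)), coords={"time": time})

# RX1day: wettest single day; RX5day: wettest 5-day running total.
rx1day = float(pr.max("time"))
rx5day = float(pr.rolling(time=5).sum().max("time"))

# Sanity bounds: rx1day <= rx5day <= 5 * rx1day, so values like 8182
# against an RX1day of ~71 are implausible.
assert rx1day <= rx5day <= 5 * rx1day
```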
Not sure, you should probably open an issue in icclim to better understand what's going on. I'm changing that function to this:
def select_timeseries(ds, timeseries, year_start, year_stop):
    if timeseries == "annual":
        return ds.sel(time=slice(str(year_start), str(year_stop)))
    return ds.sel(time=slice(f"{year_start-1}-12", f"{year_stop}-11"))
It will affect performance, but I was able to run it on the whole timeseries of ERA5 (ERA5 is the most affected because we do the resampling on more data).
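A quick check of the seasonal branch above with synthetic dates: starting from December of `year_start - 1` keeps the first DJF season complete, and stopping in November of `year_stop` drops the December that belongs to the following season.

```python
import numpy as np
import pandas as pd
import xarray as xr

def select_timeseries(ds, timeseries, year_start, year_stop):
    if timeseries == "annual":
        return ds.sel(time=slice(str(year_start), str(year_stop)))
    return ds.sel(time=slice(f"{year_start-1}-12", f"{year_stop}-11"))

# Synthetic daily dataset spanning more than the analysis period.
time = pd.date_range("1969-01-01", "1976-12-31", freq="D")
ds = xr.Dataset({"pr": ("time", np.zeros(len(time)))}, coords={"time": time})

seasonal = select_timeseries(ds, "DJF", 1971, 1975)
print(seasonal["time"].values[0])   # first kept day: 1970-12-01
print(seasonal["time"].values[-1])  # last kept day: 1975-11-30
```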
I'll send you the new template as soon as I'm done caching (I merged temperature/precipitation so you can select the variable at the very beginning)
Hi @almarbo,
Here is the first draft of the template (temperature + precipitation): https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/extreme_indices.ipynb Here are the results for CMIP6 precipitation: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb
I didn't have time to check the results, let me know if they look OK.
I'll cache CORDEX in the meantime.
Little change I forgot to implement: colormaps are now flipped for precipitation
Hi @malmans2. Great! The results look okay to me. Could we proceed with CORDEX from now on?
Yup it's running, but it's gonna take some time. Let's catch up after Easter.
Happy Easter!
CORDEX is also cached now: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb#file-cordex-ipynb
Hi @malmans2
thank you so much! the results look fine to me. We would like to also cache for CORDEX DJF if possible. We can do it on Tuesday, no worries.
Happy Easter!
I launched the scripts to cache DJF over the break, but unfortunately I need to make some changes to the template.
The forms to download smhi_rca4 and uhoh_wrf361h are different compared to the other models.
For example, we need to use start and end year 1970 to get that year rather than a 5-year window.
@almarbo precipitation historical is cached. Please take a look and let me know if it looks OK or there's anything to change: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb
I'll work on future next.
hi @malmans2
Thank you so much. Do you think the current workflow would support annual data? If yes, could we switch to annual?
Apologies for so many changes, but sometimes we need to change the initial idea depending on the results.
I mean the timeseries parameter.
I'll try to cache that overnight
Hi @almarbo,
Unfortunately, the VM is not able to handle the current setup using CORDEX and annual timeseries. The kernel dies when computing the indexes. It might take some time to find a solution that works well on the VM. Let me know if you want me to invest time on it.
(I'm going to run a test overnight. The only simple solution I can think about is to cache all indexes separately)
Hi @malmans2, thanks for testing it. Let me know about the test you are doing. If there is not an easy solution, no worries at all, I would suggest going on with DJF. In that case we would comment on the results within the final notebook.
Unfortunately, it didn't work. Maybe you can try on another machine if you have more resources?
Oops. It's okay, no worries. I think, for now, it is okay to proceed with DJF. If needed, I would eventually run the resulting notebook on another machine with more resources.
OK, the next step is to make the future notebook compatible with precipitation, right?
Exactly!
Hi @malmans2 any news on the future for precipitation? Do you need me to work on it?
caching CMIP6 right now, should be done by the end of the day. I'll cache CORDEX over the weekend. I'll be in touch.
Actually, good timing. CMIP6 is ready. Could you please take a look?
Over the weekend I will cache CMIP6/CORDEX + DJF/JJA + temperature/precipitation + historical/future
I'm caching again temperature as the new templates are more general. If everything is in good shape, we can deprecate the temperature-only templates.
Here is the template: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/wp4/extreme_indices_future.ipynb Here is the notebook executed: https://gist.github.com/malmans2/da3767653635d24cf76e7b6e8968d896
hi @malmans2 the results look fine, thanks a lot. Have a nice weekend
Hi @almarbo,
I have good and bad news.
Good news first: I found the optimal chunking to cache annual as well. I've successfully cached CMIP6 historical, and I'll try to cache CORDEX tonight. Here is CMIP6: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb
Bad news: I found some problems with CORDEX future. The data of a couple of models are corrupted. It looks like a CDS problem, but I need some time to debug it. If it is a CDS problem, we need to open a ticket. Also, I can't cache DJF from 2015 because 2014 is not available.
Hi @almarbo,
I found the corrupted request. You can reproduce it on your machine with this:
import cdsapi

collection_id = "projections-cordex-domains-single-levels"
request = {
    "area": [72, -22, 27, 45],
    "domain": "europe",
    "end_year": 2015,
    "ensemble_member": "r1i1p1",
    "experiment": "rcp_8_5",
    "format": "zip",
    "gcm_model": "mpi_m_mpi_esm_lr",
    "horizontal_resolution": "0_11_degree_x_0_11_degree",
    "rcm_model": "knmi_racmo22e",
    "start_year": 2011,
    "temporal_resolution": "daily_mean",
    "variable": "mean_precipitation_flux",
}
client = cdsapi.Client(debug=True)
client.retrieve(collection_id, request, "download.zip")
If you unzip and ncdump the downloaded file, you'll see that the file is corrupted. Do you think it's an issue with the CDS? If yes, we'll open a ticket.
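As a side note, a quick programmatic check of the extracted file might look like this (the filename is hypothetical; a corrupted NetCDF typically fails to open or decode):

```python
import xarray as xr

def looks_corrupted(path):
    """Return True if the NetCDF file at `path` cannot be opened and decoded."""
    try:
        xr.open_dataset(path).close()
        return False
    except Exception:
        return True

# e.g. on the file extracted from download.zip (hypothetical name):
print(looks_corrupted("extracted_from_download.nc"))
```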
I'm pretty sure something went wrong in the CDS, I'm opening the ticket.
mmm, actually, looks like they've fixed it now. I'm going to try again.
Hi @malmans2, thanks a lot for checking it all. I have ncdumped the downloaded file and I am not able to see where it is corrupted. I think I missed something; what made you think it is corrupted?
top news! thanks a lot, let's see with CORDEX
Good news first: I found the optimal chunking to cache annual as well. I've successfully cached CMIP6 historical, and I'll try to cache CORDEX tonight. Here is CMIP6: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb
Unfortunately we are a little bit late in delivering the notebooks and we should provide something ASAP. If it will take a while, I think the easiest option would be just to replace the corrupted models with a couple that are not corrupted.
Bad news: I found some problems with CORDEX future. The data of a couple of models are corrupted. It looks like a CDS problem, but I need some time to debug it. If it is a CDS problem, we need to open a ticket.
This is for CMIP6, right? Or is it also for CORDEX? I think we can focus on CORDEX and forget about CMIP6 for precipitation. If it affects both, no worries, let's skip 2015.
Also, I can't cache DJF from 2015 because 2014 is not available.
I missed this message while I was writing mine. Thanks!
mmm, actually, looks like they've fixed it now. I'm going to try again.
Actually, sorry! The data IS corrupted. I pasted the wrong snippet. See:
import cdsapi

collection_id = "projections-cordex-domains-single-levels"
request = {
    "area": [72, -22, 27, 45],
    "domain": "europe",
    "end_year": 2060,
    "ensemble_member": "r1i1p1",
    "experiment": "rcp_8_5",
    "format": "zip",
    "gcm_model": "mpi_m_mpi_esm_lr",
    "horizontal_resolution": "0_11_degree_x_0_11_degree",
    "rcm_model": "knmi_racmo22e",
    "start_year": 2056,
    "temporal_resolution": "daily_mean",
    "variable": "mean_precipitation_flux",
}
client = cdsapi.Client(debug=True)
client.retrieve(collection_id, request, "download.zip")
I'll open a ticket and cc you (if I can).
Can you please recap here what you want me to cache?
Yes. We would only need CORDEX to be cached (historical and future periods). The preferred temporal aggregation is annual. If that is not possible, we can perfectly well go with DJF only. If the problem with the corrupted files will take a while, we would prefer to just replace the two corrupted models with two that have no corrupted data.
CORDEX historical precipitation DJF is already cached. See: https://gist.github.com/malmans2/7a44ee0b9b7a56465e5a822301cf56fb
I'm trying annual right now, although during the day it's hard to cache CORDEX because the VM is crowded.
no worries, let's wait for it.
By the way, could you specify which models gave problems? "knmi_racmo22e" and which was the other?
mohc_hadrem3_ga7_05
Right now I'm caching the other models. It looks like something went wrong with the cutout in the CDS, and corrupted files have been cached. I opened the ticket, so we might be able to use those models if the CDS team clears the cache.
Okay, perfect!
Hi @almarbo,
I think I know how to avoid the files that are corrupted. I'll run everything again overnight.
However, talking with user support I realised that the area argument in CORDEX requests is useless.
Are you aware of it? I.e., adding the area parameter to the request does not make any difference; it just downloads the whole domain (europe in your case).
Notebook description
In this notebook, data from a subset of 9 CMIP6 Global Climate Models (GCMs), as well as the ERA5 reanalysis, are considered. Five precipitation-based ECA&D indices are calculated using the icclim Python package:
These calculations are performed over the historical period from 1971 to 2000 for the temporal aggregation of JJA.
After calculating these indices for the historical period (resulting in index values per year), temporal means and trends are calculated. Following this, the bias of the temporal mean values and the trend bias are calculated, using ERA5 as the reference dataset. These biases are displayed for each model and for the ensemble median. Additionally, maps of the ensemble spread (derived as the standard deviation of the ensemble members' distribution) are calculated and displayed for the mean values and for the trends. Finally, boxplots representing the statistical distribution (PDF) of the historical trends from the considered models are shown (here too, ERA5 is the reference product).
The size and location of the subdomain considered are customizable, as well as the temporal aggregation (annual or seasonal).
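The bias and spread computations described above can be sketched roughly as follows (synthetic arrays stand in for per-model index climatologies; the names are illustrative, not the notebook's actual variables):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# Per-model temporal means of an index on a tiny 2x2 grid, plus an ERA5 reference.
models = xr.DataArray(rng.normal(10.0, 2.0, (9, 2, 2)), dims=("model", "y", "x"))
era5 = xr.DataArray(np.full((2, 2), 10.0), dims=("y", "x"))

bias = models - era5                         # bias of each model vs ERA5
ensemble_median_bias = bias.median("model")  # bias of the ensemble median
spread = models.std("model")                 # ensemble spread (std across members)

print(bias.shape, ensemble_median_bias.shape, spread.shape)
```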
Analyses
This notebook performs the following analyses:
Notebook link or upload
historical_cmip6_extreme_pr_indices.ipynb.zip
Anything else we need to know?
This notebook keeps the same structure as the one in #131. 5 notebooks will be needed:
*Note: the future notebooks will not need the historical period (as we are not dealing with statistical thresholds). Main changes compared to #131:
to:
models_cmip6 = [
    "access_cm2",
    "bcc_csm2_mr",
    "cmcc_esm2",
    "cnrm_cm6_1_hr",
    "ec_earth3_cc",
    "gfdl_esm4",
    "inm_cm5_0",
    "miroc6",
    "mpi_esm1_2_lr",
]