Hi @sandrocalmanti,
Yes, it's the download_and_transform function.
The chunks argument lets you split the request into several smaller requests (e.g., one request per year, or one per month, ...). Furthermore, the transform_func is applied to each chunk and cached separately (particularly useful for data reduction, as much smaller data is saved on disk). Finally, we use dask under the hood, a library that allows computations on larger-than-memory data.
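In outline, the pattern looks something like this (a minimal sketch, assuming the imports and signature used in the toolbox templates; the request placeholder and the spatial_mean reduction are illustrative):

from c3s_eqc_automatic_quality_control import download

def spatial_mean(ds):
    # Illustrative data reduction applied to each chunk before caching:
    # only the (much smaller) reduced result is saved on disk.
    return ds.mean(["longitude", "latitude"])

ds = download.download_and_transform(
    "reanalysis-era5-pressure-levels-monthly-means",
    request,  # a standard CDS API request dictionary (see below)
    chunks={"year": 1},  # split into 1 request per year
    transform_func=spatial_mean,  # applied per chunk, cached separately
)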
Great, thank you Mattia,
is there any example I can use to understand how download_and_transform works? Or could you describe how it is applied to this simple kernel, so that it works for large datasets as it does for small ones?
import numpy as np
import xarray as xr

ds = xr.open_dataset('./DATA/ERA5_ta_plev_monthly_1973-1999.nc')
t = ds['t']
# Weight temperature values with latitude before averaging
weights = np.cos(np.deg2rad(ds.latitude))
t_weighted = t.weighted(weights)
# Compute monthly global average
t_ave = t_weighted.mean(["longitude", "latitude"]).transpose("level", "time")
# Compute climatology
t_ave_time = t_ave.mean(["time"])
# Compute anomalies
t_ave_anom = t_ave - t_ave_time
You can combine download.download_and_transform and diagnostics.spatial_weighted_mean.
They're both used in many templates, and climatologies are produced in a few WP4 templates.
I can do a template specific to your use case. Is this the full dataset you need? (A sketch combining the two follows the request below.)
'reanalysis-era5-pressure-levels-monthly-means',
{
'format': 'netcdf',
'product_type': 'monthly_averaged_reanalysis',
'variable': 'temperature',
'pressure_level': [
'1', '5', '20',
'70', '150', '225',
'350', '500', '650',
'775', '850', '925',
'1000',
],
'year': [
'1940', '1941', '1942',
'1943', '1944', '1945',
'1946', '1947', '1948',
'1949', '1950', '1951',
'1952', '1953', '1954',
'1955', '1956', '1957',
'1958', '1959', '1960',
'1961', '1962', '1963',
'1964', '1965', '1966',
'1967', '1968', '1969',
'1970', '1971', '1972',
'1973', '1974', '1975',
'1976', '1977', '1978',
'1979', '1980', '1981',
'1982', '1983', '1984',
'1985', '1986', '1987',
'1988', '1989', '1990',
'1991', '1992', '1993',
'1994', '1995', '1996',
'1997', '1998', '1999',
'2000', '2001', '2002',
'2003', '2004', '2005',
'2006', '2007', '2008',
'2009', '2010', '2011',
'2012', '2013', '2014',
'2015', '2016', '2017',
'2018', '2019', '2020',
'2021', '2022', ],
'month': [
'01', '02', '03',
'04', '05', '06',
'07', '08', '09',
'10', '11', '12',
],
'time': '00:00',
'area': [
90, -180, -90,
180,
],
},
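Here is the sketch mentioned above, feeding this request to the toolbox (again assuming the imports used in the templates; the yearly chunking is just one reasonable choice, and the request is written compactly):

from c3s_eqc_automatic_quality_control import download, diagnostics

collection_id = 'reanalysis-era5-pressure-levels-monthly-means'
request = {
    'format': 'netcdf',
    'product_type': 'monthly_averaged_reanalysis',
    'variable': 'temperature',
    'pressure_level': ['1', '5', '20', '70', '150', '225', '350',
                       '500', '650', '775', '850', '925', '1000'],
    'year': [str(y) for y in range(1940, 2023)],  # 1940-2022, as above
    'month': [f'{m:02d}' for m in range(1, 13)],
    'time': '00:00',
    'area': [90, -180, -90, 180],  # whole globe
}

# Each yearly chunk is downloaded, reduced to a latitude-weighted
# spatial mean, and cached separately; only the small reduced fields
# are stored on disk.
ds = download.download_and_transform(
    collection_id,
    request,
    chunks={'year': 1},
    transform_func=diagnostics.spatial_weighted_mean,
)

# Climatology and anomalies are then cheap to compute on the reduced data.
t_ave = ds['t'].transpose('level', 'time')
t_ave_anom = t_ave - t_ave.mean('time')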
Thank you Mattia,
you're right about the templates, I'll have a look at them. Meanwhile, yes, that's the full dataset.
S.
OK, I'll send you the link when it's ready. In the meantime, this is pretty much what you are looking for: https://github.com/bopen/c3s-eqc-toolbox-template/blob/main/notebooks/renalysis/02-Application_Template_Global_Timeseries_Pressure_Levels.ipynb
Hi @sandrocalmanti,
Your template is ready and available here.
The template is a small example (2022-present); you just have to change the variable start to produce longer timeseries.
The first time you run it, it will take some time to download all the data, but then the spatial weighted fields are cached and you can focus on the analysis.
The template produces this figure (this period is already cached on WP5):
Thank you Mattia, looks great.
I'll try later.
Dear @malmans2,
I have updated your template with two edits:
In case you want to update this in the WP5 templates.
Cheers,
S.
Great! Looks like caching is also working pretty well; I've been able to play with your notebook on the VM.
I've updated the template. I only changed a few things in the last cell to make the Hovmöller diagram with xarray (there are a couple of arguments you could find useful, such as robust=True).
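For instance, with the t_ave_anom array from the kernel above, the diagram comes down to one plot call (a sketch; robust=True clips the color scale to the 2nd-98th percentiles so outliers don't wash it out):

import matplotlib.pyplot as plt

# Hovmöller diagram: time on the x-axis, pressure level on the y-axis,
# with the pressure axis flipped so lower pressures sit at the top.
t_ave_anom.plot(x="time", y="level", robust=True, yincrease=False)
plt.show()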
I'm closing this, but feel free to re-open in the future!
Describe the solution you'd like
In WP5 I'm using the attached notebook to show temperature anomalies on pressure levels.
The notebook works correctly when selecting a limited domain (for example 10S-10N) and a limited number of years, but I get into trouble when handling larger datasets.
Ideally I would like to compute the average vertical profile of temperature anomaly for the entire globe (-90:90, -180:180) and for the full time series, from 1940 to 2022. In this case, the file is 25GB (I'm using the monthly averaged reanalysis).
I guess others may have already had this problem, but I couldn't find any past issue on this subject. I expect to have similar issues in our work for WP3 on seasonal forecasts.
How do I handle large arrays in general? Is it with the download_and_transform function?
my_ipynb.zip