Open jbusecke opened 11 months ago
@jbusecke this is standard for reanalysis data (such as ERA5), although I agree it is counterintuitive.
For example, try loading sample NCEP reanalysis data with xarray:
>>> import xarray as xr
>>> print(xr.tutorial.open_dataset("air_temperature"))
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day). These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
The latitude values are also in decreasing order.
Oh interesting, I did not know that. Thanks @tom-andersson.
I would personally argue that this is something that 'should' be changed to make the data more analysis-ready, but I guess this is partly personal preference, and it would be good to have more general guidance on this that ARCO producers could refer to. In fact, I wonder if this would fall under a 'tidy array' concept (see this talk from SciPy this year). @dcherian, where would be a good place to discuss this sort of thing?
I agree that this is not ideal but there are very many datasets like this ;) particularly in the raster imaging space.
See https://github.com/pydata/xarray/issues/1613 for a discussion on a nicer API that ignores order of the coordinate variable.
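Until such an API exists, an order-agnostic selection can be approximated in user code. The sketch below is only an illustration, not anything proposed in that issue: `sel_range` is a hypothetical helper, and the dataset, the variable name `t2m`, and the bounds are made up for the example. It inspects the pandas index behind the coordinate and orders the slice bounds to match its direction.

```python
import numpy as np
import xarray as xr


def sel_range(ds, dim, lo, hi):
    """Select lo..hi along dim whether the coordinate ascends or descends.

    Hypothetical helper for illustration; not part of xarray.
    """
    lo, hi = min(lo, hi), max(lo, hi)
    index = ds.indexes[dim]  # pandas.Index backing the coordinate
    if index.is_monotonic_decreasing:
        # Decreasing coordinate: label slices must run from high to low.
        return ds.sel({dim: slice(hi, lo)})
    return ds.sel({dim: slice(lo, hi)})


# Synthetic 2.5-degree grid stored north to south, like the example above.
lat = np.arange(75.0, 12.5, -2.5)
ds_desc = xr.Dataset(
    {"t2m": ("latitude", np.zeros(lat.size))},
    coords={"latitude": lat},
)
ds_asc = ds_desc.sortby("latitude")

n_desc = sel_range(ds_desc, "latitude", 20, 40).sizes["latitude"]
n_asc = sel_range(ds_asc, "latitude", 20, 40).sizes["latitude"]
print(n_desc, n_asc)
```

Both calls select the same nine grid points (20.0 through 40.0 in 2.5-degree steps), regardless of how the coordinate is stored.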
First of all THANK YOU so much for this effort! Having ERA5 data available in an ARCO format is truly a game changer!
I noticed a small issue: the latitudes of the lat/lon gridded data 1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2/ seem to have decreasing values,
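which makes selecting a region in xarray slightly counterintuitive: a label-based slice with the bounds in ascending order returns no latitude indices, while the same slice with the bounds reversed gives the desired selection.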
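A minimal sketch of this behavior, using a small synthetic dataset rather than the actual ERA5 store. The variable name `t2m` and the bounds 20 and 40 are illustrative assumptions; only the 0.25-degree, north-to-south latitude layout is taken from the description above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in: latitude runs from 90 down to -90 in 0.25-degree steps.
lat = np.arange(90.0, -90.25, -0.25)
ds = xr.Dataset(
    {"t2m": ("latitude", np.zeros(lat.size))},  # placeholder values
    coords={"latitude": lat},
)

# Bounds given low-to-high: nothing is selected on a decreasing coordinate.
ascending_bounds = ds.sel(latitude=slice(20, 40))

# Bounds given high-to-low, matching the coordinate order: the region is found.
descending_bounds = ds.sel(latitude=slice(40, 20))

print(ascending_bounds.sizes["latitude"])   # 0
print(descending_bounds.sizes["latitude"])  # 81
```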
If you end up reprocessing the data at some point, I wonder if something like xarray's ds.sortby('latitude') or equivalent could be added to the pipeline.
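As a rough illustration of what that step would do, here is a sketch on a synthetic dataset. This is not the actual pipeline code, and the variable name `t2m` is an assumption. `sortby` reorders the data together with the coordinate, so values stay attached to their latitudes and an ascending slice then behaves as expected.

```python
import numpy as np
import xarray as xr

# Synthetic dataset with latitude stored north to south (90 ... -90).
lat = np.arange(90.0, -90.25, -0.25)
ds = xr.Dataset(
    # Each value records its original position so the reordering can be checked.
    {"t2m": ("latitude", np.arange(lat.size, dtype=float))},
    coords={"latitude": lat},
)

# Sort so that latitude increases; the data variable is reordered with it.
ds_sorted = ds.sortby("latitude")

# An ascending slice now selects the region.
region = ds_sorted.sel(latitude=slice(20, 40))

print(float(ds_sorted.latitude[0]), float(ds_sorted.latitude[-1]))
print(region.sizes["latitude"])
```

Users who cannot change the stored data can apply the same call after opening the dataset, at the cost of reordering on read.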