google-research / arco-era5

Recipes for reproducing Analysis-Ready & Cloud Optimized (ARCO) ERA5 datasets.
https://cloud.google.com/storage/docs/public-datasets/era5
Apache License 2.0
287 stars 22 forks

Lat/lon gridded data does not have monotonically increasing latitudes #60

Open jbusecke opened 11 months ago

jbusecke commented 11 months ago

First of all THANK YOU so much for this effort! Having ERA5 data available in an ARCO format is truly a game changer!

I noticed a small issue: the latitudes of the lat/lon gridded dataset 1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2/ seem to be stored in decreasing order:

import xarray as xr

ar_full_37_1h = xr.open_zarr(
    'gs://gcp-public-data-arco-era5/ar/1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2/',
).isel(time=0)
ar_full_37_1h

which makes selecting a region in xarray slightly counterintuitive:

ar_full_37_1h.sel(latitude=slice(-50, 50))

returns no latitude indices (an empty selection),

while

ar_full_37_1h.sel(latitude=slice(50, -50))

gives the desired selection.

If you end up reprocessing the data at some point, I wonder if something like xarray's ds.sortby('latitude') or an equivalent could be added to the pipeline.
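As a rough illustration of what that sorting step would change, here is a minimal sketch on a small synthetic dataset rather than the real Zarr store. The variable name `t2m` and the coarse longitude grid are assumptions made for the example; only the 0.25 degree descending latitude axis mirrors the dataset above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the ARCO-ERA5 grid: latitudes run from 90 down to -90.
# "t2m" and the 30-degree longitude spacing are illustrative assumptions.
lat = np.arange(90, -90.25, -0.25)
lon = np.arange(0, 360, 30.0)
ds = xr.Dataset(
    {"t2m": (("latitude", "longitude"), np.zeros((lat.size, lon.size), dtype="float32"))},
    coords={"latitude": lat, "longitude": lon},
)

# With descending latitudes, an ascending slice selects nothing.
n_before = ds.sel(latitude=slice(-50, 50)).sizes["latitude"]

# After sorting, the same ascending slice selects the expected band.
ds_sorted = ds.sortby("latitude")
region = ds_sorted.sel(latitude=slice(-50, 50))
n_after = region.sizes["latitude"]

print(n_before, n_after)
```

On the real store the same `sortby` call can be applied after `xr.open_zarr`, though how reordering interacts with the store's chunking is not examined here.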

tom-andersson commented 10 months ago

@jbusecke this is standard for reanalysis data (such as ERA5), although I agree it is counterintuitive.

For example, try loading sample NCEP reanalysis data with xarray:

>>> import xarray as xr
>>> print(xr.tutorial.open_dataset("air_temperature"))
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

The latitude values are also in decreasing order.
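One way to check a coordinate's orientation programmatically, instead of reading the printed values, is to inspect the underlying pandas index. The sketch below rebuilds a latitude axis with the same 25 values as the output above (75.0 down to 15.0 in 2.5 degree steps) so that it runs without downloading the tutorial file; the all-zero `air` data is a placeholder, not the real field.

```python
import numpy as np
import xarray as xr

# Latitude axis matching the printed tutorial dataset: 75.0, 72.5, ..., 15.0.
lat = np.arange(75.0, 12.5, -2.5)
ds = xr.Dataset(
    {"air": (("lat",), np.zeros(lat.size, dtype="float32"))},  # placeholder data
    coords={"lat": lat},
)

# xarray exposes each dimension coordinate as a pandas Index.
lat_index = ds.indexes["lat"]
is_descending = bool(lat_index.is_monotonic_decreasing)
is_ascending = bool(lat_index.is_monotonic_increasing)

print(f"descending={is_descending}, ascending={is_ascending}")
```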

jbusecke commented 10 months ago

Oh interesting, I did not know that. Thanks @tom-andersson.

I would personally argue that this 'should' be changed to make the data more analysis-ready, but I guess this is somewhat a matter of personal preference, and it would be good to have more general guidance on this that ARCO producers could refer to. In fact, I wonder if this is something that would fall under a 'tidy array' concept (see this talk from SciPy this year). @dcherian, where would be a good place to discuss this sort of thing?

dcherian commented 10 months ago

I agree that this is not ideal, but there are very many datasets like this ;) particularly in the raster imaging space.

See https://github.com/pydata/xarray/issues/1613 for a discussion on a nicer API that ignores order of the coordinate variable.
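Until something like that exists, a user-side workaround is a small wrapper that checks the coordinate's orientation and flips the slice bounds accordingly. The helper below, `sel_lat`, is hypothetical: it is not part of xarray and is not the API proposed in the linked issue, just a sketch of the idea on synthetic data.

```python
import numpy as np
import xarray as xr


def sel_lat(ds, south, north, dim="latitude"):
    """Select a latitude band regardless of coordinate order (hypothetical helper)."""
    index = ds.indexes[dim]
    if index.is_monotonic_decreasing:
        # Descending coordinate: the slice has to run from north to south.
        return ds.sel({dim: slice(north, south)})
    return ds.sel({dim: slice(south, north)})


# Two synthetic datasets with the same latitudes in opposite order.
lat_desc = np.arange(90, -90.25, -0.25)
ds_desc = xr.Dataset(coords={"latitude": lat_desc})
ds_asc = xr.Dataset(coords={"latitude": lat_desc[::-1]})

band_desc = sel_lat(ds_desc, -50, 50)
band_asc = sel_lat(ds_asc, -50, 50)

print(band_desc.sizes["latitude"], band_asc.sizes["latitude"])
```

Both calls select the same set of latitudes; the result keeps whatever order the input had, so downstream code that depends on orientation still needs to handle it.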