CCI-Tools / cate

ESA CCI Toolbox (Cate)
MIT License

Support BIOMASS dataset in Zarr Data Store #981

Open TonioF opened 3 years ago

TonioF commented 3 years ago

The Zarr Data Store contains data from the ODP that has been converted into the Zarr format. There is one BIOMASS dataset in the Zarr Data Store. For the purposes of this issue, we say that a dataset is supported when:

  1. it can be opened in cate
  2. it can be opened in cate with a spatial subset
  3. its content can be written to disk
  4. its data can be displayed in cate

The BIOMASS dataset cannot be opened with a spatial subset. The traceback is:

[2021-04-29 08:49:29] Request: open_dataset(datasetid=ESACCI-BIOMASS-L4-AGB-MERGED-100m-2010-2018-fv2.0.zarr, time_range=('2017-01-01', '2017-01-01'), var_names=['agb', 'agb_se'], region=[123.5265, 60.20374, 123.52827, 60.20552])

Traceback (most recent call last):
  File "test_cci_data_support.py", line 327, in test_opends
    dataset, = open_dataset(dataset_id=data_id,
  File "/home/users/tfincke/Projects/cate/cate/core/ds.py", line 432, in open_dataset
    dataset = select_subset(dataset, subset_args)
  File "/home/users/tfincke/Projects/xcube/xcube/core/select.py", line 37, in select_subset
    dataset = select_spatial_subset(dataset, xy_bbox=bbox)
  File "/home/users/tfincke/Projects/xcube/xcube/core/select.py", line 85, in select_spatial_subset
    geo_coding = geo_coding if geo_coding is not None else GeoCoding.from_dataset(dataset, xy_names=xy_names)
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 132, in from_dataset
    return cls.from_xy((x, y), xy_names=(x_name, y_name))
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 169, in from_xy
    x, is_lon_normalized = _maybe_normalise_2d_lon(x)
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 462, in _maybe_normalise_2d_lon
    if _is_crossing_antimeridian(lon_var):
  File "/home/users/tfincke/Projects/xcube/xcube/core/geocoding.py", line 457, in _is_crossing_antimeridian
    return abs(lon_var.diff(dim=dim_x)).max() > 180.0 or \
  File "/home/users/tfincke/miniconda3/envs/xcube/lib/python3.8/site-packages/xarray/core/dataarray.py", line 3107, in diff
    ds = self._to_temp_dataset().diff(n=n, dim=dim, label=label)
  File "/home/users/tfincke/miniconda3/envs/xcube/lib/python3.8/site-packages/xarray/core/dataset.py", line 5489, in diff
    variables[name] = var.isel(**kwargs_end) - var.isel(**kwargs_start)
  File "/home/users/tfincke/miniconda3/envs/xcube/lib/python3.8/site-packages/xarray/core/variable.py", line 2301, in func
    f(self_data, other_data)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 475. GiB for an array with shape (157500, 404999) and data type float64
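As a rough check of the number in the error message, the allocation size can be reproduced from the reported shape and data type alone. The sketch below is plain arithmetic, not xcube code; the helper name `array_gib` is made up for illustration, and the 1-D case is a hypothetical comparison that assumes the longitude could be represented as a single axis of 405000 values.

```python
import math


def array_gib(shape, itemsize=8):
    """Size in GiB of a dense array with the given shape (float64 = 8 bytes)."""
    return math.prod(shape) * itemsize / 2**30


# Shape and dtype taken from the _ArrayMemoryError above: the 2-D longitude
# array differenced once along x, which shortens that axis by one element.
diff_2d_gib = array_gib((157500, 404999))

# Hypothetical comparison (assumption): the same difference on a 1-D
# longitude axis instead of a full 2-D longitude array.
diff_1d_gib = array_gib((404999,))

print(f"2-D diff: about {diff_2d_gib:.0f} GiB")
print(f"1-D diff: about {diff_1d_gib * 1024:.1f} MiB")
```

The difference of roughly five orders of magnitude suggests why materialising a 2-D longitude array for the antimeridian check is not feasible for a grid of this size.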

forman commented 3 years ago

Should be fixed in cate 3.0 by https://github.com/dcs4cop/xcube/issues/442

TonioF commented 3 years ago

Viewing the dataset will result in a DeveloperError: "Width must be less than or equal to the maximum texture size (16384). Check maximumTextureSize." This error is probably caused by the massive size of the dataset (157500 * 405000).
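To put the texture limit in proportion, the following sketch computes the smallest integer subsampling stride at which both image dimensions would fit into a single 16384-pixel texture. It is only arithmetic on the numbers quoted in the error; `min_stride` is a hypothetical helper, and nothing here describes how cate actually renders the data.

```python
import math

MAX_TEXTURE_SIZE = 16384  # limit quoted in the DeveloperError


def min_stride(size, limit=MAX_TEXTURE_SIZE):
    """Smallest integer stride so that ceil(size / stride) <= limit."""
    return math.ceil(size / limit)


# Dataset dimensions as given in the comment above.
height, width = 157500, 405000

# One common stride for both axes keeps the aspect ratio unchanged.
stride = max(min_stride(width), min_stride(height))
reduced = (math.ceil(height / stride), math.ceil(width / stride))

print(f"stride: {stride}, reduced size: {reduced}")
```

In other words, displaying the full extent as one image would need the data at a much coarser resolution level, or split into tiles.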

AliceBalfanz commented 3 years ago

This comment is invalid due to wrong url:

I see a different error. All three approaches (using zarr, xarray and xcube) result in the same error message with anonymous access:

ClientConnectorError: Cannot connect to host cci-ke-o.s3.jc.rl.ac.uk:80 ssl:default [Connect call failed ('172.17.2.151', 80)]

Or is that cube not publicly accessible yet?

AliceBalfanz commented 3 years ago

When opening BIOMASS with the newest xcube, it works fine with open_dataset:

from xcube.core.dsio import open_dataset, open_cube
ds = open_dataset("https://cci-ke-o.s3-ext.jc.rl.ac.uk:8443/esacci/ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-1978-2020-fv05.3.zarr", s3_kwargs=dict(anon=True))

image

When opening it with open_cube an error occurs:

image

forman commented 2 years ago

image