**andersy005** closed this issue 1 month ago.

`cf_xarray` seems confused about eVolv2k:
```python
In [9]: url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/eVolv2k-feedstock/eVolv2k.zarr"

In [10]: ds = xr.open_zarr(url)

In [11]: cf_xarray.accessor._get_axis_coord(ds, "Y")
Out[11]: []

In [12]: ds
Out[12]:
<xarray.Dataset> Size: 11kB
Dimensions:    (nerup: 271, time: 271)
Coordinates:
  * time       (time) object 2kB -0490-01-01 00:00:00 ... 1890-01-01 00:00:00
Dimensions without coordinates: nerup
Data variables:
    hemi       (nerup) float64 2kB dask.array<chunksize=(271,), meta=np.ndarray>
    lat        (nerup) float64 2kB dask.array<chunksize=(271,), meta=np.ndarray>
    sigma_ssi  (nerup) float64 2kB dask.array<chunksize=(271,), meta=np.ndarray>
    ssi        (nerup) float64 2kB dask.array<chunksize=(271,), meta=np.ndarray>
Attributes:
    comment:  Minor update from v 2.0, includes reassignment of eruption regi...
    history:  Created Mon Oct 16 10:31:15 2017
    source:   Toohey and Sigl, 2017
    title:    Ice core-inferred volcanic stratospheric sulfur injection from ...

In [14]: ds.cf.axes
Out[14]: {}
```
```python
In [15]: url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/LMRv2p1_MCruns_ensemble_gridded-feedstock/LMRv2p1_MCruns_ensemble_gridded.zarr"

In [16]: ds = xr.open_zarr(url)

In [17]: ds
Out[17]:
<xarray.Dataset> Size: 37GB
Dimensions:        (time: 2001, MCrun: 20, lat: 91, lon: 180)
Coordinates:
  * lat            (lat) float32 364B -90.0 -88.0 -86.0 -84.0 ... 86.0 88.0 90.0
  * lon            (lon) float32 720B 0.0 2.0 4.0 6.0 ... 354.0 356.0 358.0
  * time           (time) object 16kB 0000-01-01 00:00:00 ... 2000-01-01 00:0...
Dimensions without coordinates: MCrun
Data variables: (12/14)
    air_mean       (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    air_spread     (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    hgt500_mean    (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    hgt500_spread  (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    pdsi_mean      (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    pdsi_spread    (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    ...             ...
    prate_mean     (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    prate_spread   (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    prmsl_mean     (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    prmsl_spread   (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    sst_mean       (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    sst_spread     (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
Attributes:
    comment:                   File contains ensemble spread values for each ...
    description:               Last Millennium Reanalysis climate field recon...
    experiment:                productionFinal2_gisgpcc_ccms4_LMRdbv1.1.0_z500
    pangeo-forge:inputs_hash:  c148739ca23233f5121dd7f6ac70826b68f8831d19191b...
    pangeo-forge:recipe_hash:  aa66a32d990e984111f664c3a5bdd5326edcfcf20a588d...
    pangeo-forge:version:      0.9.2

In [19]: ds.cf.axes
Out[19]: {'X': ['lon'], 'Y': ['lat'], 'T': ['time']}
```
`MCrun` is a dimension without a `.zarray` associated with it:

```javascript
const finalVariables = dimensionalCheck.filter((d) => {
  const dims = metadata.metadata[`${prefix}${d}/.zattrs`]['_ARRAY_DIMENSIONS'];
  const allDimsExist = dims.every((dim) => {
    const exists = metadata.metadata[`${prefix}${dim}/.zarray`];
    if (!exists) {
      console.log(`Missing zarray for dimension '${dim}' in variable '${d}'`);
    }
    return exists;
  });
  return allDimsExist;
});
console.log(finalVariables);
```
```
Missing zarray for dimension 'MCrun' in variable 'air_mean'
Missing zarray for dimension 'MCrun' in variable 'air_spread'
Missing zarray for dimension 'MCrun' in variable 'hgt500_mean'
Missing zarray for dimension 'MCrun' in variable 'hgt500_spread'
Missing zarray for dimension 'MCrun' in variable 'pdsi_mean'
Missing zarray for dimension 'MCrun' in variable 'pdsi_spread'
Missing zarray for dimension 'MCrun' in variable 'pr_wtr_mean'
Missing zarray for dimension 'MCrun' in variable 'pr_wtr_spread'
Missing zarray for dimension 'MCrun' in variable 'prate_mean'
Missing zarray for dimension 'MCrun' in variable 'prate_spread'
Missing zarray for dimension 'MCrun' in variable 'prmsl_mean'
Missing zarray for dimension 'MCrun' in variable 'prmsl_spread'
Missing zarray for dimension 'MCrun' in variable 'sst_mean'
Missing zarray for dimension 'MCrun' in variable 'sst_spread'
[]
```
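For reference, the same consolidated-metadata check can be sketched in Python. The `meta` dict below is a hypothetical, trimmed stand-in for the `metadata` mapping inside a real `.zmetadata` document, not the actual LMR store contents:

```python
# Hypothetical, trimmed stand-in for the "metadata" mapping in .zmetadata.
meta = {
    "lat/.zarray": {"shape": [91]},
    "lat/.zattrs": {"_ARRAY_DIMENSIONS": ["lat"]},
    "lon/.zarray": {"shape": [180]},
    "lon/.zattrs": {"_ARRAY_DIMENSIONS": ["lon"]},
    "time/.zarray": {"shape": [2001]},
    "time/.zattrs": {"_ARRAY_DIMENSIONS": ["time"]},
    # "air_mean" uses the "MCrun" dimension, but no "MCrun/.zarray" exists.
    "air_mean/.zarray": {"shape": [2001, 20, 91, 180]},
    "air_mean/.zattrs": {"_ARRAY_DIMENSIONS": ["time", "MCrun", "lat", "lon"]},
}


def missing_dim_arrays(meta):
    """Return (variable, dimension) pairs whose dimension has no .zarray."""
    missing = []
    for key, attrs in meta.items():
        if not key.endswith("/.zattrs"):
            continue
        var = key[: -len("/.zattrs")]
        for dim in attrs.get("_ARRAY_DIMENSIONS", []):
            if f"{dim}/.zarray" not in meta:
                missing.append((var, dim))
    return missing


print(missing_dim_arrays(meta))  # [('air_mean', 'MCrun')]
```

Note that a dimension without its own array is perfectly legal Zarr (xarray reports it as "Dimensions without coordinates"), which is why the viewer's filter, rather than the store, needed adjusting.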
`Cannot read properties of undefined (reading 'X')` -- not sure where this property is being accessed and causing this issue.

```python
In [23]: ds = xr.open_dataset(url, engine='zarr', chunks={}, consolidated=True)

In [24]: ds.cf.axes
Out[24]: {'X': ['longitude'], 'Y': ['latitude'], 'T': ['time']}
```
@katamartin, do you happen to know why https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/EOBS-feedstock/eobs-wind-speed.zarr shows a global view despite the dataset being regional?
```python
In [3]: ds = xr.open_zarr("https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/EOBS-feedstock/eobs-wind-speed.zarr")

In [4]: ds
Out[4]:
<xarray.Dataset> Size: 11GB
Dimensions:    (time: 14976, latitude: 350, longitude: 511)
Coordinates:
  * latitude   (latitude) float64 3kB 35.05 35.15 35.25 ... 69.75 69.85 69.95
  * longitude  (longitude) float64 4kB -11.05 -10.95 -10.85 ... 39.85 39.95
  * time       (time) datetime64[ns] 120kB 1980-01-01 1980-01-02 ... 2020-12-31
Data variables:
    fg         (time, latitude, longitude) float32 11GB dask.array<chunksize=(40, 350, 511), meta=np.ndarray>
Attributes:
    CDI:                       Climate Data Interface version 1.6.3 (http://c...
    CDO:                       Climate Data Operators version 1.6.3 (http://c...
    Conventions:               CF-1.4
    E-OBS_version:             v23.1e
    NCO:                       netCDF Operators version 4.7.5 (Homepage = htt...
    References:                http://surfobs.climate.copernicus.eu/dataacces...
    history:                   Mon Aug 2 09:57:23 2021: ncks --no-abc -d tim...
    pangeo-forge:inputs_hash:  9bafc3f4b1a861cd149659de9602269a53feb1474b4af9...
    pangeo-forge:recipe_hash:  11a599a7d9aaecab7479159072b0d91bb9307e0e1c512c...
    pangeo-forge:version:      0.9.1

In [5]: ds.latitude.min(), ds.latitude.max()
Out[5]:
(<xarray.DataArray 'latitude' ()> Size: 8B
 array(35.05),
 <xarray.DataArray 'latitude' ()> Size: 8B
 array(69.95))

In [6]: ds.longitude.min(), ds.longitude.max()
Out[6]:
(<xarray.DataArray 'longitude' ()> Size: 8B
 array(-11.05),
 <xarray.DataArray 'longitude' ()> Size: 8B
 array(39.95))
```
> do you happen to know why https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/EOBS-feedstock/eobs-wind-speed.zarr shows a global view despite the dataset being regional?
It looks like the zoom-locking logic currently relies on a check of how many chunks are present in the total dataset. A safer option seems to be to consistently block zooming unless a pyramid is provided. I can open a PR!
Update: addressed in https://github.com/carbonplan/ncview-js/pull/36 and https://github.com/carbonplan/ncview-js/pull/38
`Cannot read properties of undefined (reading 'X')` (@katamartin, do you happen to know what's going on here?)
Update: I'm working on showing better errors for the remaining 2 blocking errors! PR up here: https://github.com/carbonplan/ncview-js/pull/39
Note that I think this will involve removing the checks for the `ncviewjs:`-prefixed attributes, which means that we'll have to coordinate the ncview-js update with deploying the leap-data-catalog `staging` branch to production.
The last CF axis issue was resolved with https://github.com/carbonplan/ncview-js/pull/41, but I'm not seeing any data show up 🤔
When I inspect the data in the console on the frontend, it looks like the chunks are full of `0`s. Does this match your view in Python @andersy005 @norlandrhagen?
@andersy005 the issue with https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/terraclimate-feedstock/terraclimate.zarr seems to be coming from `zarr-proxy` returning a 500 -- any idea what's going on? See sample request: https://ok6vedl4oj7ygb4sb2nzqvvevm0qhbbc.lambda-url.us-west-2.on.aws/ncsa.osn.xsede.org/Pangeo/pangeo-forge/terraclimate-feedstock/terraclimate.zarr/.zmetadata
@katamartin, the metadata file contains fields with lone surrogate characters (apparently from mis-decoded bytes), and this causes issues when the web server encodes the response:
```python
In [10]: zarr.util.json_loads(store['.zmetadata'])['metadata']['.zattrs']['summary']
Out[10]: 'This archive contains a dataset of high-spatial resolution (1/24\udcc2\udcb0, ~4-km) monthly climate and climatic water balance for global terrestrial surfaces from 1958-2015. These data were created by using climatically aided interpolation, combining high-spatial resolution climatological normals from the WorldClim version 1.4 and version 2 datasets, with coarser resolution time varying (i.e. monthly) data from CRU Ts4.0 and JRA-55 to produce a monthly dataset of precipitation, maximum and minimum temperature, wind speed, vapor pressure, and solar radiation. TerraClimate additionally produces monthly surface water balance datasets using a water balance model that incorporates reference evapotranspiration, precipitation, temperature, and interpolated plant extractable soil water capacity.'

In [11]: zarr.util.json_loads(store['.zmetadata'])['metadata']['.zattrs']['summary'].encode("utf-8")
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
Cell In[11], line 1
----> 1 zarr.util.json_loads(store['.zmetadata'])['metadata']['.zattrs']['summary'].encode("utf-8")

UnicodeEncodeError: 'utf-8' codec can't encode characters in position 64-65: surrogates not allowed
```
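Those `\udcc2\udcb0` code points are lone surrogates: they look like what you get when the raw UTF-8 bytes for `°` (`0xC2 0xB0`) are decoded with the `surrogateescape` error handler, and strict UTF-8 encoding then refuses to serialize them. A standalone sketch (not the actual proxy code) of reproducing and repairing the problem:

```python
# A string carrying lone surrogates where "°" (U+00B0) should be,
# as in the TerraClimate summary attribute.
summary = "high-spatial resolution (1/24\udcc2\udcb0, ~4-km)"

# Strict UTF-8 encoding rejects lone surrogates.
try:
    summary.encode("utf-8")
except UnicodeEncodeError as err:
    print(err)  # ... surrogates not allowed

# Round-tripping through surrogateescape recovers the original bytes,
# which then decode cleanly as UTF-8.
repaired = summary.encode("utf-8", "surrogateescape").decode("utf-8")
print(repaired)  # high-spatial resolution (1/24°, ~4-km)
```

A server-side workaround along these lines would sanitize the attribute before re-encoding the response; fixing the store's metadata at the source is the more durable option.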
> The last CF axis issue was resolved with carbonplan/ncview-js#41, but I'm not seeing any data show up 🤔 When I inspect the data in the console on the frontend, it looks like the chunks are full of `0`s. Does this match your view in Python @andersy005 @norlandrhagen?
yes, i'm getting `nan` on my end. i'm not confident about the integrity of data from this feedstock. my hunch is that this is one of the earlier feedstocks, and there's a chance nobody looked at the data once the feedstock was run ~2 years ago.
```python
In [7]: ds.cn.isel(time=0, ct=0).compute()
Out[7]:
<xarray.DataArray 'cn' (yt: 1070, xt: 1440)> Size: 6MB
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    ct       float64 8B 0.0
    time     object 8B 1992-01-05 13:00:00
  * xt       (xt) float64 12kB -279.9 -279.7 -279.4 -279.2 ... 79.42 79.65 79.88
  * yt       (yt) float64 9kB -80.02 -79.92 -79.81 -79.7 ... 78.84 78.84 78.85
Attributes:
    cell_methods:   time: mean
    long_name:      ice concentration
    time_avg_info:  average_T1,average_T2,average_DT
    units:          0-1

In [9]: ds.hi.isel(time=0).compute()
Out[9]:
<xarray.DataArray 'hi' (yt: 1070, xt: 1440)> Size: 6MB
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    time     object 8B 1992-01-05 13:00:00
  * xt       (xt) float64 12kB -279.9 -279.7 -279.4 -279.2 ... 79.42 79.65 79.88
  * yt       (yt) float64 9kB -80.02 -79.92 -79.81 -79.7 ... 78.84 78.84 78.85
Attributes:
    cell_methods:   time: mean
    long_name:      ice thickness
    time_avg_info:  average_T1,average_T2,average_DT
    units:          m-ice
```
Seems like it! The data array is also named 'hi' 🥇 . I can delete this feedstock and remove it from the catalog.
> @andersy005 the issue with https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/terraclimate-feedstock/terraclimate.zarr seems to be coming from `zarr-proxy` returning a 500 [...]
Huh super interesting -- on my latest branch, which skips the proxy request for metadata and instead fetches metadata via a serverless function, this dataset is now accessible: https://data-viewer-git-katamartin-improve-errors-carbonplan.vercel.app/?dataset=https%3A%2F%2Fncsa.osn.xsede.org%2FPangeo%2Fpangeo-forge%2Fterraclimate-feedstock%2Fterraclimate.zarr
@andersy005 how do you feel about doing our coordinated merging?
Looks like Terraclimate has some date issues after the first time step:
and looks like the zarr store only has a single time slice 😭
Guess we should wipe it.
Closing this, as all issues seem to have been addressed.
Cc @katamartin