carbonplan / leap-data-catalog

data catalog for the LEAP project
https://catalog.leap.carbonplan.org/
MIT License

data viewer connection statuses for datasets in the catalog #40

Closed: andersy005 closed this issue 1 month ago

andersy005 commented 2 months ago
| url | error |
|---|---|
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/HadISST-feedstock/hadisst.zarr | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/AGDC-feedstock/AGCD.zarr | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/eVolv2k-feedstock/eVolv2k.zarr | No CF axes information provided and unable to infer from metadata. |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/LMRv2p1_MCruns_ensemble_gridded-feedstock/LMRv2p1_MCruns_ensemble_gridded.zarr | No viewable variables found. Please provide a dataset with at least 2D data arrays. |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/noaa-coastwatch-geopolar-sst.zarr | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/noaa-coastwatch-geopolar-sst.zarr | ⚠️ slow to load |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/EOBS-feedstock/eobs-wind-speed.zarr | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/EOBS-feedstock/eobs-tg-tn-tx-rr-hu-pp.zarr | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/EOBS-feedstock/eobs-surface-downwelling.zarr | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-cmems-duacs | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/WOA_1degree_monthly-feedstock/woa18-1deg-monthly.zarr | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/cesm-atm-025deg-feedstock/cesm-atm-025deg.zarr | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/noaa_oisst/v2.1-avhrr.zarr | ✅ None |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/soda342/5day_ice.zarr/ | No CF axes information provided and unable to infer from metadata. |
| https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/terraclimate-feedstock/terraclimate.zarr | JSON.parse: unexpected character at line 1 column 1 of the JSON data |
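Every link above follows the same pattern: `https://ncview-js.staging.carbonplan.org/?dataset=<store URL>`. Below is a minimal stdlib sketch for building such links from a list of store URLs and for recovering the store URL from a link. The helper names `viewer_link` and `store_from_link` are hypothetical. Percent-encoding the store URL is a choice here. It matches the encoded form used later in this thread, while the unencoded links above also load.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Staging viewer base URL, taken from the links in the table above.
VIEWER_BASE = "https://ncview-js.staging.carbonplan.org/"


def viewer_link(store_url: str, base: str = VIEWER_BASE) -> str:
    """Build a viewer link for a Zarr store URL (hypothetical helper).

    urlencode percent-encodes ':' and '/' so the whole store URL travels
    as a single `dataset` query parameter.
    """
    return f"{base}?{urlencode({'dataset': store_url})}"


def store_from_link(link: str) -> str:
    """Recover the store URL from a viewer link, encoded or not."""
    return parse_qs(urlsplit(link).query)["dataset"][0]


store = "https://ncsa.osn.xsede.org/Pangeo/pangeo-cmems-duacs"
print(viewer_link(store))
```

Round-tripping through `store_from_link` also works for the unencoded links in the table. This is because `parse_qs` splits each `key=value` pair only at its first `=`.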

Cc @katamartin

andersy005 commented 2 months ago

In [9]: url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/eVolv2k-feedstock/eVolv2k.zarr"

In [10]: ds = xr.open_zarr(url)

In [11]: cf_xarray.accessor._get_axis_coord(ds, "Y")
Out[11]: []

In [12]: ds
Out[12]: 
<xarray.Dataset> Size: 11kB
Dimensions:    (nerup: 271, time: 271)
Coordinates:
  * time       (time) object 2kB -0490-01-01 00:00:00 ... 1890-01-01 00:00:00
Dimensions without coordinates: nerup
Data variables:
    hemi       (nerup) float64 2kB dask.array<chunksize=(271,), meta=np.ndarray>
    lat        (nerup) float64 2kB dask.array<chunksize=(271,), meta=np.ndarray>
    sigma_ssi  (nerup) float64 2kB dask.array<chunksize=(271,), meta=np.ndarray>
    ssi        (nerup) float64 2kB dask.array<chunksize=(271,), meta=np.ndarray>
Attributes:
    comment:  Minor update from v 2.0, includes reassignment of eruption regi...
    history:  Created Mon Oct 16 10:31:15 2017
    source:   Toohey and Sigl, 2017
    title:    Ice core-inferred volcanic stratospheric sulfur injection from ...

In [14]: ds.cf.axes
Out[14]: {}
In [15]: url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/LMRv2p1_MCruns_ensemble_gridded-feedstock/LMRv2p1_MCruns_ensemble_gridded.zarr"

In [16]: ds = xr.open_zarr(url)

In [17]: ds
Out[17]: 
<xarray.Dataset> Size: 37GB
Dimensions:        (time: 2001, MCrun: 20, lat: 91, lon: 180)
Coordinates:
  * lat            (lat) float32 364B -90.0 -88.0 -86.0 -84.0 ... 86.0 88.0 90.0
  * lon            (lon) float32 720B 0.0 2.0 4.0 6.0 ... 354.0 356.0 358.0
  * time           (time) object 16kB 0000-01-01 00:00:00 ... 2000-01-01 00:0...
Dimensions without coordinates: MCrun
Data variables: (12/14)
    air_mean       (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    air_spread     (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    hgt500_mean    (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    hgt500_spread  (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    pdsi_mean      (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    pdsi_spread    (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    ...             ...
    prate_mean     (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    prate_spread   (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    prmsl_mean     (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    prmsl_spread   (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    sst_mean       (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
    sst_spread     (time, MCrun, lat, lon) float32 3GB dask.array<chunksize=(1, 20, 91, 180), meta=np.ndarray>
Attributes:
    comment:                   File contains ensemble spread values for each ...
    description:               Last Millennium Reanalysis climate field recon...
    experiment:                productionFinal2_gisgpcc_ccms4_LMRdbv1.1.0_z500
    pangeo-forge:inputs_hash:  c148739ca23233f5121dd7f6ac70826b68f8831d19191b...
    pangeo-forge:recipe_hash:  aa66a32d990e984111f664c3a5bdd5326edcfcf20a588d...
    pangeo-forge:version:      0.9.2

In [19]: ds.cf.axes
Out[19]: {'X': ['lon'], 'Y': ['lat'], 'T': ['time']}
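The two sessions differ in where latitude lives. In the LMR store, `lat` and `lon` are dimension coordinates. In eVolv2k, `lat` is an ordinary data variable along `nerup`, so nothing can be identified as an X or Y axis.

Below is a much-simplified sketch of CF-style axis detection from coordinate attributes. It is not cf_xarray's actual algorithm, which applies more criteria. The attribute values in the checks are made up for illustration.

```python
from typing import Dict, List

# CF conventions can mark axes via an `axis` attribute, a `standard_name`,
# or unit strings such as "degrees_north"/"degrees_east". Only these few
# rules are checked here.
_STANDARD_NAMES = {"longitude": "X", "latitude": "Y", "time": "T"}
_UNITS = {"degrees_east": "X", "degrees_north": "Y"}


def guess_axes(coords: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map CF axis letters to coordinate names using coordinate attributes.

    Only coordinates are inspected: a data variable named `lat` (as in
    eVolv2k) never contributes a Y axis.
    """
    axes: Dict[str, List[str]] = {}
    for name, attrs in coords.items():
        axis = (
            attrs.get("axis")
            or _STANDARD_NAMES.get(attrs.get("standard_name", ""))
            or _UNITS.get(attrs.get("units", ""))
        )
        if axis:
            axes.setdefault(axis, []).append(name)
    return axes
```

With an eVolv2k-like input (a lone `time` coordinate without axis attributes), the result is empty, like `{}` above. LMR-like `lat`/`lon` coordinates with degree units map to Y/X.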
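// Browser-console check. `metadata` (the consolidated .zmetadata), `prefix`
// and `dimensionalCheck` (candidate variable names) are not defined here and
// are assumed to exist in the page's scope.
const finalVariables = dimensionalCheck.filter((d) => {
  // Dimension names for variable `d`, as recorded in `_ARRAY_DIMENSIONS`
  const dims = metadata.metadata[`${prefix}${d}/.zattrs`]['_ARRAY_DIMENSIONS'];
  // Keep `d` only if every dimension has its own coordinate array (.zarray)
  const allDimsExist = dims.every((dim) => {
    const exists = metadata.metadata[`${prefix}${dim}/.zarray`];
    if (!exists) {
      console.log(`Missing zarray for dimension '${dim}' in variable '${d}'`);
    }
    return Boolean(exists);
  });
  return allDimsExist;
});
console.log(finalVariables);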
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'air_mean'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'air_spread'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'hgt500_mean'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'hgt500_spread'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'pdsi_mean'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'pdsi_spread'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'pr_wtr_mean'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'pr_wtr_spread'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'prate_mean'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'prate_spread'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'prmsl_mean'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'prmsl_spread'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'sst_mean'
VM1759:6 Missing zarray for dimension 'MCrun' in variable 'sst_spread'
VM1759:12 
[]
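The console check keeps a variable only if every dimension in its `_ARRAY_DIMENSIONS` has a matching `<dim>/.zarray` entry. `MCrun` is a dimension without a coordinate array ("Dimensions without coordinates: MCrun" above). So every LMR variable is dropped, and the viewer reports no viewable variables.

Below is a Python port of the same filter over a consolidated `.zmetadata` dict. It runs against a trimmed, synthetic stand-in for the LMR metadata, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def filter_by_dims(
    zmetadata: Dict[str, dict], variables: List[str], prefix: str = ""
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Keep variables whose every dimension has its own `.zarray` entry.

    Returns (kept variables, (variable, missing dimension) pairs). Unlike
    the JS `.every`, which stops at the first miss, this lists every
    missing dimension.
    """
    meta = zmetadata["metadata"]
    kept: List[str] = []
    missing: List[Tuple[str, str]] = []
    for var in variables:
        dims = meta[f"{prefix}{var}/.zattrs"]["_ARRAY_DIMENSIONS"]
        absent = [d for d in dims if f"{prefix}{d}/.zarray" not in meta]
        missing.extend((var, d) for d in absent)
        if not absent:
            kept.append(var)
    return kept, missing


# Trimmed, synthetic stand-in for the LMR consolidated metadata.
lmr = {
    "metadata": {
        "air_mean/.zattrs": {"_ARRAY_DIMENSIONS": ["time", "MCrun", "lat", "lon"]},
        "air_mean/.zarray": {},
        "time/.zarray": {},
        "lat/.zarray": {},
        "lon/.zarray": {},
    }
}
print(filter_by_dims(lmr, ["air_mean"]))
# → ([], [('air_mean', 'MCrun')])
```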
In [23]: ds = xr.open_dataset(url, engine='zarr', chunks={}, consolidated=True)

In [24]: ds.cf.axes
Out[24]: {'X': ['longitude'], 'Y': ['latitude'], 'T': ['time']}
andersy005 commented 2 months ago

@katamartin, do you happen to know why https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/EOBS-feedstock/eobs-wind-speed.zarr shows a global view despite the dataset being regional?

In [3]: ds = xr.open_zarr("https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/EOBS-feedstock/eobs-wind-speed.zarr")

In [4]: ds
Out[4]: 
<xarray.Dataset> Size: 11GB
Dimensions:    (time: 14976, latitude: 350, longitude: 511)
Coordinates:
  * latitude   (latitude) float64 3kB 35.05 35.15 35.25 ... 69.75 69.85 69.95
  * longitude  (longitude) float64 4kB -11.05 -10.95 -10.85 ... 39.85 39.95
  * time       (time) datetime64[ns] 120kB 1980-01-01 1980-01-02 ... 2020-12-31
Data variables:
    fg         (time, latitude, longitude) float32 11GB dask.array<chunksize=(40, 350, 511), meta=np.ndarray>
Attributes:
    CDI:                       Climate Data Interface version 1.6.3 (http://c...
    CDO:                       Climate Data Operators version 1.6.3 (http://c...
    Conventions:               CF-1.4
    E-OBS_version:             v23.1e
    NCO:                       netCDF Operators version 4.7.5 (Homepage = htt...
    References:                http://surfobs.climate.copernicus.eu/dataacces...
    history:                   Mon Aug  2 09:57:23 2021: ncks --no-abc -d tim...
    pangeo-forge:inputs_hash:  9bafc3f4b1a861cd149659de9602269a53feb1474b4af9...
    pangeo-forge:recipe_hash:  11a599a7d9aaecab7479159072b0d91bb9307e0e1c512c...
    pangeo-forge:version:      0.9.1

In [5]: ds.latitude.min(), ds.latitude.max()
Out[5]: 
(<xarray.DataArray 'latitude' ()> Size: 8B
 array(35.05),
 <xarray.DataArray 'latitude' ()> Size: 8B
 array(69.95))

In [6]: ds.longitude.min(), ds.longitude.max()
Out[6]: 
(<xarray.DataArray 'longitude' ()> Size: 8B
 array(-11.05),
 <xarray.DataArray 'longitude' ()> Size: 8B
 array(39.95))
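The min/max checks above amount to computing the store's bounding box: about 35 to 70°N and 11°W to 40°E, far from global. Below is a stdlib sketch for classifying an extent from coordinate values. The helper names and the default tolerance are illustrative assumptions, not ncview-js logic.

```python
from typing import Sequence, Tuple


def bounding_box(
    lats: Sequence[float], lons: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Return (lon_min, lat_min, lon_max, lat_max) from coordinate values."""
    return min(lons), min(lats), max(lons), max(lats)


def looks_global(
    lats: Sequence[float], lons: Sequence[float], tol: float = 2.0
) -> bool:
    """True if the coordinates span (nearly) the whole sphere.

    `tol` (degrees, per edge) allows for cell-centre coordinates that stop
    short of the poles or the date line; its value is an arbitrary choice.
    """
    lon_min, lat_min, lon_max, lat_max = bounding_box(lats, lons)
    return (lat_max - lat_min) >= 180.0 - 2 * tol and (lon_max - lon_min) >= 360.0 - 2 * tol
```

For the E-OBS extremes, `looks_global([35.05, 69.95], [-11.05, 39.95])` is `False`. For the LMR grid (lat -90 to 90, lon 0 to 358), it is `True`.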
katamartin commented 2 months ago

It looks like the logic for locking the zoom currently relies on a check of how many chunks are present in the whole dataset. A safer option seems to be to consistently block zooming unless a pyramid is provided. I can open a PR!
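The proposed rule is to lock zoom unless the store is a pyramid. Below is a minimal sketch over consolidated metadata. It assumes that a pyramid advertises itself through a `multiscales` key in the root `.zattrs`, as ndpyramid-generated stores do; this thread does not confirm that convention.

```python
def zoom_locked(zmetadata: dict) -> bool:
    """Lock zoom unless the store's root attributes describe a pyramid.

    Assumption: pyramids carry a root-level `multiscales` attribute.
    """
    root_attrs = zmetadata.get("metadata", {}).get(".zattrs", {})
    return "multiscales" not in root_attrs
```

Unlike a chunk-count heuristic, this rule does not depend on how a non-pyramid store happens to be chunked. A small regional store such as E-OBS therefore stays locked.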

Update: addressed in https://github.com/carbonplan/ncview-js/pull/36 and https://github.com/carbonplan/ncview-js/pull/38

andersy005 commented 2 months ago

Cannot read properties of undefined (reading 'X') (@katamartin, do you happen to know what's going on here?)

katamartin commented 2 months ago

Update: I'm working on showing better errors for the remaining 2 blocking errors! PR up here: https://github.com/carbonplan/ncview-js/pull/39

Note that I think this will involve removing the checks for the ncviewjs:-prefixed attributes, which means that we'll have to coordinate the ncview-js update with deploying the leap-data-catalog staging branch to production.

katamartin commented 2 months ago

The last CF axis issue was resolved with https://github.com/carbonplan/ncview-js/pull/41, but I'm not seeing any data show up 🤔

When I inspect the data in the console on the frontend, it looks like the chunks are full of 0s. Does this match your view in Python @andersy005 @norlandrhagen?

katamartin commented 2 months ago

@andersy005 the issue with https://ncview-js.staging.carbonplan.org/?dataset=https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/terraclimate-feedstock/terraclimate.zarr seems to be coming from zarr-proxy returning a 500 -- any idea what's going on? See sample request: https://ok6vedl4oj7ygb4sb2nzqvvevm0qhbbc.lambda-url.us-west-2.on.aws/ncsa.osn.xsede.org/Pangeo/pangeo-forge/terraclimate-feedstock/terraclimate.zarr/.zmetadata

andersy005 commented 2 months ago

@katamartin

The metadata file contains fields with unpaired UTF-16 surrogates (for example \udcc2\udcb0), and these cause errors when the web server encodes the response as UTF-8:

In [10]: zarr.util.json_loads(store['.zmetadata'])['metadata']['.zattrs']['summary']
Out[10]: 'This archive contains a dataset of high-spatial resolution (1/24\udcc2\udcb0, ~4-km) monthly climate and climatic water balance for global terrestrial surfaces from 1958-2015. These data were created by using climatically aided interpolation, combining high-spatial resolution climatological normals from the WorldClim version 1.4 and version 2 datasets, with coarser resolution time varying (i.e. monthly) data from CRU Ts4.0 and JRA-55 to produce a monthly dataset of precipitation, maximum and minimum temperature, wind speed, vapor pressure, and solar radiation. TerraClimate additionally produces monthly surface water balance datasets using a water balance model that incorporates reference evapotranspiration, precipitation, temperature, and interpolated plant extractable soil water capacity.'

In [11]: zarr.util.json_loads(store['.zmetadata'])['metadata']['.zattrs']['summary'].encode("utf-8")
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
Cell In[11], line 1
----> 1 zarr.util.json_loads(store['.zmetadata'])['metadata']['.zattrs']['summary'].encode("utf-8")

UnicodeEncodeError: 'utf-8' codec can't encode characters in position 64-65: surrogates not allowed
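`\udcc2\udcb0` is what Python's `surrogateescape` error handler produces for the raw bytes `0xC2 0xB0`. Those bytes are the UTF-8 encoding of `°`, so the text was meant to read "1/24°". Lone surrogates cannot be encoded as UTF-8, hence the error at positions 64-65.

Below is a sketch that repairs such strings throughout a metadata dict before it is served or rewritten. `repair_surrogates` is a hypothetical helper, and it assumes the escaped bytes really are UTF-8.

```python
def repair_surrogates(obj):
    """Recursively turn surrogate-escaped bytes back into real characters."""
    if isinstance(obj, str):
        try:
            # U+DC80..U+DCFF map back to the original bytes 0x80..0xFF.
            raw = obj.encode("utf-8", "surrogateescape")
        except UnicodeEncodeError:
            # Other lone surrogates did not come from surrogateescape;
            # the UTF-8 encoder's "replace" handler turns each into "?".
            raw = obj.encode("utf-8", "replace")
        # If the recovered bytes are not valid UTF-8, they become U+FFFD.
        return raw.decode("utf-8", "replace")
    if isinstance(obj, dict):
        return {key: repair_surrogates(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [repair_surrogates(value) for value in obj]
    return obj


summary = "high-spatial resolution (1/24\udcc2\udcb0, ~4-km)"
print(repair_surrogates(summary))
# → high-spatial resolution (1/24°, ~4-km)
```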
andersy005 commented 2 months ago

Yes, I'm getting NaNs on my end. I'm not confident about the integrity of the data from this feedstock. My hunch is that this is one of the earlier feedstocks, and there's a chance nobody looked at the data after the feedstock was run ~2 years ago.


In [7]: ds.cn.isel(time=0, ct=0).compute()
Out[7]: 
<xarray.DataArray 'cn' (yt: 1070, xt: 1440)> Size: 6MB
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    ct       float64 8B 0.0
    time     object 8B 1992-01-05 13:00:00
  * xt       (xt) float64 12kB -279.9 -279.7 -279.4 -279.2 ... 79.42 79.65 79.88
  * yt       (yt) float64 9kB -80.02 -79.92 -79.81 -79.7 ... 78.84 78.84 78.85
Attributes:
    cell_methods:   time: mean
    long_name:      ice concentration
    time_avg_info:  average_T1,average_T2,average_DT
    units:          0-1

In [9]: ds.hi.isel(time=0).compute()
Out[9]: 
<xarray.DataArray 'hi' (yt: 1070, xt: 1440)> Size: 6MB
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    time     object 8B 1992-01-05 13:00:00
  * xt       (xt) float64 12kB -279.9 -279.7 -279.4 -279.2 ... 79.42 79.65 79.88
  * yt       (yt) float64 9kB -80.02 -79.92 -79.81 -79.7 ... 78.84 78.84 78.85
Attributes:
    cell_methods:   time: mean
    long_name:      ice thickness
    time_avg_info:  average_T1,average_T2,average_DT
    units:          m-ice
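Spot-checking a single time step is how the all-NaN data surfaced here. Below is a small sketch for flagging variables whose sampled data is entirely missing. It is written against NumPy arrays so it stays self-contained, and the helper name is hypothetical. With xarray, the equivalent per-variable test would be `bool(ds[name].isnull().all())`, which triggers a compute on dask-backed data.

```python
from typing import Dict, List

import numpy as np


def all_nan_variables(arrays: Dict[str, np.ndarray]) -> List[str]:
    """Return the names of arrays in which every value is NaN."""
    return [name for name, values in arrays.items() if np.isnan(values).all()]


# Synthetic samples shaped like small slices of `cn` and `hi` above.
sample = {
    "cn": np.full((3, 4), np.nan, dtype="float32"),
    "hi": np.full((3, 4), np.nan, dtype="float32"),
    "ok": np.array([[np.nan, 1.0], [0.5, 2.0]]),
}
print(all_nan_variables(sample))
# → ['cn', 'hi']
```

Note that an empty array also counts as "all NaN" here, which is arguably also worth flagging.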
norlandrhagen commented 2 months ago

Seems like it! The data array is also named 'hi' 🥇. I can delete this feedstock and remove it from the catalog.

katamartin commented 2 months ago

Huh, super interesting -- on my latest branch, which skips the proxy request for metadata and instead fetches metadata via a serverless function, this dataset is now accessible: https://data-viewer-git-katamartin-improve-errors-carbonplan.vercel.app/?dataset=https%3A%2F%2Fncsa.osn.xsede.org%2FPangeo%2Fpangeo-forge%2Fterraclimate-feedstock%2Fterraclimate.zarr

@andersy005 how do you feel about doing our coordinated merging?

norlandrhagen commented 2 months ago

Looks like Terraclimate has some date issues after the first time step:

[screenshot]

And it looks like the Zarr store only has a single time slice 😭

[screenshot]

Guess we should wipe it.
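Both problems (suspicious dates after the first step, and only a single time slice) could be screened for before a store is listed in the catalog. Below is a stdlib sketch that takes decoded time values and reports length and ordering problems. The helper, its checks and the `min_steps` default are illustrative assumptions, not part of the catalog tooling.

```python
from datetime import datetime
from typing import List, Sequence


def time_axis_problems(times: Sequence[datetime], min_steps: int = 2) -> List[str]:
    """Return human-readable problems found in a decoded time coordinate."""
    problems: List[str] = []
    if len(times) < min_steps:
        problems.append(f"only {len(times)} time step(s)")
    # Consecutive pairs must be strictly increasing.
    for prev, cur in zip(times, times[1:]):
        if cur <= prev:
            problems.append(f"non-increasing step: {prev} -> {cur}")
    return problems


print(time_axis_problems([datetime(1958, 1, 1)]))
# → ['only 1 time step(s)']
```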

andersy005 commented 1 month ago

Closing this, as all issues seem to have been addressed.