SlideRuleEarth / sliderule

Server and client framework for on-demand science data processing in the cloud
https://slideruleearth.io

HSDS failure to locate dataset by path #39

Closed jpswinski closed 3 years ago

jpswinski commented 4 years ago

When performing a run against Grand Mesa, HSDS returns multiple errors for select files when trying to read data from those files. The problem is reproducible. Here is a sample output from SlideRule:

HDF5 REST VOL-DIAG: Error detected in HDF5 REST VOL (1.0.0) thread 139748614063872:
  #000: /home/jswinski/Downloads/vol-rest/src/rest_vol_dataset.c line 321 in RV_dataset_open(): can't locate dataset by path
    major: Dataset
    minor: Problem with path to object
  #001: /home/jswinski/Downloads/vol-rest/src/rest_vol.c line 1834 in RV_find_object_by_path(): can't locate parent group for object of unknown type
    major: Symbol table
    minor: Problem with path to object
HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 139748614063872:
  #000: ../../src/H5D.c line 296 in H5Dopen2(): unable to open dataset
    major: Dataset
    minor: Can't open object
  #001: ../../src/H5VLcallback.c line 1974 in H5VL_dataset_open(): dataset open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: ../../src/H5VLcallback.c line 1941 in H5VL__dataset_open(): dataset open failed
    major: Virtual Object Layer
    minor: Can't open object
2020:304:17:17:20:H5Lib.cpp:303:CRITICAL: Failed to open dataset: /gt3l/geolocation/reference_photon_lat
2020:304:17:17:20:Atl03Reader.cpp:545:CRITICAL: Unable to process resource hsds:///hsds/ATL03/ATL03_20190314033606_11560206_003_01.h5: H5Lib

jpswinski commented 4 years ago

HSDS Errors.txt

jpswinski commented 4 years ago

I reloaded ATL03_20190314033606_11560206_003_01.h5 (which is the first file to have an error) and subsequent attempts to access the file were successful. So it appears that the error occurs during the loading process.

jpswinski commented 4 years ago

An attempt to reload a different file from the set above failed with the following error (output of the hsload command):

ERROR 2020-11-02 14:22:00,575 utillib.py:455 ERROR: failed to create dataset: Gateway Timeout
Traceback (most recent call last):
  File "h5py/h5o.pyx", line 302, in h5py.h5o.cb_obj_simple
  File "/home/ubuntu/.pyenv/versions/3.8.3/lib/python3.8/site-packages/h5py-2.10.0-py3.8-linux-x86_64.egg/h5py/_hl/group.py", line 600, in proxy
    return func(name, self[name])
  File "/home/ubuntu/.pyenv/versions/3.8.3/lib/python3.8/site-packages/h5pyd-0.8.0-py3.8.egg/h5pyd/_apps/utillib.py", line 658, in object_create_helper
    create_dataset(obj, ctx)
  File "/home/ubuntu/.pyenv/versions/3.8.3/lib/python3.8/site-packages/h5pyd-0.8.0-py3.8.egg/h5pyd/_apps/utillib.py", line 457, in create_dataset
    return dset
UnboundLocalError: local variable 'dset' referenced before assignment

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./.pyenv/versions/3.8.3/bin/hsload", line 11, in <module>
    load_entry_point('h5pyd==0.8.0', 'console_scripts', 'hsload')()
  File "/home/ubuntu/.pyenv/versions/3.8.3/lib/python3.8/site-packages/h5pyd-0.8.0-py3.8.egg/h5pyd/_apps/hsload.py", line 309, in main
    load_file(fin, fout, verbose=verbose, dataload=dataload, s3path=s3path, compression=compression, compression_opts=compression_opts)
  File "/home/ubuntu/.pyenv/versions/3.8.3/lib/python3.8/site-packages/h5pyd-0.8.0-py3.8.egg/h5pyd/_apps/utillib.py", line 698, in load_file
    fin.visititems(object_create_helper)
  File "/home/ubuntu/.pyenv/versions/3.8.3/lib/python3.8/site-packages/h5py-2.10.0-py3.8-linux-x86_64.egg/h5py/_hl/group.py", line 601, in visititems
    return h5o.visit(self.id, proxy)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
SystemError: <built-in function visit> returned a result with an error set

The following command was used:

hsload -v --link s3://icesat2-sliderule/data/ATL03/ATL03_20190315152016_11790202_003_01.h5 /hsds/ATL03

jpswinski commented 4 years ago

See HSDS issue: https://github.com/HDFGroup/hsds/issues/71

jpswinski commented 3 years ago

This was a bug in hsload (part of the h5pyd repo), and it is fixed in version 0.8.2. When hsload hit an HTTP error while loading a file, it did not retry the request, so some file loads ended up corrupted.
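
For anyone hitting something similar, here is a minimal sketch of the retry-on-HTTP-error idea described above. It is not the actual h5pyd fix. `TransientHTTPError`, `with_retries`, and `make_flaky_request` are hypothetical names made up for illustration, and the fake request just simulates a Gateway Timeout (504) like the one in the hsload output.

```python
import time


class TransientHTTPError(Exception):
    """Hypothetical error type standing in for a retryable HTTP failure (e.g. 504)."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def with_retries(request, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call request() until it succeeds or max_attempts is reached.

    Retries only on TransientHTTPError, waiting base_delay * 2**attempt
    between attempts. The last error is re-raised, so a failed load is
    reported instead of silently leaving a partial file behind.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except TransientHTTPError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def make_flaky_request(failures):
    """Build a fake request that fails `failures` times with a 504, then succeeds."""
    state = {"calls": 0}

    def request():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientHTTPError(504)
        return "dataset created"

    return request, state


if __name__ == "__main__":
    request, state = make_flaky_request(failures=2)
    print(with_retries(request, base_delay=0.01), "after", state["calls"], "calls")
```

The important part is that the final failure propagates to the caller. In the traceback above, the Gateway Timeout was logged and the code then went on to `return dset`, which had never been assigned.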