dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0
21 stars 25 forks source link

when a non-hdf5 file is being opened a different error is being raised by h5py #695

Closed satra closed 2 years ago

satra commented 3 years ago

perhaps this is already being tackled in a PR. given that this is specific to 3.7 may be worth checking what version is available. the code suggests that every file is being opened using nwb?

however, given that the blob store doesn't have any file extensions, we do need some better evaluation of a file before opening it. we can use the extension if available, but fall back on some other options if not.

ERROR    dandi:upload.py:249 Failed to extract metadata from /tmp/pytest-of-runner/pytest-0/test_publish_and_manipulate0/upload/subdir/file.txt
Traceback (most recent call last):
  File "/home/runner/work/dandi-cli/dandi-cli/dandi/upload.py", line 246, in process_path
    path, digest=file_etag, digest_type="dandi_etag"
  File "/home/runner/work/dandi-cli/dandi-cli/dandi/metadata.py", line 673, in nwb2asset
    metadata = get_metadata(nwb_path)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/fscacher/cache.py", line 116, in fingerprinter
    ret = fingerprinted(path, *args, **kwargs_)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/joblib/memory.py", line 591, in __call__
    return self._cached_call(args, kwargs)[0]
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/joblib/memory.py", line 534, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/joblib/memory.py", line 761, in call
    output = self.func(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/fscacher/cache.py", line 84, in fingerprinted
    return f(path, *args, **kwargs)
  File "/home/runner/work/dandi-cli/dandi-cli/dandi/metadata.py", line 51, in get_metadata
    meta["nwb_version"] = get_nwb_version(path)
  File "/home/runner/work/dandi-cli/dandi-cli/dandi/pynwb_utils.py", line 95, in get_nwb_version
    with h5py.File(filepath, "r") as h5file:
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
jwodder commented 3 years ago

@satra These errors are caught and responded to by falling back to get_default_metadata(). Is attempting to open files as NWB by default actually a problem?

Also, I find it hard to believe that this is only happening on Python 3.7; I'm confident I've seen it on Python 3.9 as well.

satra commented 3 years ago

i just saw this in the most recent PR for the CI test #694

jwodder commented 3 years ago

@satra And the problem is...?

satra commented 3 years ago

py3.7 fails a test with the error i posted in this issue.

jwodder commented 3 years ago

@satra No, that error has nothing to do with the failure. The test failed because a to-be-published test Dandiset failed validation. (To be precise, what happened was that the server classified the Dandiset as "Invalid" with the message "1 assets have not been validated yet", which suggests a bug on the server's end.)

satra commented 3 years ago

@jwodder - indeed - i did not scroll up beyond the ERROR log.

regarding the failure, do you know if validation finished and then you tried to publish? posting an asset -> triggers validation -> which updates publishability. the question on the server's end is, is it validating? or is it stuck validating?

jwodder commented 3 years ago

@satra The ValueError is raised before actually trying to publish; the publish() method starts out by querying the version's /info/ endpoint until the status is either "Valid" or "Invalid", and in the latter case it raises a ValueError.

satra commented 3 years ago

right, but do you know if the validation process has actually completed? the validation endpoint should give more info. as in do you:

POST asset wait for validation to complete (on a test file this should happen quickly, but not instantaneously) check for validity then try to publish

satra commented 3 years ago

for example: https://api.dandiarchive.org/api/dandisets/000023/versions/draft/assets/1a93dc97-327d-4f9c-992d-c2149e7810ae/validation/

jwodder commented 3 years ago

@satra Check for validity how? The only way I know is to query /dandisets/{dandiset__pk}/versions/{version}/info/ and check the "status" field, which is what the code does. If you're referring to the /uploads/{upload_id}/validate/ endpoint, I don't believe that's bothered with asset metadata validation for a while.

jwodder commented 3 years ago

@dchiquito

satra commented 3 years ago

@jwodder - did you see the example above?

jwodder commented 3 years ago

@satra Yes, after I commented.

jwodder commented 2 years ago

@satra dandi-cli now only treats files with .nwb extension as NWB, and Dandiset metadata is checked for validity before publishing. Can this issue therefore be considered resolved?