fsspec / ipfsspec

readonly python fsspec implementation for IPFS
MIT License

Raise FileNotFoundError with cat_file on non-existent file #11

Closed by thewtex 2 years ago

thewtex commented 2 years ago

When opening a store, zarr (through fsspec) tries to load `.zarray` to discover whether a group contains a zarr array. fsspec expects a FileNotFoundError when it attempts to cat a file that does not exist. This PR raises the HTTP 404 as a FileNotFoundError.

This addresses the following backtrace:

test/test_spatial_image_multiscale.py:43: in verify_against_baseline
    dt = open_datatree(store, engine="zarr", mode="r")
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/datatree/io.py:66: in open_datatree
    return _open_datatree_zarr(filename_or_obj, **kwargs)
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/datatree/io.py:87: in _open_datatree_zarr
    with zarr.open_group(store, mode="r") as zds:
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/zarr/hierarchy.py:1204: in open_group
    return Group(store, read_only=read_only, cache_attrs=cache_attrs,
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/zarr/hierarchy.py:126: in __init__
    if contains_array(store, path=self._path):
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/zarr/storage.py:96: in contains_array
    return key in store
../../bin/mambaforge/envs/spatial-image/lib/python3.9/_collections_abc.py:684: in __contains__
    self[key]
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/zarr/storage.py:545: in __getitem__
    return self._mutable_mapping[key]
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/fsspec/mapping.py:135: in __getitem__
    result = self.fs.cat(k)
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/fsspec/spec.py:739: in cat
    return self.cat_file(paths[0], **kwargs)
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/ipfsspec/core.py:183: in cat_file
    data = self._gw_get(path)
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/ipfsspec/core.py:161: in _gw_get
    return self._run_on_any_gateway(lambda gw: gw.get(path))
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/ipfsspec/core.py:155: in _run_on_any_gateway
    res = f(gw)
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/ipfsspec/core.py:161: in <lambda>
    return self._run_on_any_gateway(lambda gw: gw.get(path))
../../bin/mambaforge/envs/spatial-image/lib/python3.9/site-packages/ipfsspec/core.py:59: in get
    res.raise_for_status()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Response [404]>
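
The failure mode in this backtrace can be reproduced without network access. A minimal sketch, under stated assumptions: `ToyFS`, `ToyMapper`, and `ToyHTTPError` are hypothetical stand-ins, not fsspec or ipfsspec classes. The mapper assumes the mapping layer translates FileNotFoundError into KeyError, which is what lets `key in store` return False rather than propagate an HTTP error.

```python
from collections.abc import Mapping


class ToyFS:
    """Hypothetical filesystem: known paths return bytes, others fail."""

    def __init__(self, files, missing_error):
        self.files = files
        # Exception type raised for unknown paths (the thing under discussion).
        self.missing_error = missing_error

    def cat(self, path):
        if path not in self.files:
            raise self.missing_error(path)
        return self.files[path]


class ToyMapper(Mapping):
    """Hypothetical key/value view over a filesystem."""

    def __init__(self, fs):
        self.fs = fs

    def __getitem__(self, key):
        try:
            return self.fs.cat(key)
        except FileNotFoundError as exc:
            # Only "file is missing" becomes "key is missing".
            raise KeyError(key) from exc

    def __iter__(self):
        return iter(self.fs.files)

    def __len__(self):
        return len(self.fs.files)


class ToyHTTPError(Exception):
    """Stand-in for an HTTP client error such as a raw 404."""


# Filesystem that reports missing files the way the mapper expects.
good = ToyMapper(ToyFS({"group/.zgroup": b"{}"}, FileNotFoundError))
print(".zarray" in good)  # Mapping.__contains__ catches the KeyError

# Filesystem that lets the HTTP error escape, as in the backtrace.
bad = ToyMapper(ToyFS({"group/.zgroup": b"{}"}, ToyHTTPError))
try:
    ".zarray" in bad
except ToyHTTPError:
    print("membership test crashed instead of returning False")
```

The second case mirrors the backtrace: `Mapping.__contains__` only catches KeyError, so any other exception raised while fetching the key surfaces to the caller.
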
d70-t commented 2 years ago

Thanks @thewtex for the PR 👍 ipfsspec should indeed raise FileNotFoundError in this case. Could you add a check if the response code was 404 and re-raise the HTTPError if not? Other codes would likely indicate some problems with the server or the connection, which we'd probably like to handle differently.

Please note that the async branch has been merged, so normally, opening an ipfs:// url should now use the async implementation by default. I'm wondering why it didn't in your case...

thewtex commented 2 years ago

Hi @d70-t!

> ipfsspec should indeed raise FileNotFoundError in this case. Could you add a check if the response code was 404 and re-raise the HTTPError if not? Other codes would likely indicate some problems with the server or the connection, which we'd probably like to handle differently.

Yes, good point -- done.
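
For readers following along, the check requested above could look roughly like the sketch below. It is not the code from this PR. `HTTPError` and `FakeResponse` are hypothetical stand-ins that imitate the small part of a requests-style response used here (`status_code`, `content`, `raise_for_status()`), and `content_or_not_found` is an illustrative helper name.

```python
class HTTPError(Exception):
    """Stand-in for a requests-style HTTPError that carries the response."""

    def __init__(self, message, response):
        super().__init__(message)
        self.response = response


class FakeResponse:
    """Stand-in for an HTTP response; only what the sketch needs."""

    def __init__(self, status_code, content=b""):
        self.status_code = status_code
        self.content = content

    def raise_for_status(self):
        if self.status_code >= 400:
            raise HTTPError(f"{self.status_code} error", response=self)


def content_or_not_found(res, path):
    """Return the body, mapping 404 to FileNotFoundError.

    Any other HTTP error is re-raised unchanged, since it more likely
    points at a server or connection problem than at a missing file.
    """
    try:
        res.raise_for_status()
    except HTTPError as exc:
        if exc.response.status_code == 404:
            raise FileNotFoundError(path) from exc
        raise
    return res.content


print(content_or_not_found(FakeResponse(200, b"data"), "/ipfs/example"))

for status in (404, 500):
    try:
        content_or_not_found(FakeResponse(status), "/ipfs/example/.zarray")
    except FileNotFoundError:
        print(status, "-> FileNotFoundError")
    except HTTPError:
        print(status, "-> HTTPError re-raised")
```

Using `raise ... from exc` keeps the original HTTP error attached as the cause, so the 404 is still visible when debugging.
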

> Please note that the async branch has been merged, so normally, opening an ipfs:// url should now use the async implementation by default. I'm wondering why it didn't in your case...

I would love to use the async support, but it seems to hang? To reproduce:

git clone -b ipfsspec https://github.com/thewtex/spatial-image-multiscale
cd spatial-image-multiscale
pip install -e '.[test]'
pytest