Closed kylebarron closed 3 weeks ago
Here, you are using the file-like API. I can make a thin wrapper which calls cat_file with a range for any input filesystem, and then this should work.
I figured that I only needed as a minimum to implement the methods on the AsyncFileSystem
base class that raised NotImplementedError
, and that everything else would be supported out of the box?
Yeah, a fallback implementation in fsspec would totally make sense. This does the business
--- a/fsspec/spec.py
+++ b/fsspec/spec.py
@@ -1915,7 +1915,7 @@ class AbstractBufferedFile(io.IOBase):
def _fetch_range(self, start, end):
"""Get the specified set of bytes from remote"""
- raise NotImplementedError
+ return self.fs.cat_file(self.path, start=start, end=end)
def read(self, length=-1):
together with
--- a/object-store-rs/python/object_store_rs/fsspec.py
+++ b/object-store-rs/python/object_store_rs/fsspec.py
@@ -55,7 +55,7 @@ class AsyncFsspecStore(fsspec.asyn.AsyncFileSystem):
resp = await obs.get_async(self.store, path)
return await resp.bytes_async()
- if start and end:
+ if start is not None and end is not None:
return await obs.get_range_async(
self.store, path, offset=start, length=end - start
)
(because start can be and often is 0, and actually None means 0 anyway).
Leads to:
In [1]: import object_store_rs as obs
...: import pyarrow.parquet as pq
...: from object_store_rs.fsspec import AsyncFsspecStore
...:
...: store = obs.store.HTTPStore.from_url("https://github.com")
...: fs = AsyncFsspecStore(store)
...: url = "opengeospatial/geoparquet/raw/refs/heads/main/examples/example.parquet"
...: pq.read_metadata(url, filesystem=fs)
Out[1]:
<pyarrow._parquet.FileMetaData object at 0x10a8539c0>
created_by: parquet-cpp-arrow version 16.1.0
num_columns: 10
num_rows: 5
num_row_groups: 1
format_version: 2.6
serialized_size: 6685
Until https://github.com/fsspec/filesystem_spec/pull/1732 is merged and released, is it expected for AsyncFsspecStore to subclass both from fsspec.asyn.AsyncFileSystem
and AbstractBufferedFile
?
is it expected for AsyncFsspecStore to subclass both from fsspec.asyn.AsyncFileSystem and AbstractBufferedFile?
That sounds ... painful. The linked PR has no negative side effects, so I only need to write some kind of test for it.
@martindurant Do you have any interest in pushing this PR over the line, since you know fsspec so well?
Sure! What needs doing besides the fsspec PR?
I'm not fully sure how to test it. I'm also not sure if there are any provided methods in the fsspec base class that we should be implementing, that could be more efficiently provided from the Rust side (without adding a bunch of new Rust code; I'd like this fsspec wrapper to ideally be python-only)
we would love to contribute to this as well and make it work with flyteorg/flyte - wdyt?
I'm happy to accept contributions
@kylebarron, When will this be merged? I'd like to help work on it as well.
I'd be happy to accept a PR with tests. There was some progress from @martindurant in #60 which was unblocked by adding test setup in https://github.com/developmentseed/obstore/pull/62
Superseded by https://github.com/developmentseed/obstore/pull/63
Wrapper around the existing object store API we present to Python to be used
E.g. something like this should work, but doesn't quite yet:
cc @martindurant