fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License

fsspec.fuse with zstd file crashes when trying to read from a file #1590

Open mxmlnkn opened 4 months ago

mxmlnkn commented 4 months ago

I have a script called fsspec:

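#!/usr/bin/env python3
import sys

import fsspec.fuse
from fsspec.implementations.tar import TarFileSystem

# Ignore an optional -f flag so that only the two positional arguments remain.
if '-f' in sys.argv:
    sys.argv.remove('-f')

archive, mount_point = sys.argv[1], sys.argv[2]
fs = TarFileSystem(archive)
print(f"Mount {archive} at {mount_point}")
# Expose the archive root ("./") at the mount point.
fsspec.fuse.run(fs, "./", mount_point)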

I created test files and tested them like this:

echo bar > foo
tar -cf foo.tar ./foo
mkdir mounted
for c in bzip2 gzip zstd xz; do
    $c -f -k foo.tar
done
for c in bz2 gz xz zst; do
    echo "== Testing with foo.tar.$c =="
    sleep 1s
    ./fsspec foo.tar.$c mounted &
    sleep 0.5s
    cat mounted/foo
    fusermount -u mounted
done

Output:

== Testing with foo.tar.bz2 ==
bar
== Testing with foo.tar.gz ==
bar
== Testing with foo.tar.xz ==
bar
== Testing with foo.tar.zst ==
Uncaught critical exception from FUSE operation read, aborting.
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 734, in _wrapper
    return func(*args, **kwargs) or 0
  File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 844, in read
    ret = self.operations('read', self._decode_optional_path(path), size,
  File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 1075, in __call__
    return getattr(self, op)(*args)
  File "/home/user/.local/lib/python3.10/site-packages/fsspec/fuse.py", line 78, in read
    f.seek(offset)
io.UnsupportedOperation: File or stream is not seekable.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 737, in _wrapper
    if e.errno > 0:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
cat: mounted/foo: Software caused connection abort
cat: mounted/foo: Transport endpoint is not connected

For some reason, the file object created with the FUSEr.open call does not seem to be seekable when used in FUSEr.read.

Note that the seek isn't even necessary in this case, because it seeks to offset 0, where the file object already is. Adding an if f.tell() != offset: check before the seek should therefore fix it for my case, but then I get a similar error from another place:

Uncaught critical exception from FUSE operation read, aborting.
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 734, in _wrapper
    return func(*args, **kwargs) or 0
  File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 844, in read
    ret = self.operations('read', self._decode_optional_path(path), size,
  File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 1075, in __call__
    return getattr(self, op)(*args)
  File "/home/user/.local/lib/python3.10/site-packages/fsspec/fuse.py", line 81, in read
    out = f.read(size)
  File "/usr/lib/python3.10/tarfile.py", line 700, in readinto
    buf = self.read(len(b))
  File "/usr/lib/python3.10/tarfile.py", line 688, in read
    self.fileobj.seek(offset + (self.position - start))
OSError: cannot seek zstd decompression stream backwards

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 737, in _wrapper
    if e.errno > 0:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
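
For reference, the guard described above (seek only when the stream is not already at the requested offset) could be sketched as follows. This is a minimal illustration, not the actual fsspec.fuse code: read_at and ForwardOnlyStream are hypothetical names, and ForwardOnlyStream merely stands in for a decompression stream that refuses to seek.

```python
import io


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical stand-in for a stream that can be read but never seeked."""

    def __init__(self, data):
        super().__init__()
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def tell(self):
        # Report the current position without going through seek().
        return self._buf.tell()

    def readinto(self, b):
        return self._buf.readinto(b)


def read_at(f, offset, size):
    """Read size bytes at offset, seeking only if the position differs."""
    if f.tell() != offset:
        f.seek(offset)
    return f.read(size)


stream = ForwardOnlyStream(b"bar\n")
print(read_at(stream, 0, 4))  # position is already 0, so no seek is attempted

try:
    read_at(stream, 0, 4)  # position is now 4, so a backward seek is required
except io.UnsupportedOperation as exc:
    print("backward seek still fails:", exc)
```

Such a guard only avoids the redundant seek to the current position. Any read that has to move backwards still fails, which matches the second traceback.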

Note that the other exception from site-packages/fuse.py is unrelated. It is what I get after monkey-patching a long-standing bug.
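
The secondary TypeError can be reproduced without FUSE. The sketch below only illustrates why comparing e.errno with an integer fails and shows one possible defensive check. It is not the fusepy patch referred to in this thread: errno_or_default is a hypothetical helper name, and EIO is an arbitrarily chosen fallback.

```python
import errno
import io


def errno_or_default(exc, default=errno.EIO):
    """Return a positive errno for exc, or default (assumption: EIO) if unset."""
    code = getattr(exc, "errno", None)
    return code if isinstance(code, int) and code > 0 else default


try:
    raise io.UnsupportedOperation("File or stream is not seekable.")
except OSError as exc:
    caught = exc

# The exception was built from a message only, so errno is None and
# an expression such as caught.errno > 0 raises TypeError.
print(caught.errno)
print(errno_or_default(caught) == errno.EIO)
```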

I framed this issue as specific to fsspec.fuse, but it seems to me that this is a more general problem that would also appear when using fsspec as a library.

I'm surprised that gzip and bzip2 work, because they should have the exact same non-seekability issue.
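
One possible explanation, offered as an assumption rather than a verified diagnosis of fsspec: Python's standard-library gzip, bz2 and lzma file objects report themselves as seekable and emulate backward seeks by rewinding and decompressing again, whereas the zstd error message above indicates that its stream cannot seek backwards. The sketch below checks only the standard-library part, using in-memory data.

```python
import bz2
import gzip
import io
import lzma

payload = b"bar\n"

# Each entry opens a decompressing reader over an in-memory compressed buffer.
readers = {
    "gzip": lambda: gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(payload))),
    "bz2": lambda: bz2.BZ2File(io.BytesIO(bz2.compress(payload))),
    "xz": lambda: lzma.LZMAFile(io.BytesIO(lzma.compress(payload))),
}

results = {}
for name, open_reader in readers.items():
    with open_reader() as f:
        first = f.read()
        f.seek(0)  # backward seek, emulated by restarting decompression
        second = f.read()
        results[name] = (f.seekable(), first == second == payload)
    print(name, results[name])
```

If all three report (True, True), a tar reader that seeks backwards inside these streams keeps working, which would be consistent with the output above. It does not show what fsspec itself does for .zst archives.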

I am using Python 3.10.12.

Skylion007 commented 4 months ago

@mxmlnkn Mind submitting the monkeypatch as a PR?

mxmlnkn commented 4 months ago

I think there has been some confusion. The patch is for fusepy, not fsspec. A corresponding open PR already exists here.