fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License
1.05k stars 362 forks source link

Fails when`ArrowFile` is used #1711

Open zsaladin opened 1 month ago

zsaladin commented 1 month ago
  1. Variable size is set
  2. But the size is bound method when ArrowFile is used
  3. Variable total is from the size

https://github.com/iterative/dvc/issues/10583

martindurant commented 1 month ago

Would you mind expanding the description a bit here? Are we talking about something like this:

--- a/fsspec/implementations/arrow.py
+++ b/fsspec/implementations/arrow.py
@@ -220,7 +220,6 @@ class ArrowFSWrapper(AbstractFileSystem):
         "readable",
         "writable",
         "close",
-        "size",
         "seekable",
     ],
 )
@@ -235,6 +234,10 @@ class ArrowFile(io.IOBase):
         self.blocksize = self.block_size = block_size
         self.kwargs = kwargs

+    @property
+    def size(self):
+        return self.stream.size
+
     def __enter__(self):

A test case showing the failure in pure fsspec code would be very helpful.

zsaladin commented 1 month ago

@martindurant

Right.

"size" shouldn't be method. I attached test case to show the error.

tqdm callback tries to add the method and a float literal. the method is <bound method NativeFile.size of <pyarrow.NativeFile closed=False own_file=False is_seekable=False is_writable=False is_readable=True>>

import fsspec
import pyarrow
import pyarrow.fs as pafs
from fsspec.callbacks import TqdmCallback

hdfs_host = "kt"

fs = fsspec.filesystem("hdfs", host=hdfs_host)

from_path = "/path/to/file"
to_path = "./tmp/test.tmp"

fs.get_file(from_path, to_path, callback=TqdmCallback())
Traceback (most recent call last):
  File "./home/user/repo/hello.py", line 13, in <module>
    fs.get_file(from_path, to_path, callback=TqdmCallback())
  File "./home/user/repo/.venv/lib/python3.12/site-packages/fsspec/implementations/arrow.py", line 210, in get_file
    super().get_file(rpath, lpath, **kwargs)
  File "./home/user/repo/.venv/lib/python3.12/site-packages/fsspec/spec.py", line 904, in get_file
    callback.set_size(getattr(f1, "size", None))
  File "./home/user/repo/.venv/lib/python3.12/site-packages/fsspec/callbacks.py", line 97, in set_size
    self.call()
  File "./home/user/repo/.venv/lib/python3.12/site-packages/fsspec/callbacks.py", line 311, in call
    self.tqdm = self._tqdm_cls(total=self.size, **self._tqdm_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1098, in __init__
    self.refresh(lock_args=self.lock_args)
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1347, in refresh
    self.display()
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1495, in display
    self.sp(self.__str__() if msg is None else msg)
            ^^^^^^^^^^^^^^
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1151, in __str__
    return self.format_meter(**self.format_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 534, in format_meter
    if total and n >= (total + 0.5):  # allow float imprecision (#849)
                       ~~~~~~^~~~~
TypeError: unsupported operand type(s) for +: 'method' and 'float'
Exception ignored in: <function tqdm.__del__ at 0x7fffd8707380>
Traceback (most recent call last):
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1148, in __del__
    self.close()
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1277, in close
    if self.last_print_t < self.start_t + self.delay:
       ^^^^^^^^^^^^^^^^^
AttributeError: 'tqdm' object has no attribute 'last_print_t'
martindurant commented 1 month ago

If my diff fixes the problem, would you like to put it in a PR with a test to show it does the right thing?

zsaladin commented 1 month ago

@martindurant

Sure

Could you provide instructions for writing tests that includes test code location, test code file name etc?

martindurant commented 1 month ago

fsspec.implementations.tests.test_arrow is the place you would do this. You can copy from any of the tests that include open().