fsspec / filesystem_spec

A specification that python filesystems should adhere to.
BSD 3-Clause "New" or "Revised" License
1.03k stars 360 forks source link

Some implementations don't have same function signatures as base class #1100

Open leoleoasd opened 2 years ago

leoleoasd commented 2 years ago

Hi, I noticed some implementations change the default value of some parameters, for example: AbstractFilesystem have detail=True in ls, however LocalFileSystem have detail=False. (https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L301, https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/local.py#L56) This is inconsistent, for example, people may expect all filesystems to behave the same in ls operations. Also, as listdir just calls ls, if implementations changed the signature of ls, calling it with listdir will use the default value specified in AbstractFileSystem since it's defined there and implementations don't rewrite that. This behavior inconsistency already caused issues (for example, Lightning-AI/lightning#3805).

So I wrote a test to find out if there is any more inconsistency, and I found a lot. (https://github.com/leoleoasd/filesystem_spec/blob/master/fsspec/implementations/tests/test_common.py#L65, https://github.com/leoleoasd/filesystem_spec/actions/runs/3407906000/jobs/5667983849)

Is this a bug, should we fix it? The fix is definitely a breaking change and may cause some code to stop working. Maybe we should, first, find what implementation has different signatures, and warn users that use its default value about the change?

martindurant commented 2 years ago

I agree that the implementations should have the same defaults as each other, and that this default should be defined in the base class. There might be occasional cases where different defaults are appropriate for some given backend, but that should be rare and need justification.

I expect much of our test suite is explicit about the optional arguments like detail=, and this is why we don't see any problem. Likewise, users that access multiple backends are probably explicit.

So yes, we should fix it, but, as you say, this is a tricky thing to do without breaking downstream code. Listing the known instances in the description of this issue would be a great start.

leoleoasd commented 2 years ago

I'll make a list tomorrow(it's 12 pm in China), but if I remember it correctly, ls in multiple implementation have this problem. (Maybe local filesystem changed the default value first and other implementations started with copying code from local fs?)

leoleoasd commented 2 years ago

Methods that have a different default value:

test_signature[file-get_file] - AssertionError: (self, rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, outfile=None, **kwargs) != (self, path1, path2, callback=None, **kwargs)
test_signature[file-put_file] - AssertionError: (self, lpath, rpath, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, **kwargs) != (self, path1, path2, callback=None, **kwargs)
test_signature[file-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, **kwargs)
test_signature[memory-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, **kwargs)
test_signature[zip-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, **kwargs)
test_signature[tar-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, **kwargs)
test_signature[webhdfs-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False)
test_signature[github-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, sha=None, _sha=None, **kwargs)
test_signature[dbfs-makedirs] - AssertionError: (self, path, exist_ok=False) != (self, path, exist_ok=True)
test_signature[hdfs-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, **kwargs)

Method that missed some parameters in the base class:

test_signature[webhdfs-mkdir] - AssertionError: (self, path, create_parents=True, **kwargs) != (self, path, **kwargs)
test_signature[ftp-mv] - AssertionError: (self, path1, path2, recursive=False, maxdepth=None, **kwargs) != (self, path1, path2, **kwargs)
test_signature[hdfs-mv] - AssertionError: (self, path1, path2, recursive=False, maxdepth=None, **kwargs) != (self, path1, path2, **kwargs)
test_signature[reference-get_file] - AssertionError: (self, rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, outfile=None, **kwargs) != (self, rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, **kwargs)
test_signature[ftp-get_file] - AssertionError: (self, rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, outfile=None, **kwargs) != (self, rpath, lpath, **kwargs)
test_signature[reference-get] - AssertionError: (self, rpath, lpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, **kwargs) != (self, rpath, lpath, recursive=False, **kwargs)
test_signature[reference-open] - AssertionError: (self, path, mode='rb', block_size=None, cache_options=None, compression=None, **kwargs) != (self, path, mode='rb', block_size=None, cache_options=None, **kwargs)
test_signature[webhdfs-rm] - AssertionError: (self, path, recursive=False, maxdepth=None) != (self, path, recursive=False, **kwargs)
test_signature[webhdfs-mv] - AssertionError: (self, path1, path2, recursive=False, maxdepth=None, **kwargs) != (self, path1, path2, **kwargs)
leoleoasd commented 2 years ago

I think we should always keep the signature unchanged. Other implementations may add parameters, but they should not have fewer parameters or change their default value. If something isn't supported in some FS, the implementation should warn user that give value other than the default value.

martindurant commented 2 years ago

I agree; but it is plausible that at least some of those instances have correct signatures and defaults, but obfuscated in the kwargs. You are right that it would be better explicitly consistent.

Your code looks like a useful addition and you might consider adding it in a test marked xfail (wouldn't mind seeing how it looks). See also https://github.com/fsspec/filesystem_spec/pull/651

ianthomas23 commented 2 years ago

Just noting that this is related to issue #899.