Open leoleoasd opened 2 years ago
I agree that the implementations should have the same defaults as each other, and that this default should be defined in the base class. There might be occasional cases where different defaults are appropriate for some given backend, but that should be rare and need justification.
I expect much of our test suite is explicit about the optional arguments like detail=
, and this is why we don't see any problem. Likewise, users that access multiple backends are probably explicit.
So yes, we should fix it, but, as you say, this is a tricky thing to do without breaking downstream code. Listing the known instances in the description of this issue would be a great start.
I'll make a list tomorrow(it's 12 pm in China), but if I remember it correctly, ls
in multiple implementation have this problem. (Maybe local filesystem changed the default value first and other implementations started with copying code from local fs?)
Methods that have a different default value:
test_signature[file-get_file] - AssertionError: (self, rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, outfile=None, **kwargs) != (self, path1, path2, callback=None, **kwargs)
test_signature[file-put_file] - AssertionError: (self, lpath, rpath, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, **kwargs) != (self, path1, path2, callback=None, **kwargs)
test_signature[file-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, **kwargs)
test_signature[memory-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, **kwargs)
test_signature[zip-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, **kwargs)
test_signature[tar-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, **kwargs)
test_signature[webhdfs-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False)
test_signature[github-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, sha=None, _sha=None, **kwargs)
test_signature[dbfs-makedirs] - AssertionError: (self, path, exist_ok=False) != (self, path, exist_ok=True)
test_signature[hdfs-ls] - AssertionError: (self, path, detail=True, **kwargs) != (self, path, detail=False, **kwargs)
Method that missed some parameters in the base class:
test_signature[webhdfs-mkdir] - AssertionError: (self, path, create_parents=True, **kwargs) != (self, path, **kwargs)
test_signature[ftp-mv] - AssertionError: (self, path1, path2, recursive=False, maxdepth=None, **kwargs) != (self, path1, path2, **kwargs)
test_signature[hdfs-mv] - AssertionError: (self, path1, path2, recursive=False, maxdepth=None, **kwargs) != (self, path1, path2, **kwargs)
test_signature[reference-get_file] - AssertionError: (self, rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, outfile=None, **kwargs) != (self, rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, **kwargs)
test_signature[ftp-get_file] - AssertionError: (self, rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, outfile=None, **kwargs) != (self, rpath, lpath, **kwargs)
test_signature[reference-get] - AssertionError: (self, rpath, lpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object at 0x7fd0a8033eb0>, **kwargs) != (self, rpath, lpath, recursive=False, **kwargs)
test_signature[reference-open] - AssertionError: (self, path, mode='rb', block_size=None, cache_options=None, compression=None, **kwargs) != (self, path, mode='rb', block_size=None, cache_options=None, **kwargs)
test_signature[webhdfs-rm] - AssertionError: (self, path, recursive=False, maxdepth=None) != (self, path, recursive=False, **kwargs)
test_signature[webhdfs-mv] - AssertionError: (self, path1, path2, recursive=False, maxdepth=None, **kwargs) != (self, path1, path2, **kwargs)
I think we should always keep the signature unchanged. Other implementations may add parameters, but they should not have fewer parameters or change their default value. If something isn't supported in some FS, the implementation should warn user that give value other than the default value.
I agree; but it is plausible that at least some of those instances have correct signatures and defaults, but obfuscated in the kwargs. You are right that it would be better explicitly consistent.
Your code looks like a useful addition and you might consider adding it in a test marked xfail (wouldn't mind seeing how it looks). See also https://github.com/fsspec/filesystem_spec/pull/651
Just noting that this is related to issue #899.
Hi, I noticed some implementations change the default value of some parameters, for example:
AbstractFilesystem
havedetail=True
inls
, howeverLocalFileSystem
havedetail=False
. (https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L301, https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/local.py#L56) This is inconsistent, for example, people may expect all filesystems to behave the same inls
operations. Also, aslistdir
just callsls
, if implementations changed the signature ofls
, calling it withlistdir
will use the default value specified inAbstractFileSystem
since it's defined there and implementations don't rewrite that. This behavior inconsistency already caused issues (for example, Lightning-AI/lightning#3805).So I wrote a test to find out if there is any more inconsistency, and I found a lot. (https://github.com/leoleoasd/filesystem_spec/blob/master/fsspec/implementations/tests/test_common.py#L65, https://github.com/leoleoasd/filesystem_spec/actions/runs/3407906000/jobs/5667983849)
Is this a bug, should we fix it? The fix is definitely a breaking change and may cause some code to stop working. Maybe we should, first, find what implementation has different signatures, and warn users that use its default value about the change?