fsspec / universal_pathlib

pathlib api extended to use fsspec backends
MIT License

s3fs kwarg unexpected keyword argument in `AioSession` #209

Open bjhardcastle opened 3 months ago

bjhardcastle commented 3 months ago

Might be related to #204

I'm trying to use the cache_type kwarg for s3 [source], but this causes issues down the line when the file is accessed:

>>> import upath
>>> url =  "s3://codeocean-s3datasetsbucket-1u41qdg42ur9/39490bff-87c9-4ef2-b408-36334e748ac6/nwb/ecephys_620264_2022-08-02_15-39-59_experiment1_recording1.nwb"

>>> path = upath.UPath(url, cache_type="first")
>>> path
S3Path('s3://codeocean-s3datasetsbucket-1u41qdg42ur9/39490bff-87c9-4ef2-b408-36334e748ac6/nwb/ecephys_620264_2022-08-02_15-39-59_experiment1_recording1.nwb')

>>> path.exists()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\Users\ben.hardcastle\github\npc_io\.venv\Lib\site-packages\upath\core.py", line 711, in exists
    return self.fs.exists(self.path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ben.hardcastle\github\npc_io\.venv\Lib\site-packages\fsspec\asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ben.hardcastle\github\npc_io\.venv\Lib\site-packages\fsspec\asyn.py", line 103, in sync
    raise return_result
  File "c:\Users\ben.hardcastle\github\npc_io\.venv\Lib\site-packages\fsspec\asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "c:\Users\ben.hardcastle\github\npc_io\.venv\Lib\site-packages\s3fs\core.py", line 1035, in _exists
    await self._info(path, bucket, key, version_id=version_id)
  File "c:\Users\ben.hardcastle\github\npc_io\.venv\Lib\site-packages\s3fs\core.py", line 1302, in _info
    out = await self._call_s3(
          ^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ben.hardcastle\github\npc_io\.venv\Lib\site-packages\s3fs\core.py", line 341, in _call_s3
    await self.set_session()
  File "c:\Users\ben.hardcastle\github\npc_io\.venv\Lib\site-packages\s3fs\core.py", line 502, in set_session
    self.session = aiobotocore.session.AioSession(**self.kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: AioSession.__init__() got an unexpected keyword argument 'cache_type'

upath: 0.2.2 python: 3.11.5

ap-- commented 3 months ago

Hi @bjhardcastle

Please note the difference between storage options for AbstractFileSystems and options for their open() methods:

S3FileSystem class https://github.com/fsspec/s3fs/blob/efbe1e4c23a06e65b3df6a82f28fc49bab0dbd78/s3fs/core.py#L273-L297

The UPath constructor gathers all keyword arguments under **storage_options and uses those to instantiate the specific filesystem class.

S3FileSystem._open() method https://github.com/fsspec/s3fs/blob/efbe1e4c23a06e65b3df6a82f28fc49bab0dbd78/s3fs/core.py#L611-L625
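
To illustrate why the error only appears once the path is used, here is a minimal toy model. It is not the real s3fs code: `ToySession`, `ToyFileSystem`, `try_connect`, and the `profile` option are hypothetical names invented for this sketch. It only mirrors the pattern visible in the traceback above, where leftover keyword arguments are stored on the filesystem and later unpacked into the session constructor.

```python
# Toy model, NOT the real s3fs/aiobotocore code: all names are hypothetical.

class ToySession:
    """Stands in for a session object that accepts only known options."""

    def __init__(self, profile=None):
        self.profile = profile


class ToyFileSystem:
    """Stands in for a filesystem that keeps unrecognized kwargs for later."""

    def __init__(self, anon=False, **kwargs):
        self.anon = anon
        self.kwargs = kwargs  # leftovers are stored, not validated here
        self.session = None

    def connect(self):
        # Leftovers are only unpacked on first use, so a bad key fails late.
        self.session = ToySession(**self.kwargs)
        return self.session


def try_connect(**storage_options):
    """Return None on success, or the error message if connecting fails."""
    fs = ToyFileSystem(**storage_options)  # never raises for unknown keys
    try:
        fs.connect()
    except TypeError as exc:
        return str(exc)
    return None
```

In this sketch, `try_connect(profile="dev")` succeeds, while `try_connect(cache_type="first")` returns a `TypeError` message naming `cache_type`. This is the same shape of failure as in the report: constructing the object is fine, and the error surfaces at the first call that needs a session.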

If you want to pass specific options down to the filesystem-specific AbstractBufferedFile implementation, you would use the following in your case:

import upath
upath.UPath("s3://mybucket/myfile.txt").open(cache_type="first")

If you want to set this at the filesystem level for s3fs, you can do:

import upath
p = upath.UPath("s3://mybucket/myfile.txt", default_cache_type="first")
...
p.open()  # will use the default_cache_type
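
The relationship between the two snippets above can be sketched as a per-call option that falls back to a filesystem-level default. This is a simplified model under stated assumptions, not the s3fs implementation: `ToyFile` and `ToyFileSystemWithDefault` are hypothetical, and the `"readahead"` default is chosen only for illustration.

```python
# Toy model of a per-call option falling back to a filesystem-level default.
# Hypothetical names; the real s3fs classes do considerably more.

class ToyFile:
    def __init__(self, path, cache_type):
        self.path = path
        self.cache_type = cache_type


class ToyFileSystemWithDefault:
    def __init__(self, default_cache_type="readahead"):
        # Assumed default value, for illustration only.
        self.default_cache_type = default_cache_type

    def open(self, path, cache_type=None):
        # An explicit per-call value wins; otherwise use the stored default.
        if cache_type is None:
            cache_type = self.default_cache_type
        return ToyFile(path, cache_type)
```

With `fs = ToyFileSystemWithDefault(default_cache_type="first")`, a plain `fs.open("bucket/key")` picks up `"first"`, while `fs.open("bucket/key", cache_type="none")` overrides it for that one call.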

Let me know if that helps! It would be wonderful if you could tell me how I could improve the text in the README to make this more intuitive. PRs are super welcome too!

Cheers, Andreas :smiley:

bjhardcastle commented 3 months ago

Hi Andreas,

Thank you very much for explaining in detail. That of course fixed it!

I don't think it was a problem with the README in this case, but with the wording for the open() method (which I assumed came from pathlib): [image]

Because it says "as the built-in does", I never would have thought to pass it config for the fsspec-related operations.

One of the reasons I use upath is so I don't need to set up anything manually; it just handles whatever I throw at it! Now that I'm trying different configurations, I'll refer to the documentation more and let you know if any parts aren't clear.

Cheers, ben