fsspec / universal_pathlib

pathlib api extended to use fsspec backends
MIT License

is it possible to switch AWS_PROFILE/start a new Session completely for UPath pointing to S3? #201

Closed DayalStrub closed 7 months ago

DayalStrub commented 7 months ago

this is definitely more of a discussion than an issue, and related to s3fs i guess.

i sometimes need to use different profiles to access different buckets or prefixes. is there a simple way to "reset" UPath so it creates a new Session and picks up a different profile? am i just missing something due to a lack of understanding of fsspec/s3fs?

this is what i had in mind:


# %%
from upath import UPath
import os

# %%
os.environ["AWS_PROFILE"] = "shared1"
test_path = UPath("s3://data1/test.txt")
txt = test_path.read_text()

# %%
# DO SOMETHING?!

# %%
os.environ["AWS_PROFILE"] = "shared2"
test_path1 = UPath("s3://data2/test.txt")
test_path1.write_text(txt)

i can pass the session directly, but would rather "reset" and not have to, e.g. if i need to create multiple UPaths

# %%
from aiobotocore import session as sess
test_path1 = UPath("s3://data2/test.txt", session=sess.AioSession(profile='shared2'))

sorry for the silly question - i'll keep on digging in the meantime.

ap-- commented 7 months ago

Hi @DayalStrub

You should be able to achieve what you want, in a few different ways:

  1. aws profiles can be provided via storage options directly, see: https://s3fs.readthedocs.io/en/latest/#credentials
    p1 = UPath("s3://bucket/file", profile="shared1")
    p2 = UPath("s3://bucket/file", profile="shared2")
  2. filesystem instances are cached depending on your provided storage_options (the keyword arguments in UPath), see: https://github.com/fsspec/filesystem_spec/blob/d969b9601e4aaf5922cb00d82b869bf4b4affd95/fsspec/spec.py#L69-L78 This means that the filesystem instances are identical if (simplifying a bit) their storage_options are identical, i.e. UPath("s3://bucket/something").fs is UPath("s3://bucket/bla").fs evaluates to True. You can prevent that caching from occurring by using a special keyword argument:
    UPath("s3://bucket/file", skip_instance_cache=True)
  3. You can clear the instance cache of a specific filesystem class manually by running:

    import os
    
    import fsspec
    from upath import UPath
    
    os.environ["AWS_PROFILE"] = "shared1"
    p1 = UPath("s3://bucket/foo")
    
    # clear the instance cache for the fsspec class associated to the "s3" protocol
    fsspec.get_filesystem_class(p1.protocol).clear_instance_cache()
    
    os.environ["AWS_PROFILE"] = "shared2"
    p2 = UPath("s3://bucket/bar")
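
The caching behaviour behind options 2 and 3 can be checked without any AWS credentials. This is only a rough sketch: it assumes fsspec's built-in "memory" filesystem is a fair stand-in for s3fs as far as instance caching goes, and the variable names are made up for illustration.

```python
import fsspec

# "memory" is used here only so the sketch runs without S3 access;
# the instance cache lives in fsspec itself, not in s3fs.
fs_a = fsspec.filesystem("memory")
fs_b = fsspec.filesystem("memory")
# identical storage options -> the cached instance is reused
same_by_default = fs_a is fs_b

# option 2: skip the instance cache for this one call
fs_c = fsspec.filesystem("memory", skip_instance_cache=True)
fresh_when_skipped = fs_c is not fs_a

# option 3: clear the cache of the filesystem class, then create a new instance
fsspec.get_filesystem_class("memory").clear_instance_cache()
fs_d = fsspec.filesystem("memory")
fresh_after_clear = fs_d is not fs_a

print(same_by_default, fresh_when_skipped, fresh_after_clear)
```

With s3fs the same mechanism should decide whether a new filesystem instance (and therefore a new session that reads AWS_PROFILE again) gets created.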

Let me know if that helps, Andreas

DayalStrub commented 7 months ago

super helpful! thanks!

for what i had in mind, where i might use one profile for a while/some paths then switch, i think option 3 works great, and i would maybe make a helper function and pass the protocol directly, eg something like

import os

import fsspec
from upath import UPath


def set_profile(profile):
    fsspec.get_filesystem_class("s3").clear_instance_cache()
    os.environ["AWS_PROFILE"] = profile

set_profile("shared1")

p1 = UPath("s3://bucket/foo")
p2 = UPath("s3://bucket/bar")

set_profile("shared2")

p3 = UPath("s3://bucket/baz")
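
a possible variation on that helper, in case the switch should only be temporary: a context manager that puts the old AWS_PROFILE back afterwards. this is just a sketch, not something from upath or fsspec. the name aws_profile and the clear_cache parameter are made up here, and the cache clearing is passed in as a callable so the helper itself only needs the standard library.

```python
import os
from contextlib import contextmanager


@contextmanager
def aws_profile(profile, clear_cache=None):
    """Temporarily set AWS_PROFILE (hypothetical helper, not part of upath).

    clear_cache is an optional zero-argument callable that is run before
    the switch and again on exit, so no cached filesystem is reused.
    """
    previous = os.environ.get("AWS_PROFILE")
    if clear_cache is not None:
        clear_cache()
    os.environ["AWS_PROFILE"] = profile
    try:
        yield
    finally:
        if clear_cache is not None:
            clear_cache()
        # restore the earlier value, or remove the variable if it was unset
        if previous is None:
            os.environ.pop("AWS_PROFILE", None)
        else:
            os.environ["AWS_PROFILE"] = previous
```

for S3 the callable would presumably be fsspec.get_filesystem_class("s3").clear_instance_cache, e.g. `with aws_profile("shared2", clear_cache=...):` around the UPath calls. paths created inside the block keep the filesystem instance they were created with.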

i'll close the issue - but do think it would make a great Discussion, given the great answer. guess people can search old tickets too.