PyFilesystem / pyfilesystem

Python filesystem abstraction layer
http://pyfilesystem.org/
BSD 3-Clause "New" or "Revised" License
288 stars 63 forks source link

Support extra parameters for s3fs / Boto / S3 Connection #190

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
boto.s3.connection.Connection() supports bunch of extra options, including:
* anon=True for read-only connection to public S3 shares
* proxy, proxy_port etc for HTTP(S) proxy options
* custom hosts, connection classes and whatnot

The attached patch simplifies fs.s3fs code to pass in all extra keyword 
arguments to boto.s3.connection.Connection(). It also allows connecting to 
public S3 shares with "anonymous" account (ie. no AWS credentials required).

This might need a bit more: tests, doc fixes etc, but the basic idea works and 
is already in use in our internal project (using listdir, listdirinfo, open, 
...).

Note: it _might_ be possible to also pass in all these options through a Boto 
config file, but that is not feasible in our use-cases (many dynamic 
end-points).

Original issue reported on code.google.com by j...@kruu.org on 28 Oct 2014 at 2:12

Attachments:

GoogleCodeExporter commented 9 years ago
Oops, typo in title s/s4/s3/ :)

Original comment by j...@kruu.org on 28 Oct 2014 at 2:13

GoogleCodeExporter commented 9 years ago
My quick 2 cents... (take with a pinch of salt as I've never used s3fs or Boto)

By changing the prototype of the __init__ function, you've instantly broken 
backwards-compatibility for people who were passing in positional arguments 
rather than keyword arguments :-/

IMHO assuming *all* the kwargs are supposed to be passed to Boto feels a little 
'greedy' and could cause problems with future extensions to the pyfs interface?
I wonder if you could add a prefix to all the boto-specific kwargs, and then 
use this to filter out only the kwargs you need? (obviously removing the prefix 
before passing them on to boto). Kinda in the same spirit as 
http://docs.pyfilesystem.org/en/latest/implementersguide.html#meta-values  ?

Original comment by gc...@loowis.durge.org on 28 Oct 2014 at 2:45

GoogleCodeExporter commented 9 years ago
Yeah, changing the signature is a no go. I would suggest adding a 'boto_args' 
keyword argument which should be a dictionary of these extra arguments. 
Something like this:

    def __init__(self, bucket, prefix="", aws_access_key=None, aws_secret_key=None, separator="/", thread_synchronize=True, key_sync_timeout=1, boto_args=None):

That should keep it backwards compatible.

There is also the possibility of alternative constructors. There could be a 
S3FS.create_anonymous for example.

Original comment by willmcgugan on 28 Oct 2014 at 3:03

GoogleCodeExporter commented 9 years ago
Also, one way would be to make it easier to override connection creation method 
and maybe some other parts too. Now sub-classing the S3FS is a bit too hard.

Hmm, I first checked that boto/s3 connection keyword arguments for access key 
and secret key were the same, but they actually are different names.

Anyway, this wasn't meant to be a full patch (I bet it also breaks unit tests), 
just submitted what now is good enough for me.

Right now boto.s3.connection.Connection __init__ looks like this:

 def __init__(self, aws_access_key_id=None, aws_secret_access_key=None,
is_secure=True, port=None, proxy=None, proxy_port=None,
proxy_user=None, proxy_pass=None,
host=NoHostProvided, debug=0, https_connection_factory=None,
calling_format=DefaultCallingFormat, path='/',
provider='aws', bucket_class=Bucket, security_token=None,
suppress_consec_slashes=True, anon=False,
validate_certs=None, profile_name=None):

https://github.com/boto/boto/blob/develop/boto/s3/connection.py#L167

Original comment by j...@kruu.org on 28 Oct 2014 at 5:46

GoogleCodeExporter commented 9 years ago
And here's a link to Boto config tutorial:
http://boto.readthedocs.org/en/latest/boto_config_tut.html

Maybe s3fs opener should support some of this stuff? At least passing in 
"profile name", which would make it possible to add global or per-user boto 
config file and have the settings there. So, profile name could be one part of 
s3fs URL (username, parameter, anchor...?)

Original comment by j...@kruu.org on 28 Oct 2014 at 5:55

GoogleCodeExporter commented 9 years ago
Ended up creating AnonS3FS sub-class, looking something like this:

class AnonS3FS(S3FS):
    def __init__(self, *args, **kwargs):
        self.boto_kwargs = {'anon': True}
        # need to fake these to be non-None, because S3FS checks them
        kwargs['aws_access_key'] = 'bogus'
        kwargs['aws_secret_key'] = 'bogus'
        super(AnonS3FS, self).__init__(*args, **kwargs)

    # copied from S3FS, changed to use **self.boto_kwargs, not *self._access_keys
    def _s3conn(self):
        try:
            (c, ctime) = self._tlocal.s3conn
            if time.time() - ctime > 60:
                raise AttributeError
            return c
        except AttributeError:
            c = boto.s3.connection.S3Connection(**self.boto_kwargs)
            self._tlocal.s3conn = (c, time.time())
            return c
    _s3conn = property(_s3conn)

+ copied opener code to handle our local "anons3fs://" URLs. Works for me, and 
this way it's also possible to set any extra options.

For the actual use-case, I ended up moving some variables to Boto config file 
(required anyway for setting HTTP timeout to something sane).

I'm not 100% happy with this solution either...

Original comment by j...@kruu.org on 6 Nov 2014 at 10:05