PyFilesystem / s3fs

Amazon S3 filesystem for PyFilesystem2
http://fs-s3fs.readthedocs.io/en/latest/
MIT License
153 stars 55 forks source link

support use_ssl, verify arguments to boto3.client.__init__ #39

Open chairmank opened 6 years ago

chairmank commented 6 years ago

I would like S3FS to pass through a couple of configuration parameters to boto3.client.__init__ (https://boto3.readthedocs.io/en/latest/_modules/boto3/session.html#Session.client):

use_ssl (boolean) -- Whether or not to use SSL. By default, SSL is used. Note that not all services support non-ssl connections. verify (boolean/string) -- Whether or not to verify SSL certificates. By default SSL certificates are verified.

The reason for this change is to support users of S3-compatible storage outside of AWS that require SSL configuration.

It seems like this will be a straightforward addition to the __init__ method and a corresponding change in the client property. I can open a pull request.

willmcgugan commented 6 years ago

Will look in to that. Would also accept a PR.

Btw, I had a problem with 'Minio', which claims to be S3-compatible, but had a number of differences. So I'd like to add a disclaimer that I can't guarantee that anything other than S3 will work properly.

amachanic commented 6 years ago

I just came here searching for this exact fix! Good timing. Any possibility of passing the use_ssl argument via an opener?

Edit: I think I meant "FS URL" rather than "opener." I just read the docs and see the ?key=value syntax. Is that how this would work?

And one more edit -- apologies for not testing before writing! I tried @chairmank's fix and can confirm that the arguments are passed to the boto constructor from the keys specified in my URL as asked above.

But there is an issue: When passing verify=False I'm hitting an exception inside of botocore: SSLError, No such file or directory. This is because "False" is being treated as a string, not a boolean, which tells botocore to treat it as a path, rather than stop verification altogether.

I fixed this by changing line 301 of _s3fs.py to:

self.verify = False if verify == "False" else verify

... but I'm not sure if there's a better way to handle this.

chairmank commented 6 years ago

@amachanic Did you pass

verify=False

directly to S3FS, or did you specify it as a query parameter in the URL that you passed to S3FSOpener? If you did the latter, then I think that you found a bug in my commit https://github.com/chairmank/s3fs/commit/cf91b3aeb82a0adbb54d290280aceea26c8d55ac . I'll fix it.

amachanic commented 6 years ago

@chairmank I passed it in the URL. Also just discovered that the same problem occurs with use_ssl in a URL; both need to be cast as boolean, as necessary, at some point along the chain.

Thanks for doing this fix!

chairmank commented 6 years ago

Actually, this is tricky because query strings are stringly-typed.

I think that verify=False in the URL must be interpreted as the string 'False', because it is impossible to distinguish between the boolean False and the filename 'False'.

Interpreting use_ssl in the query string requires a mapping from string to booleans. Do we recognize 'true' and 'True' to mean boolean True? What about 'yes' or '1'? I am not aware of a well-defined convention that we can follow. And what do we do if the URL contains'use_ssl=foobar'`?

Given these problems, I propose the following:

This feels kludgy. Suggestions for improvement?

willmcgugan commented 6 years ago

There's a boolean in there already called strict. The convention I've adopted is that 1 is the only value that is considered True. It at least avoids the confusion you've pointed out above.

Other booleans should probably adopt the same convention. Or we pick a better convention.

This might be a good time to hammer it out, because I'm going release 1.0.0 version soon.

An alternative would be to have the mere presence of a boolean value considered the True case, and its absense the False case.

Possibly there is no elegant solution here, and I'll just pick one and document it. But that would be fine to.

amachanic commented 6 years ago

@willmcgugan @chairmank

1 being True is a reasonable choice in my opinion, but it will not work with verify as there is only a False option or a string. So a 0 will have to be introduced. Seems simple and effective enough?

That said I do appreciate being able to pass the strings "true" and "false" for consistency, and in those cases it would be ideal to simply support any/all possible casing. The only downside for verify would be someone who wanted to use e.g. "TRUE" or "FALSE" as a file name, but that seems unlikely.

willmcgugan commented 6 years ago

With regards to verify, I think it might be prudent to separate that in to two parameters. One for the boolean and one for the path to the certs. I'm also not keen on the 'inherit defaults' option of None. We can probably pick our own defaults here.

chairmank commented 6 years ago

Okay, what about the following:

amachanic commented 6 years ago

Does use_ssl = None mean True inside of boto? If not the default needs to be True in order to not break existing implementations.

chairmank commented 6 years ago

Sorry, I confused the defaultsin my last comment. The defaults in boto, which I propose that S3FSOpener also honor, are

chairmank commented 6 years ago

I amended my PR. New commit is https://github.com/PyFilesystem/s3fs/pull/40/commits/e53676d15b011927eaa2decb2b703aa13b7dae5a

amachanic commented 6 years ago

Additional thought on this: If use_ssl is False, there is no point in verifying the certificate. So in that case, ignore verify altogether and default it to False as well. Convenience feature.