fsspec / s3fs

S3 Filesystem
http://s3fs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Custom s3 compatible https endpoint not working, port forwarded to localhost works #872

Closed ion-elgreco closed 7 months ago

ion-elgreco commented 7 months ago

I am using s3fs for its fsspec compatibility with an S3-like object store. In this example, I am using LakeFS, which is deployed on k8s. We have it deployed with a hostname: https://

For some reason the connection with SSL doesn't work, even though I have the SSL certificates installed. When I set use_ssl=False to at least test the connection, it still doesn't work: I get "no files found" for every path. When I manually port-forward the LakeFS pod to my localhost, connecting with s3fs works.

This is however quite strange, because with the Rust object_store crate (which we use in Delta-RS) to read S3 object stores, it works with the https endpoint.

Any suggestions on how we could debug this?

This is what I am passing when it doesn't work:

fs = S3FileSystem(endpoint_url="https://<address>.com", key='<redacted>', secret='<redacted>', use_ssl=False)

ion-elgreco commented 7 months ago

I see what's going wrong here. By default it's looking at the wrong certificate bundle, while it should use this one: /etc/ssl/certs/ca-certificates.crt. Other clients like Hyper in Rust don't really have an issue with finding this, so that's quite interesting.

martindurant commented 7 months ago

Python SSL certificates are a thing. Apparently, you can set AWS_CA_BUNDLE or configure this in your .aws config somehow.

ion-elgreco commented 7 months ago

@martindurant what do you mean, are a thing? I don't use AWS so I won't have that config.

What I don't quite get is why this is only an issue with aiobotocore. Everything else is able to find the certificate bundles without a problem.

martindurant commented 7 months ago

The cert path defined within python is often not your system one. The rust crate will use the latter.

> everything else is able to find the certificate bundles without a problem

What is "everything else" in this context? Libraries like requests usually allow you to explicitly supply extra bundles/paths at runtime.
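To see which certificate locations the Python side would consult, a minimal sketch along these lines may help. It only uses the standard-library `ssl` module plus an optional `certifi` import; the helper name `describe_ca_sources` is made up for illustration, and which of these locations aiobotocore actually ends up using depends on the installed versions, so treat the output as a starting point for debugging rather than a definitive answer.

```python
import os
import ssl


def describe_ca_sources():
    """Collect the CA locations visible to Python (hypothetical helper)."""
    paths = ssl.get_default_verify_paths()
    info = {
        # Compiled-in OpenSSL defaults for this Python build.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Resolved locations; None when the file/directory does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Environment overrides that various libraries look at.
        "SSL_CERT_FILE": os.environ.get("SSL_CERT_FILE"),
        "REQUESTS_CA_BUNDLE": os.environ.get("REQUESTS_CA_BUNDLE"),
        "AWS_CA_BUNDLE": os.environ.get("AWS_CA_BUNDLE"),
    }
    try:
        import certifi  # optional; many HTTP libraries ship or depend on it

        info["certifi"] = certifi.where()
    except ImportError:
        info["certifi"] = None
    return info


if __name__ == "__main__":
    for name, value in describe_ca_sources().items():
        print(f"{name}: {value}")
```

If the system bundle (for example /etc/ssl/certs/ca-certificates.crt) does not show up in any of these entries, that would be consistent with the Python client not seeing an internally issued certificate.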

ion-elgreco commented 7 months ago

> The cert path defined within python is often not your system one. The rust crate will use the latter.
>
> > everything else is able to find the certificate bundles without a problem
>
> What is "everything else" in this context? Libraries like requests usually allow you to explicitly supply extra bundles/paths at runtime.

I don't believe I've needed to manually provide the paths for 'requests' before

martindurant commented 7 months ago

> I don't believe I've needed to manually provide the paths for 'requests' before

but have you used that with a custom endpoint like this before?

ion-elgreco commented 7 months ago

> > I don't believe I've needed to manually provide the paths for 'requests' before
>
> but have you used that with a custom endpoint like this before?

I connected on multiple occasions to intranet pages that required the CA bundles to be manually installed in the system.

martindurant commented 7 months ago

ca_bundle and AWS_CA_BUNDLE are mentioned here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html . This suggests that boto(core) won't use your standard configuration without explicitly configuring. Have you tried setting these values?
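As a rough sketch of the environment-variable route from that page: the variable has to be set before the S3 client is created. The bundle path is the one quoted earlier in this thread, the helper name `point_botocore_at` is hypothetical, and whether aiobotocore honours the variable the same way botocore does is an assumption that has not been verified in this thread.

```python
import os

# Path quoted earlier in this thread; adjust for your distribution.
SYSTEM_CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def point_botocore_at(bundle_path, env=os.environ):
    """Set AWS_CA_BUNDLE unless one is already configured (hypothetical helper)."""
    # setdefault keeps an existing value, so a deliberate override is not clobbered.
    env.setdefault("AWS_CA_BUNDLE", bundle_path)
    return env["AWS_CA_BUNDLE"]


if __name__ == "__main__":
    # Call this before constructing S3FileSystem so the client sees the variable.
    print(point_botocore_at(SYSTEM_CA_BUNDLE))
```

The same page also lists `ca_bundle` as a config-file setting, which is the equivalent for people who do keep an AWS config file.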

ion-elgreco commented 7 months ago

@martindurant I got it working with: client_kwargs={"verify": CERT_PATH}. Sorry for the late update :P
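For anyone landing here later, the fix above might look roughly like this when spelled out. `CERT_PATH` is assumed to be the system bundle mentioned earlier in the thread, the endpoint and credentials are the same placeholders as in the original report, and the two helper functions are only illustrative, not part of s3fs.

```python
# Assumed to be the system bundle mentioned earlier; adjust as needed.
CERT_PATH = "/etc/ssl/certs/ca-certificates.crt"


def lakefs_s3fs_kwargs(endpoint_url, key, secret, cert_path=CERT_PATH):
    """Build S3FileSystem keyword arguments with an explicit CA bundle."""
    return {
        "endpoint_url": endpoint_url,
        "key": key,
        "secret": secret,
        # client_kwargs is forwarded to the underlying botocore client;
        # "verify" points it at the CA bundle to trust.
        "client_kwargs": {"verify": cert_path},
    }


def open_filesystem(**kwargs):
    """Create the filesystem; s3fs is imported lazily so the helper above
    can be used without it installed."""
    from s3fs import S3FileSystem

    return S3FileSystem(**kwargs)


if __name__ == "__main__":
    kwargs = lakefs_s3fs_kwargs("https://<address>.com", "<redacted>", "<redacted>")
    print(kwargs["client_kwargs"])
    # fs = open_filesystem(**kwargs)  # needs s3fs and a reachable endpoint
```

Note that use_ssl=False is no longer passed: with the bundle supplied explicitly, the https endpoint can be verified normally.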