delta-io / delta-sharing

An open protocol for secure data sharing
https://delta.io/sharing
Apache License 2.0

Getting Caused by: UnexpectedHttpStatus: HTTP request failed with status: HTTP/1.1 403 while reading with Proxy Configuration #320

Open pavan-kumar-chalamcharla opened 1 year ago

pavan-kumar-chalamcharla commented 1 year ago

When proxy settings such as the http_proxy and https_proxy environment variables are set, reading data through open delta-sharing does not pick them up. With a bucket policy that allows only the proxy IPs, this causes the error below:

FileReadException: Error while reading file delta-sharing:/dbfs%253A%252FFileStore%252F059501f2aeb8fad0607470b70008727a/62477. Caused by: UnexpectedHttpStatus: HTTP request failed with status: HTTP/1.1 403 Forbidden <?xml version="1.0" encoding="UTF-8"?>
DEBUG:fsspec.http:Cannot connect to host <bucket name>.s3.us-east-1.amazonaws.com:443 ssl:default [Connect call failed ('', 443)]

Debugging further, we see that fsspec.http/aiohttp is used to read the pre-signed URLs. Those libraries do not use the HTTP_PROXY environment variables that are set, which causes the failure while reading the data.
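For context, aiohttp only reads proxy environment variables when a session is created with `trust_env=True`, and fsspec's HTTP filesystem forwards `client_kwargs` to that session. The following is a minimal sketch under those assumptions, not the delta-sharing implementation: `env_proxy_for` and `open_presigned` are hypothetical helper names, and the stdlib lookup only approximates what aiohttp resolves on a given platform.

```python
import urllib.parse
import urllib.request


def env_proxy_for(url):
    """Return the proxy URL the environment variables select for `url`, or None."""
    proxies = urllib.request.getproxies_environment()  # reads the *_proxy variables
    parts = urllib.parse.urlsplit(url)
    # Honour no_proxy: hosts listed there bypass the proxy entirely.
    if urllib.request.proxy_bypass_environment(parts.hostname, proxies):
        return None
    return proxies.get(parts.scheme)


def open_presigned(url):
    """Open `url` via fsspec, asking aiohttp to trust the environment (hypothetical helper)."""
    import fsspec  # third-party; imported lazily so env_proxy_for stays stdlib-only

    # client_kwargs is passed on to aiohttp.ClientSession by fsspec's HTTP filesystem.
    fs = fsspec.filesystem("http", client_kwargs={"trust_env": True})
    return fs.open(url, "rb")
```

Calling `env_proxy_for` on a pre-signed URL is a quick way to check whether the variables are visible to the Python process at all before looking at the HTTP client.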

We are looking for proxy support in open delta sharing when reading data via the delta-sharing Python libraries.

linzhou-db commented 1 year ago

Do they already have an idea of how to fix the issue? If so, feel free to send out a PR, as this is OSS code.

quertenmont commented 1 year ago

Facing the same issue...

quertenmont commented 1 year ago

After your fix #326, I can get one step further with my proxy configuration, but I am still having trouble:

1) If I use an https_proxy, I get the following error from aiohttp: "HTTPS proxies https://mitmproxy:8080/ are not supported, ignoring".

2) If I change my proxy configuration to use http instead of https, then my proxy server complains, because the TLS handshake is failing:

[09:08:48.665][10.244.13.19:51180] client connect
[09:08:48.860][10.244.13.19:51180] server connect open-delta-sharing.s3.us-west-2.amazonaws.com:443 (52.218.196.225:443)
[09:08:49.207][10.244.13.19:51180] Client TLS handshake failed. The client does not trust the proxy's certificate for open-delta-sharing.s3.us-west-2.amazonaws.com (tlsv1 alert unknown ca)
[09:08:49.208][10.244.13.19:51180] client disconnect
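The "unknown ca" alert in the log above means the client rejected the certificate the intercepting proxy presented. One possible direction, sketched here under assumptions, is to have the client trust the proxy's CA certificate. `context_trusting_proxy_ca` and `fetch_status` are hypothetical helpers, the CA file path is whatever the proxy exported, and the thread does not establish whether delta-sharing exposes a hook for passing such a context through.

```python
import ssl


def context_trusting_proxy_ca(ca_file=None):
    """Build a verifying TLS context that also trusts the CA in `ca_file`."""
    ctx = ssl.create_default_context()  # system CAs, hostname checking enabled
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)  # add the proxy's CA certificate
    return ctx


async def fetch_status(url, proxy, ca_file):
    """Fetch `url` through an HTTP proxy, trusting the proxy's CA (hypothetical helper)."""
    import aiohttp  # third-party, imported lazily

    ctx = context_trusting_proxy_ca(ca_file)
    async with aiohttp.ClientSession() as session:
        # proxy= and ssl= are per-request arguments of aiohttp's client API.
        async with session.get(url, proxy=proxy, ssl=ctx) as resp:
            return resp.status
```

Certificate verification stays on in this sketch; it only widens the set of trusted issuers, which is preferable to disabling verification.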

Any idea how I can sort this out? Thanks in advance, Loic

pavan-kumar-chalamcharla commented 1 year ago

This looks like a limitation of aiohttp. The link below from the aiohttp docs mentions that it supports "HTTP proxies and HTTP proxies that can be upgraded to HTTPS via the HTTP CONNECT method": https://github.com/aio-libs/aiohttp/blob/master/docs/client_advanced.rst#proxy-support
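The quoted sentence separates two things that are easy to conflate: the scheme of the proxy URL (how the client talks to the proxy) and the scheme of the target URL (what travels through it). A small illustrative sketch, with `describe_hops` being a hypothetical helper and the descriptions reflecting general HTTP proxy behaviour rather than anything specific to aiohttp:

```python
from urllib.parse import urlsplit


def describe_hops(proxy_url, target_url):
    """Describe the two legs of a proxied request from the URL schemes alone."""
    proxy_scheme = urlsplit(proxy_url).scheme
    target_scheme = urlsplit(target_url).scheme
    # Leg 1: client to proxy. An https:// proxy URL means TLS to the proxy itself.
    if proxy_scheme == "https":
        first_leg = "TLS to the proxy"
    else:
        first_leg = "plain HTTP to the proxy"
    # Leg 2: an https:// target is reached via a CONNECT tunnel with TLS inside it.
    if target_scheme == "https":
        second_leg = "CONNECT tunnel, then TLS to the target"
    else:
        second_leg = "plain HTTP forwarded to the target"
    return first_leg, second_leg
```

In these terms, case 1 above is an `https://` proxy URL, and case 2 is an `http://` proxy URL with an `https://` target.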

Check whether the workaround mentioned in the comment below works for you, and make sure you use aiohttp v3.8: https://github.com/aio-libs/aiohttp/discussions/6044#discussioncomment-1432443

quertenmont commented 1 year ago

setattr(asyncio.sslproto._SSLProtocolTransport, "_start_tls_compatible", True)

does not make any difference for me.

But I was able to make a connection via ALGO-->http-->MITM-PROXY-->HTTPS-->DELTASHARE-DATA if the deltashare-data host is included in the --ignore-hosts argument of mitmproxy. See the docs here: https://docs.mitmproxy.org/dev/howto-ignoredomains/
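As far as the linked documentation describes it, --ignore-hosts takes regular expressions that are matched against the host:port of each connection. A minimal sketch for building an anchored pattern for a single host, so that only that exact host is passed through untouched; `ignore_hosts_pattern` is a hypothetical helper and the default port of 443 is an assumption:

```python
import re


def ignore_hosts_pattern(host, port=443):
    """Return an anchored regex that matches exactly `host:port`."""
    # re.escape keeps the dots in the hostname from matching arbitrary characters.
    return "^" + re.escape(host) + ":" + str(port) + "$"


pattern = ignore_hosts_pattern("open-delta-sharing.s3.us-west-2.amazonaws.com")
print(pattern)
```

The printed pattern can then be passed as the value of --ignore-hosts when starting mitmproxy.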

Not ideal... but better than nothing. It would be nice to have HTTPS from end to end.