awslabs / mountpoint-s3

A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.
Apache License 2.0

Consider supporting `NO_PROXY` environment variable #322

Open krismarc opened 1 year ago

krismarc commented 1 year ago

Mountpoint for Amazon S3 version

mountpoint-s3 0.2.0-6888455

AWS Region

S3 compatible object-storage

Describe the running environment

locally

What happened?

Hi there,

I am trying to use this client with a custom S3-compatible storage.

Our storage works perfectly fine with projects like https://github.com/minio/mc, https://github.com/s3fs-fuse/s3fs-fuse and even https://github.com/boto/boto3.

Any ideas whether, and how, this project can be used with it as well?

Best regards, K.M.

Relevant log output

./mount-s3 test ~/mnt --endpoint-url "https://ecs.appcloud.***.com"
Error: Failed to create S3 client

Caused by:
    0: HeadBucket failed for bucket test in region us-east-1
    1: Client error
    2: Unknown response error: MetaRequestResult { response_status: 0, crt_error: Error(14342, "aws-c-s3: AWS_ERROR_S3_SLOW_DOWN, Response code indicates throttling"), error_response_headers: None, error_response_body: None }
Error: Failed to create mount process

monthonk commented 1 year ago

Hi @krismarc, this is an error from the initial HeadBucket request, which validates the connection to the bucket and discovers the bucket region in case it's not the default us-east-1. We expect the service to return either a success or IncorrectRegion, but it looks like we got an unexpected error instead (AWS_ERROR_S3_SLOW_DOWN). It's most likely an error from the service side (HTTP status code 503).

I would recommend three things:

krismarc commented 1 year ago

I've tried with the proper region. If this were a storage issue, it wouldn't work with the other projects either, no?

I'll try with extra logging tomorrow. However, 503 rings a bell. We are behind a proxy. How can I configure its address and port and a noProxy list? I assume the system environment variables might not be sufficient here, while other tools are fine with them.

monthonk commented 1 year ago

You should be able to specify the host and port together in the form host:port in the --endpoint-url config, but we don't have any configuration related to proxies right now.

jamesbornholt commented 1 year ago

While we don't actively test proxy support today, I expect the HTTPS_PROXY and HTTP_PROXY environment variables should work. I tried it like this:

export HTTPS_PROXY="http://localhost:8080"
mount-s3 giab ~/mnt --no-sign-request --endpoint-url https://s3.us-east-1.amazonaws.com

and was seeing the traffic on the mitmproxy I ran on port 8080. For HTTPS you'll need to make sure your proxy's certificate is trusted by the OS.

krismarc commented 1 year ago

Ok, my assumption is that it tries to go over the internet proxy, which usually answers with 503 (in our case). Our storage is internal and won't be reachable over the proxy. I'll do more tests tomorrow. The question is whether it honors no_proxy/NO_PROXY. If you try to set two more env vars like:

export no_proxy=s3.us-east-1.amazonaws.com
export NO_PROXY=s3.us-east-1.amazonaws.com

It should actually fail or not reach your mitmproxy at all.

krismarc commented 1 year ago

Ok, I can confirm that NO_PROXY is not honored.

2023-06-30T06:26:42.550529Z DEBUG awscrt::http-connection: https_proxy environment found
2023-06-30T06:26:42.550535Z  INFO awscrt::http-connection: (STATIC) Connecting to "ecs.appcloud.***.com" through a tunnel via proxy "gate.zrh.***.com"
2023-06-30T06:26:43.128018Z DEBUG awscrt::http-stream: id=0x7f2cc0014900: Client request complete, response status: 503 (Service Unavailable).
2023-06-30T06:26:43.128027Z  INFO awscrt::http-connection: id=0x7f2cc0018c50: Shutting down connection with error code 0 (AWS_ERROR_SUCCESS).
2023-06-30T06:26:43.128032Z DEBUG awscrt::task-scheduler: id=0x7f2cc0003600: Scheduling channel_shutdown task for immediate execution
2023-06-30T06:26:43.128039Z DEBUG awscrt::S3MetaRequest: id=0x561f9de39ee0: Request 0x7f2cc0000f20 finished with error code 14342 (aws-c-s3: AWS_ERROR_S3_SLOW_DOWN, Response code indicates throttling) and response status 503

After unsetting all *proxy env variables, the connection worked.
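
That workaround can be limited to the mount process, so other tools in the same shell keep their proxy settings. A minimal sketch, assuming `env -u` is available (GNU coreutils and BSD `env` both provide it); `without_proxy` is a hypothetical helper name, not part of Mountpoint, and the proxy URL is a placeholder:

```shell
#!/bin/sh
# Hypothetical helper (not part of Mountpoint): run a command with the
# usual proxy variables removed from that command's environment only.
without_proxy() {
    env -u HTTPS_PROXY -u https_proxy \
        -u HTTP_PROXY -u http_proxy \
        "$@"
}

# Demonstration with a harmless command; the parent shell keeps its setting.
HTTPS_PROXY="http://proxy.example:8080"   # placeholder value
export HTTPS_PROXY
without_proxy sh -c 'echo "child sees HTTPS_PROXY=${HTTPS_PROXY:-<unset>}"'
echo "parent still has HTTPS_PROXY=$HTTPS_PROXY"
```

With this helper, the mount from the original report would be started as `without_proxy ./mount-s3 test ~/mnt --endpoint-url "https://ecs.appcloud.***.com"`.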

jamesbornholt commented 1 year ago

Great that you got it working! I think the outstanding issue here is to consider supporting NO_PROXY, which boto does, and therefore the AWS CLI does too. So I'll keep this issue open to track that.
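
For context, the matching that many tools apply to NO_PROXY is roughly: split the value on commas, then treat each entry as either an exact host name or a domain suffix, with `*` meaning "never use the proxy". The sketch below illustrates that common convention only. It is not how Mountpoint or the CRT behaves, details differ between tools (ports, CIDR ranges and IP literals are ignored here), and `bypasses_proxy` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical sketch: succeed (exit 0) if host $1 should bypass the proxy
# according to the comma-separated list $2 (default: $NO_PROXY, then $no_proxy).
# The body runs in a subshell so the IFS and globbing changes do not leak.
bypasses_proxy() (
    host=$1
    list=${2-${NO_PROXY:-${no_proxy:-}}}
    IFS=,
    set -f                       # keep a literal "*" entry from globbing
    for entry in $list; do
        entry=$(printf '%s' "$entry" | tr -d ' ')   # drop stray spaces
        [ -z "$entry" ] && continue
        [ "$entry" = "*" ] && exit 0                # wildcard: bypass always
        entry=${entry#.}         # ".example.com" and "example.com" match alike
        case $host in
            "$entry" | *."$entry") exit 0 ;;        # exact host or subdomain
        esac
    done
    exit 1
)

# Demonstration with the value suggested earlier in the thread.
for h in s3.us-east-1.amazonaws.com example.org; do
    if bypasses_proxy "$h" "s3.us-east-1.amazonaws.com"; then
        echo "$h: direct"
    else
        echo "$h: via proxy"
    fi
done
```

Matching on a leading dot boundary (rather than a plain string suffix) keeps an entry such as `example.com` from accidentally matching `notexample.com`.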

indianwhocodes commented 7 months ago

Being able to replace your_custom_endpoint_url with the actual URL of my native object storage endpoint when running the benchmarking scripts is a handy configuration feature I could use. We have benchmarks that don't necessarily run against AWS S3, so I would pass --endpoint_url=${S3_ENDPOINT_URL} pointing to my native object storage, such as Manta, Swift, MinIO, etc.

The relevant documentation can be updated as well (ref: mountpoint-s3/doc/BENCHMARKING.md). I already have a working branch in my forked repository that introduces this feature.

I can make a separate issue to track this, but it seems this issue covers some of my asks.

Relevant Log Output

vagrant@vagrant:~/mountpoint-s3$ ./mountpoint-s3/scripts/fs_bench.sh
Will only run fio jobs which match small
Skipping job mountpoint-s3/scripts/fio/read/rand_read_4t_direct.fio because it does not match small

Error: Failed to create S3 client

Caused by:
    0: initial ListObjectsV2 failed for bucket my-bucket-1 in region us-east-1
    1: Client error
    2: Wrong region (expecting ap-southeast-1)
Error: Failed to create mount process
Failed to mount file system
read_benchmark:cleanup

plattenschieber commented 4 months ago

Same here. We use a proxy for some domains and enable direct S3 access via no_proxy and a service endpoint mounted in that VPC (no_proxy=localhost,...,.s3.eu-central-1.amazonaws.com). Using the AWS CLI, e.g. running aws s3 ls bucket-name, works flawlessly, but I wanted to give the users some more comfort and mount their buckets into their filesystem with mount-s3.

Running mount-s3 runs into a timeout, and I can see in the proxy logs that a request occurs, even though no_proxy is set up correctly.

+1 for respecting no_proxy
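
The same thing can be checked from the client side by searching the CRT log for the "through a tunnel via proxy" line quoted earlier in this thread. A minimal sketch, assuming the CRT debug output has already been captured to a file or pipe (how it is captured is left out here); `proxied_connections` is a hypothetical helper and the host names in the demonstration are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: print "host -> proxy" for every tunnelled connection
# found in a CRT log. Reads the files given as arguments, or stdin if none.
# The pattern follows the awscrt::http-connection line quoted in this thread.
proxied_connections() {
    sed -n 's/.*Connecting to "\([^"]*\)" through a tunnel via proxy "\([^"]*\)".*/\1 -> \2/p' "$@"
}

# Demonstration on a sample line with placeholder host names.
printf '%s\n' \
    '2023-06-30T06:26:42.550535Z  INFO awscrt::http-connection: (STATIC) Connecting to "storage.internal.example" through a tunnel via proxy "proxy.example"' \
    | proxied_connections
```

Any output for the S3 endpoint means the request was tunnelled through the proxy regardless of the no_proxy setting.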

gmergulhao commented 1 month ago

+1 for this. Exact same issue as @plattenschieber.

muddyfish commented 1 month ago

Thanks all for the use cases for this feature. We don't have any more information to share right now on supporting NO_PROXY, but if you leave a 👍 on the main issue, it helps us to see what's needed most.