buchgr / bazel-remote

A remote cache for Bazel
https://bazel.build
Apache License 2.0

Large number of TLS handshakes to the S3 proxy #675

Open gabrielrussoc opened 1 year ago

gabrielrussoc commented 1 year ago

Hi all,

I noticed high CPU usage and general slowness in Bazel Remote, and after some digging I was able to pin it down to a very large number of TLS handshakes to our S3 bucket. The problem goes away if I set the --s3.disable_ssl flag: the metrics show p90 latency for FindMissingBlob requests dropping from 4s to 400ms on the same physical resources.

It turns out this issue is not specific to Bazel Remote but rather to minio-go (the client used to talk to S3). I opened an issue there with reproduction details: https://github.com/minio/minio-go/issues/1855. Unfortunately, the root cause might sit even lower, in Go's net/http library itself: https://github.com/golang/go/issues/50984.

I'm exploring whether disabling SSL is feasible for our environment, because with SSL enabled this issue makes Bazel Remote basically unusable at our volume (we're testing with a peak of 100k requests / minute, but the real load is much higher).

I'm using Bazel Remote v2.4.1 on Kubernetes, with Docker as the container runtime.

mostynb commented 1 year ago

Thanks for the detailed bug report.

Reading through the linked issues, it sounds like we're stuck waiting for a fix in a future version of Go, unless perhaps there is another Go S3 client that doesn't use net/http.

> I'm exploring whether disabling SSL is feasible for our environment, because with SSL enabled this issue makes Bazel Remote basically unusable at our volume (we're testing with a peak of 100k requests / minute, but the real load is much higher).

You might be able to use this with a TLS termination proxy to offload the handshakes.