buchgr / bazel-remote

A remote cache for Bazel
https://bazel.build
Apache License 2.0
595 stars 154 forks

Setting to limit the number of S3 requests the cache will perform #631

Closed diegohavenstein closed 1 year ago

diegohavenstein commented 1 year ago

When the bazel remote cache is used with an S3 bucket, it quickly becomes clear that it sends a very large number of requests, which can result in a substantial AWS bill. It would be really helpful if this could be limited somehow, i.e. if we could PUT/GET objects to S3, but only up to a certain number of requests per minute. If this is not easy to implement, what are the alternatives?

Example: In our CI, with the bazel cache enabled, I see spikes of up to 20k S3 requests per second.
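For illustration only, a per-backend request cap could look something like the token-bucket sketch below. This is not an existing bazel-remote feature; the s3Client interface and wrapper names are hypothetical, and the limiter comes from golang.org/x/time/rate.

```go
package s3proxy

// Illustrative sketch: gate every S3 call through a token bucket so the
// proxy backend never exceeds a configured requests-per-minute budget.

import (
	"context"
	"time"

	"golang.org/x/time/rate"
)

// s3Client is a stand-in for whatever S3 client the proxy backend uses.
type s3Client interface {
	GetObject(ctx context.Context, key string) ([]byte, error)
	PutObject(ctx context.Context, key string, data []byte) error
}

// rateLimitedS3 wraps an s3Client and blocks each call until the token
// bucket grants a slot, capping the overall request rate.
type rateLimitedS3 struct {
	inner   s3Client
	limiter *rate.Limiter
}

// newRateLimitedS3 allows at most reqPerMin requests per minute, with a
// small burst so short spikes are not serialized one by one.
func newRateLimitedS3(inner s3Client, reqPerMin int) *rateLimitedS3 {
	return &rateLimitedS3{
		inner:   inner,
		limiter: rate.NewLimiter(rate.Every(time.Minute/time.Duration(reqPerMin)), 100),
	}
}

func (r *rateLimitedS3) GetObject(ctx context.Context, key string) ([]byte, error) {
	// Wait blocks until a token is available or the context is cancelled.
	if err := r.limiter.Wait(ctx); err != nil {
		return nil, err
	}
	return r.inner.GetObject(ctx, key)
}

func (r *rateLimitedS3) PutObject(ctx context.Context, key string, data []byte) error {
	if err := r.limiter.Wait(ctx); err != nil {
		return err
	}
	return r.inner.PutObject(ctx, key, data)
}
```

Note that a blocking limiter like this trades request cost for build latency: once the budget is exhausted, cache lookups stall rather than being dropped.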

mostynb commented 1 year ago

There is a --num_uploaders flag that limits the number of concurrent PUT requests (once that limit is reached, further PUT requests are quietly skipped), which might be useful, but it doesn't limit the number of concurrent GET or HEAD requests.
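The behaviour described above (a fixed pool of uploaders, with excess PUTs quietly dropped) roughly follows the bounded-queue pattern sketched below. The names are illustrative, not bazel-remote's actual code.

```go
package s3proxy

// Sketch of the "bounded uploaders, drop when saturated" pattern: a fixed
// number of workers drain a bounded queue, and blobs that arrive while the
// queue is full are skipped instead of uploaded.

type upload struct {
	key  string
	data []byte
}

type uploader struct {
	queue chan upload
}

// newUploader starts numUploaders workers draining a bounded queue.
func newUploader(numUploaders, queueSize int, put func(upload)) *uploader {
	u := &uploader{queue: make(chan upload, queueSize)}
	for i := 0; i < numUploaders; i++ {
		go func() {
			for item := range u.queue {
				put(item) // PUT the blob to S3
			}
		}()
	}
	return u
}

// enqueue never blocks the cache's hot path: if all uploaders are busy and
// the queue is full, the blob is quietly skipped rather than uploaded.
func (u *uploader) enqueue(item upload) {
	select {
	case u.queue <- item:
	default:
		// Dropped: the blob stays in the local disk cache but is not
		// mirrored to S3, trading durability for bounded PUT traffic.
	}
}
```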

I wonder if limiting this would actually save costs though? I would have assumed that AWS CPU time to run an action is more expensive than the cost of the corresponding cache requests.

diegohavenstein commented 1 year ago

PUT requests to S3 are an order of magnitude more expensive than GET requests. So running the EC2 instance with a large local disk in front of the S3 bucket, but not uploading everything, could be the way to go.

Price for 1,000 GETs: $0.0004
Price for 1,000 PUTs: $0.005
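At those list prices, the 20k requests/second spike mentioned earlier works out roughly as follows if it were sustained for a full hour (illustrative arithmetic only):

```go
package s3proxy

// Back-of-the-envelope cost of a sustained 20k requests/second spike at the
// list prices quoted above.

const (
	getPricePer1000 = 0.0004 // USD per 1,000 GET requests
	putPricePer1000 = 0.005  // USD per 1,000 PUT requests
	requestsPerSec  = 20_000
)

// hourlyCost returns the USD cost of sustaining requestsPerSec for one hour
// at the given per-1,000-request price.
func hourlyCost(pricePer1000 float64) float64 {
	requestsPerHour := float64(requestsPerSec * 3600) // 72,000,000 requests
	return requestsPerHour / 1000 * pricePer1000
}

// hourlyCost(getPricePer1000) ≈ $28.80 per hour if the spike were all GETs
// hourlyCost(putPricePer1000) ≈ $360.00 per hour if the spike were all PUTs
```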

Good to know that --num_uploaders will skip some PUT requests. If our costs end up being too high due to PUTs, I will give it a try.