apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
2.32k stars 682 forks source link

Object Store HTTP Rate Limiting #5396

Open Kinrany opened 4 months ago

Kinrany commented 4 months ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do. I'd like to limit the rate of requests I make to S3, to avoid running into throttling on their side.

Describe the solution you'd like There could be a wrapper, similar to ThrottledStore, that puts requests in a queue and executes them up to specified throughput.

Describe alternatives you've considered Rate limiting could be baked into ObjectStore implementations that need it, with rules and configuration that apply to them only. This might be better if there is a wide variety of rate limiting rules between different vendors. On the other hand, this is worse if, say, S3-compatible vendors have different rules between them.

tustvold commented 4 months ago

I believe this should be a simple case of exposing the appropriate reqwest::Client options on ClientOptions.

Edit: it looks like reqwest doesn't actually expose this, it should be a relatively straightforward addition though to modify RetryExt to allocate a sempahore permit.

hesampakdaman commented 1 month ago

@tustvold I'll gladly take a stab at this. Am I correct in understanding that we would like to limit the number of requests in send_retry?

modified   object_store/src/client/retry.rs
@@ -368,7 +368,7 @@ impl RetryExt for reqwest::RequestBuilder {
         }
     }

-    fn send_retry(self, config: &RetryConfig) -> BoxFuture<'static, Result<Response>> {
+    fn send_retry(self, config: &RetryConfig, semaphore: Option<Arc<Semaphore>>) -> BoxFuture<'static, Result<Response>> {
tustvold commented 1 month ago

I would recommend creating something like

struct RequestContext {
    config: RetryConfig,
    semaphore: Arc<Semaphore>
}

To encapsulate the state. This could then be stored instead of RetryConfig on the clients such as S3Client and passed by reference to send_retry instead of RetryConfig.

hesampakdaman commented 1 month ago

Alright, I made an attempt! I am a bit worried to have introduced breaking changes due to requiring RequestContext in all new methods of the clients (and their builders). Potentially, we could modify this so that the builder receives RetryConfig and uses it to create a RequestContext with other fields being default.