Closed: funky-eyes closed this issue 4 months ago
I want to add a speed limit (rate limiting) in S3MultiPartOutputStream#write. I wonder if the community can agree to this solution; if so, I can open a PR to support it.
It seems you will need KIP-956, which hasn't been accepted yet.
@ivanyu Yes, I need this feature, but I found that I can actually achieve rate limiting in the copySegment implementation code. So why don't we implement this feature ourselves, just like the community provides compression for copySegment?
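To illustrate the write-side idea discussed above, here is a minimal sketch of a throttling wrapper around an `OutputStream`, in the spirit of limiting `S3MultiPartOutputStream#write`. The class name and the simple fixed-window budget are assumptions for illustration, not the project's actual implementation:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative sketch only: throttles writes to a target bytes/second budget
// using a fixed one-second window. Not the actual plugin code.
public class RateLimitedOutputStream extends FilterOutputStream {
    private final long bytesPerSecond;
    private long windowStartNanos;
    private long bytesInWindow;

    public RateLimitedOutputStream(OutputStream out, long bytesPerSecond) {
        super(out);
        this.bytesPerSecond = bytesPerSecond;
        this.windowStartNanos = System.nanoTime();
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        throttle(len);
        out.write(b, off, len); // bypass FilterOutputStream's byte-at-a-time loop
    }

    @Override
    public void write(int b) throws IOException {
        throttle(1);
        out.write(b);
    }

    // Sleep out the rest of the current one-second window once the budget is spent.
    private void throttle(int bytes) throws IOException {
        long now = System.nanoTime();
        if (now - windowStartNanos >= 1_000_000_000L) {
            windowStartNanos = now;
            bytesInWindow = 0;
        }
        bytesInWindow += bytes;
        if (bytesInWindow > bytesPerSecond) {
            long sleepNanos = 1_000_000_000L - (now - windowStartNanos);
            try {
                if (sleepNanos > 0) {
                    Thread.sleep(sleepNanos / 1_000_000L, (int) (sleepNanos % 1_000_000L));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while throttling", e);
            }
            windowStartNanos = System.nanoTime();
            bytesInWindow = bytes;
        }
    }
}
```

The same wrapping approach applies regardless of whether the underlying stream targets S3 or another backend, which is part of the later discussion in this thread.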
Yeah, I see your point. We're looking into your PRs and other ways to implement this
@ivanyu https://github.com/apache/kafka/pull/15625 I've noticed this standard RLMQuotaManager, but it seems we would still need to adapt it before using it, because it appears to be only a generic definition.
@funky-eyes I have been looking into how quotas are implemented, and it's not going to help us much on the upload path, as the granularity is at the segment level: it will not control how fast a segment is actually read and uploaded. For the fetch path, I think it will be more effective.
I've looked into some native support on the SDKs for this, and apart from some readLimit config from AWS SDK v1, I haven't found anything similar in V2 or other vendor SDKs.
I took a look at #521 and I like the idea of using bucket4j (first time I've seen it; thanks for bringing it to our attention). However, I'd like to see if we can have the rate limiting a bit higher in the stack so it can benefit other SDKs, not only S3. I have started a PR to evaluate this approach: https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/pull/548 -- it would be great if you could have a look and check whether it brings similar benefits to your PR. Thanks!
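For context, the "higher in the stack" idea can be sketched as a shared byte-rate limiter applied through an `InputStream` wrapper, so the throttle sits above any single SDK. #548 itself uses bucket4j; this simplified JDK-only version with assumed class names is only meant to convey the shape of the approach:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch only (assumed names, not the #548 code): one shared limiter can be
// handed to every backend's streams, keeping the throttle SDK-agnostic.
class ByteRateLimiter {
    private final long bytesPerSecond;
    private long windowStartNanos = System.nanoTime();
    private long used;

    ByteRateLimiter(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
    }

    // Block until the requested bytes fit into the per-second budget.
    synchronized void acquire(long bytes) throws InterruptedException {
        long now = System.nanoTime();
        if (now - windowStartNanos >= 1_000_000_000L) {
            windowStartNanos = now;
            used = 0;
        }
        used += bytes;
        if (used > bytesPerSecond) {
            long sleepNanos = 1_000_000_000L - (now - windowStartNanos);
            if (sleepNanos > 0) {
                Thread.sleep(sleepNanos / 1_000_000L, (int) (sleepNanos % 1_000_000L));
            }
            windowStartNanos = System.nanoTime();
            used = bytes;
        }
    }
}

public class RateLimitedInputStream extends FilterInputStream {
    private final ByteRateLimiter limiter;

    public RateLimitedInputStream(InputStream in, ByteRateLimiter limiter) {
        super(in);
        this.limiter = limiter;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) {
            pay(n);
        }
        return n;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            pay(1);
        }
        return b;
    }

    private void pay(int bytes) throws IOException {
        try {
            limiter.acquire(bytes);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while throttling", e);
        }
    }
}
```

Because the wrapper only sees bytes flowing through a plain `InputStream`, the same limiter instance could throttle uploads to S3, GCS, or Azure alike, which is the benefit argued for above.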
I apologize for only seeing your message now. First of all, thank you for reviewing my requirement and PR. I'm also pleased to see that you used bucket4j for rate limiting in #548. I took a preliminary look at your PR: you wrapped the InputStream with RateLimitedInputStream, controlling the read rate, while I controlled the write rate. Both have similar effects, and I prefer your implementation. Initially I also thought about how to make it more generic, but since my project is about to go live, I submitted a rudimentary version to the community. I will test your PR next, and I hope the community can merge #548 before the official version is released.
I think with #548 merged, we can close this.
What can we help you with?
During real-time reads and writes, whenever an upload to S3 is triggered it causes very heavy disk I/O and an instant CPU spike, which impacts production business operations. I tried setting remote.log.manager.thread.pool.size to cpu*2, but the effect is still not ideal.
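For reference, the tuning mentioned above is a broker-level setting in `server.properties`; the value below is purely illustrative (e.g. for an 8-core broker), not a recommendation:

```properties
# server.properties -- illustrative value only
# Threads the remote log manager uses to copy segments to remote storage.
remote.log.manager.thread.pool.size=16
```

Reducing this pool size can smooth the I/O burst at the cost of slower tiering, which is why a byte-rate limit, as discussed in this thread, is the more direct fix.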
Where would you expect to find this information?