Aiven-Open / tiered-storage-for-apache-kafka

RemoteStorageManager for Apache Kafka® Tiered Storage
Apache License 2.0
95 stars 20 forks source link

S3 download speed is too slow when consumer needs to read data from S3 storage #297

Open HenryCaiHaiying opened 1 year ago

HenryCaiHaiying commented 1 year ago

Currently the S3 download speed is low (about 20Mbit/s), the reason is we are fetching chunks in sequential order, can we do that in parallel (like how S3 multi-part upload is done in parallel)?

More details: https://the-asf.slack.com/archives/C05A1NF5SFM/p1687829723272589?thread_ts=1687245261.417759&cid=C05A1NF5SFM

AnatolyPopov commented 1 year ago

Thanks for the issue. We will think about this however right now the broker is always requesting the data without specifying the end position and such way we can not really say how much data have to be downloaded or moreover how much data will be actually read by the broker. So my understanding is that parallel fetching will lead to unnecessary downloads especially when the segments are big. IIRC there was a proposal from @jeqo to make a change on a broker side to always request maxBytes from the plugin. That could help with this a little but I'm not sure when/if it will be changed. At the same time we had some discussions about a little bit different approach, basically doing some pre-fetching to the cache of some (perhaps configurable but limited) amount of data. At the same time we are trying to understand what will be the common read patterns in general (sequential read is obviously one of them) and trying to invent some related optimisations.

jeqo commented 1 year ago

Just for reference, this was mentioned on the upstream PR introducing remote fetching: https://github.com/apache/kafka/pull/13535#discussion_r1166318900 but was mentioned to be fixed in a following PR -- not sure if this one: https://issues.apache.org/jira/browse/KAFKA-14915

HenryCaiHaiying commented 1 year ago

I think the common use case from consumer side is doing bootstrap or catchup pulls, reading from this old offset and all the way to the tail of the queue until the consumer caught up. They would want all the data in remote storage for this topic partition. There might exist a few diagnostic use cases people only want a small segment of the partition data, but I don't think that's that common.

In either cases, I think you can do some prefetching to make sure the next buffer of the stream is ready to serve to the client, you don't have to pre-read too much. And if you want to be fancy, you can put in an adaptive prefetch algorithm. First time only prefetch 4K bytes, the the client comes back to ask more, prefetch 8K byte then 16K, ... until a cap (e.g. 10M) That probably can work well for both use cases.

AnatolyPopov commented 1 year ago

As far as I remember we were briefly discussing exactly this with @ivanyu at some point(meaning adaptive fetching). I would say as soon as we will have this one https://github.com/aiven/tiered-storage-for-apache-kafka/pull/245 merged it should be easy to implement at least prefetching of some fixed amount of data but I think we will also consider both. Thanks for bringing this up again and for additional info about use cases. :+1:

HenryCaiHaiying commented 1 year ago

@AnatolyPopov @ivanyu It seems https://github.com/aiven/tiered-storage-for-apache-kafka/pull/245 is already merged, are you guys going to come back to add some prefetching or caching support to speed up the S3 download?

ivanyu commented 1 year ago

Yeah, we're working on it

HenryCaiHaiying commented 1 year ago

Thanks, that's a very useful feature.

HenryCaiHaiying commented 1 year ago

@ivanyu Looping back to check the progress on this issue.

jeqo commented 1 year ago

@HenryCaiHaiying, #403 and #411 should help to improve fetching performance. Have a look!

HenryCaiHaiying commented 1 year ago

Thanks for the contribution, left a small comment in one of the PR

HenryCaiHaiying commented 12 months ago

@jeqo with this PR: https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/pull/429, do we still need #403 ?

ivanyu commented 12 months ago

@HenryCaiHaiying The former supersedes the latter.