Aiven-Open / tiered-storage-for-apache-kafka

RemoteStorageManager for Apache Kafka® Tiered Storage
Apache License 2.0

refactor: fetch and cache parts of segment instead of chunks #403

Closed jeqo closed 1 year ago

jeqo commented 1 year ago

To decouple the fetching rate from transformation, this PR introduces a new concept: the Part. The goal is to download larger ranges (parts) while still transforming smaller chunks. Caching also benefits from less fragmented data: when parts are cached instead of chunks, the chunks of a part are colocated in the same file (in the case of the disk-based cache) and can be fetched together faster. With the current approach, reading them requires multiple reads from separate files.
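As a rough illustration of the chunk-to-part relationship described above, here is a minimal sketch. It is not the code in this PR: `PartMapping`, `ByteRange`, `partIndexForChunk` and `partRange` are hypothetical names, and it assumes fixed-size, untransformed chunks so that byte offsets can be computed directly.

```java
// Hypothetical sketch: names and layout are illustrative, not the project's API.
// Assumes every chunk has the same size, except possibly the last one.
final class PartMapping {

    /** Inclusive byte range within the segment. */
    static final class ByteRange {
        final long from;
        final long to;

        ByteRange(final long from, final long to) {
            this.from = from;
            this.to = to;
        }
    }

    private PartMapping() {
    }

    /** Index of the part that contains the given chunk. */
    static int partIndexForChunk(final int chunkId, final int chunksPerPart) {
        if (chunkId < 0 || chunksPerPart <= 0) {
            throw new IllegalArgumentException("chunkId must be >= 0 and chunksPerPart > 0");
        }
        return chunkId / chunksPerPart;
    }

    /** Byte range covered by a part; the last part is cut off at the segment end. */
    static ByteRange partRange(final int partIndex, final int chunkSize,
                               final int chunksPerPart, final long segmentSize) {
        final long partSize = (long) chunkSize * chunksPerPart;
        final long from = partIndex * partSize;
        if (partIndex < 0 || from >= segmentSize) {
            throw new IllegalArgumentException("part is outside of the segment");
        }
        final long to = Math.min(from + partSize, segmentSize) - 1;
        return new ByteRange(from, to);
    }
}
```

For example, with 4-byte chunks, 4 chunks per part and a 40-byte segment, chunk 5 falls into part 1, and part 2 is the shorter last part covering bytes 32 to 39. One ranged download per part then serves every chunk inside it.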

It also renames ChunkManager/Cache to FetchManager/Cache. Beware: the third commit is large but only contains the renaming. The main changes are in the second commit. Apart from the first commit, the others will be squashed if approved.

jeqo commented 1 year ago

Thanks @AnatolyPopov! I've added your suggestions in the last fix-up commit, have a look!

ivanyu commented 1 year ago

Do I understand correctly that we not only fetch by these bigger parts, but also cache by them?

jeqo commented 1 year ago

@ivanyu Correct, interaction with storage (and cache) is now based on fetch parts instead of individual chunks.
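To make "cache by parts" concrete, here is a minimal in-memory sketch. It is an assumption-laden illustration, not the FetchManager/Cache from this PR: `PartCacheSketch`, its `fetchPart` loader and the `fetches` counter are hypothetical, and chunks are assumed to be fixed-size and untransformed.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical sketch: a cache keyed by part index instead of chunk id.
final class PartCacheSketch {
    private final Map<Integer, byte[]> parts = new ConcurrentHashMap<>();
    // Stands in for one ranged read of a whole part from remote storage.
    private final IntFunction<byte[]> fetchPart;
    private final int chunkSize;
    private final int chunksPerPart;
    // Counts how many times the loader was actually called.
    final AtomicInteger fetches = new AtomicInteger();

    PartCacheSketch(final IntFunction<byte[]> fetchPart, final int chunkSize, final int chunksPerPart) {
        this.fetchPart = fetchPart;
        this.chunkSize = chunkSize;
        this.chunksPerPart = chunksPerPart;
    }

    /** Returns the bytes of one chunk, loading its whole part on a cache miss. */
    byte[] chunk(final int chunkId) {
        final int partIndex = chunkId / chunksPerPart;
        final byte[] part = parts.computeIfAbsent(partIndex, p -> {
            fetches.incrementAndGet();
            return fetchPart.apply(p);
        });
        final int offset = (chunkId % chunksPerPart) * chunkSize;
        if (offset >= part.length) {
            throw new IllegalArgumentException("chunk is outside of its part");
        }
        final int end = Math.min(offset + chunkSize, part.length);
        return Arrays.copyOfRange(part, offset, end);
    }
}
```

In this sketch, reading chunks 4 and 5 with 4 chunks per part calls the loader once, because both live in part 1; a chunk-keyed cache would need two separate lookups and loads.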

jeqo commented 1 year ago

Closing in favor of #394 and #429.