Closed by jeqo 1 year ago
Thanks @AnatolyPopov! I added your suggestions in the last fix-up commit; have a look!
Do I understand correctly that we not only fetch by these bigger parts, but also cache by them?
@ivanyu Correct, interaction with storage (and cache) is now based on fetch parts instead of individual chunks.
Closing in favor of #394 and #429.
To decouple the fetching rate from transformation, a new concept, the Part, is introduced. The goal is to download larger ranges (parts) while still transforming smaller chunks. Caching also benefits from less fragmented data: with fetch parts instead of chunks, the chunks of a part are co-located in the same file (in the case of the disk-based cache) and can be fetched together faster. With the current approach, multiple reads from separate files are required.
It also renames ChunkManager/Cache to FetchManager/Cache. Beware: the third commit is large but only contains the renaming. The main changes are in the second commit. Apart from the first commit, the others will be squashed if approved.
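For reviewers, here is a rough sketch of how a chunk could map to the fetch part that contains it, and which byte range that part would cover. It is not the code in this PR: `FetchPartSketch`, `partIndexForChunk`, `chunksInPart`, `partRange`, and the `chunksPerPart` parameter are hypothetical names, and the sketch assumes fixed-size chunks, which may not hold for transformed (compressed or encrypted) data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the PR.
final class FetchPartSketch {

    private FetchPartSketch() {
    }

    // Index of the part that contains the given chunk.
    static int partIndexForChunk(final int chunkId, final int chunksPerPart) {
        if (chunkId < 0 || chunksPerPart <= 0) {
            throw new IllegalArgumentException("chunkId must be >= 0 and chunksPerPart > 0");
        }
        return chunkId / chunksPerPart;
    }

    // Ids of the chunks grouped into a part; the last part may hold fewer chunks.
    static List<Integer> chunksInPart(final int partIndex, final int chunksPerPart, final int totalChunks) {
        final int first = partIndex * chunksPerPart;
        final int endExclusive = Math.min(first + chunksPerPart, totalChunks);
        final List<Integer> ids = new ArrayList<>();
        for (int id = first; id < endExclusive; id++) {
            ids.add(id);
        }
        return ids;
    }

    // Byte range {from, toExclusive} a single fetch of this part would cover.
    // Assumes every chunk is chunkSize bytes long, except possibly the last one.
    static long[] partRange(final int partIndex, final int chunksPerPart,
                            final long chunkSize, final long totalSize) {
        final long from = (long) partIndex * chunksPerPart * chunkSize;
        final long toExclusive = Math.min(from + (long) chunksPerPart * chunkSize, totalSize);
        return new long[] {from, toExclusive};
    }
}
```

Under these assumptions, a request for any chunk of a part resolves to one storage read and one cache entry, keyed by the part index, which is where the "fewer, larger files" benefit for the disk-based cache comes from.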