Currently, IIUC, all the segments need to be on local disk before they can be queried.
Usually, query volume on older data tends to be lower, or clients would be okay with
incurring extra latency for queries on older data.
Based on this assumption, we could keep segments older than X days
on deep storage only and not load them onto the servers.
A couple of ways to deal with this, based on discussions on Slack:
1) Use Presto on top of Pinot and make presto-pinot able to query
segments on S3 directly.
2) When a query arrives, lazily load the segment and then return the results (see the sketch below).
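A minimal sketch of option 2 (illustrative Java only, not actual Pinot code; `DeepStoreClient`, `LazySegmentLoader`, and the method names are assumptions): the server keeps only hot segments locally and pulls a cold segment from deep storage the first time a query touches it, which is exactly the extra latency clients would be accepting for old data.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface standing in for the deep store client (e.g. S3); not a Pinot API.
interface DeepStoreClient {
  // Downloads the named segment into destDir and returns its local path.
  Path download(String segmentName, Path destDir) throws IOException;
}

// Option 2 sketch: lazily pull cold segments from deep storage on first query access.
public class LazySegmentLoader {
  private final DeepStoreClient deepStore;
  private final Path localSegmentDir;
  // Segments already present on local disk, keyed by segment name.
  private final Map<String, Path> localSegments = new ConcurrentHashMap<>();

  public LazySegmentLoader(DeepStoreClient deepStore, Path localSegmentDir) {
    this.deepStore = deepStore;
    this.localSegmentDir = localSegmentDir;
  }

  // Called on the query path: returns a local path for the segment,
  // downloading it from deep storage only on first access.
  public Path getOrLoad(String segmentName) {
    return localSegments.computeIfAbsent(segmentName, name -> {
      try {
        Files.createDirectories(localSegmentDir);
        return deepStore.download(name, localSegmentDir);
      } catch (IOException e) {
        throw new RuntimeException("Failed to lazy-load segment " + name, e);
      }
    });
  }
}
```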
We can potentially build a storage hierarchy as DRAM -> SSD/HDD -> Remote store (Deep Store).
Segments are adaptively (based on the query pattern) brought onto the compute nodes (servers) and stored on local storage (SSD or HDD).
Furthermore, heavily queried segments (size permitting) can be cached completely in DRAM on the servers. This would be helpful for systems that don't have SSDs, where the paging overhead for memory-mapped segments is non-trivial compared to SSD.
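A rough illustration of the DRAM tier (again illustrative Java, not Pinot code; `DramSegmentCache` and the `byte[]` stand-in for a fully loaded segment are assumptions): an LRU cache of hot segments that sits in front of the SSD/HDD tier and evicts the least recently used segments when capacity is exceeded, with the on-disk copy remaining the fallback.

```java
import java.util.LinkedHashMap;

// Sketch of the DRAM tier: an LRU cache of fully loaded segment bytes in front
// of the local-disk tier. Capacities, names, and the byte[] representation are
// illustrative only.
public class DramSegmentCache {
  private final long capacityBytes;
  private long usedBytes = 0;
  // LinkedHashMap in access order gives a simple LRU eviction policy.
  private final LinkedHashMap<String, byte[]> cache =
      new LinkedHashMap<>(16, 0.75f, true);

  public DramSegmentCache(long capacityBytes) {
    this.capacityBytes = capacityBytes;
  }

  // Returns the cached segment bytes, or null if the segment must be read from
  // the SSD/HDD tier (or lazily fetched from deep storage).
  public synchronized byte[] get(String segmentName) {
    return cache.get(segmentName);
  }

  // Called when query-pattern heuristics decide a segment is hot enough to keep
  // in DRAM; evicts least-recently-used segments until the new one fits.
  public synchronized void put(String segmentName, byte[] segmentBytes) {
    if (segmentBytes.length > capacityBytes) {
      return; // too large for the DRAM tier; serve it from local disk instead
    }
    byte[] previous = cache.remove(segmentName);
    if (previous != null) {
      usedBytes -= previous.length;
    }
    while (usedBytes + segmentBytes.length > capacityBytes && !cache.isEmpty()) {
      String eldest = cache.keySet().iterator().next();
      usedBytes -= cache.remove(eldest).length;
    }
    cache.put(segmentName, segmentBytes);
    usedBytes += segmentBytes.length;
  }
}
```

LRU is just the simplest placement heuristic to show here; a real policy would presumably weigh query frequency and segment size when deciding what to promote into DRAM versus keep on SSD/HDD.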