apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Using deep storage as a tiered storage #5554

Open pradeepgv42 opened 4 years ago

pradeepgv42 commented 4 years ago

Currently, IIUC, all the segments need to be on disk before they can be queried. Usually, query volume on older data tends to be lower, or clients would be okay with incurring extra latency for queries on older data.

Using this assumption, we could possibly keep the segments older than X days on deep storage and not load them onto the servers.
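To make the age cutoff concrete, here is a minimal sketch of the placement rule. It is not Pinot code: `split_by_age`, the `segments` mapping of segment name to end time, and `max_local_age_days` (the "X days" above) are all hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone


def split_by_age(segments, now, max_local_age_days):
    """Partition segment names into (keep_on_servers, deep_store_only).

    `segments` maps a segment name to the end time of the data it holds.
    All names here are hypothetical; this is not a Pinot API.
    """
    cutoff = now - timedelta(days=max_local_age_days)
    local, remote = [], []
    for name, end_time in segments.items():
        # Segments whose data ends before the cutoff stay only in deep storage.
        (local if end_time >= cutoff else remote).append(name)
    return sorted(local), sorted(remote)


now = datetime(2020, 6, 15, tzinfo=timezone.utc)
segments = {
    "seg_recent": now - timedelta(days=1),
    "seg_old": now - timedelta(days=30),
}
local, remote = split_by_age(segments, now, max_local_age_days=7)
print(local, remote)
```

With a 7-day cutoff, only `seg_recent` would be loaded onto the servers; `seg_old` would be left in deep storage until something asks for it.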

A couple of ways to deal with this, based on discussions on Slack: 1) Use Presto on top of Pinot and make presto-pinot able to query segments on S3 directly. 2) When a query arrives, lazily load the segment and return the results.
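For option 2, a rough sketch of what lazy loading could look like, assuming a bounded number of segments can be held locally and the least recently used one is dropped when that limit is exceeded. `LazySegmentCache` and `fetch_from_deep_store` are hypothetical names, not existing Pinot classes, and the fetch function stands in for an S3 download.

```python
from collections import OrderedDict


class LazySegmentCache:
    """Loads segments on first use and evicts the least recently used one.

    Hypothetical sketch: `fetch_from_deep_store` is any callable that takes a
    segment name and returns the loaded segment (e.g. downloaded from S3).
    """

    def __init__(self, fetch_from_deep_store, max_segments):
        self._fetch = fetch_from_deep_store
        self._max = max_segments
        self._loaded = OrderedDict()

    def get(self, segment_name):
        if segment_name in self._loaded:
            # Cache hit: mark the segment as most recently used.
            self._loaded.move_to_end(segment_name)
            return self._loaded[segment_name]
        # Cache miss: pay the deep-storage latency once, then keep it local.
        segment = self._fetch(segment_name)
        self._loaded[segment_name] = segment
        if len(self._loaded) > self._max:
            # Drop the least recently used segment to stay within the limit.
            self._loaded.popitem(last=False)
        return segment


downloads = []


def fake_fetch(name):
    # Stand-in for a deep-storage download; records each call.
    downloads.append(name)
    return f"data:{name}"


cache = LazySegmentCache(fake_fetch, max_segments=2)
cache.get("seg_a")
cache.get("seg_a")  # served locally, no second download
cache.get("seg_b")
cache.get("seg_c")  # exceeds the limit, so seg_a is evicted
```

The first query on a cold segment pays the download cost; repeated queries on the same segment do not, which matches the assumption that clients tolerate extra latency on older data.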

siddharthteotia commented 4 years ago

This would be a good addition.

We can potentially build a storage hierarchy as DRAM -> SSD/HDD -> Remote store (Deep Store).

Azure Kusto implements this hierarchy.
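A minimal sketch of how a read through such a hierarchy might behave, assuming each tier can be modeled as a simple key-value store and that a hit in a slower tier is copied into the faster ones. `TieredStore` is a hypothetical name; this is not how Pinot or Kusto implement it, and eviction from the faster tiers is left out.

```python
class TieredStore:
    """Read-through lookup over tiers ordered fastest to slowest.

    Hypothetical sketch: each tier is a (name, dict) pair, e.g.
    DRAM -> SSD/HDD -> remote deep store. Eviction is not modeled.
    """

    def __init__(self, tiers):
        self.tiers = tiers

    def read(self, key):
        for i, (name, store) in enumerate(self.tiers):
            if key in store:
                value = store[key]
                # Promote the value into every faster tier for later reads.
                for _, faster in self.tiers[:i]:
                    faster[key] = value
                return value, name
        raise KeyError(key)


dram, disk, remote = {}, {}, {"seg_old": b"segment-bytes"}
store = TieredStore([("dram", dram), ("disk", disk), ("remote", remote)])
first = store.read("seg_old")   # found in the remote tier
second = store.read("seg_old")  # now found in DRAM
```

The first read is served from the remote store and the second from DRAM, since the segment was promoted on the way up.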