apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Built-in jobs to move segments of hybrid tables from Realtime Servers to Offline Servers #5753

Closed: xiangfu0 closed this issue 3 years ago

xiangfu0 commented 3 years ago

We should have built-in scheduled jobs that move segments from realtime servers to offline servers, so that users don't need to set up external jobs to push segments from external data sources.

This way, we can keep a limited capacity for realtime servers while adding new servers to the offline cluster and rebalancing segments.

This could also enable further support on offline servers for segment merge and rollup.

cc: @kishoreg @snleee @Jackie-Jiang @mcvsubbu @npawar

Jackie-Jiang commented 3 years ago

There is a periodic task doing this: RealtimeSegmentRelocator. It can periodically move segments based on the segment assignment config.
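For context, a minimal sketch of what that segment assignment config can look like in the realtime table config, assuming a dedicated server tag (`DefaultTenant_OFFLINE` here is illustrative) for the instances that should hold completed segments:

```json
{
  "instanceAssignmentConfigMap": {
    "COMPLETED": {
      "tagPoolConfig": {
        "tag": "DefaultTenant_OFFLINE"
      }
    }
  }
}
```

With a `COMPLETED` entry like this, the relocator's periodic runs can move segments that have finished consuming onto the instances carrying that tag.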

mayankshriv commented 3 years ago

While periodically moving the segments from realtime to offline is a good idea, in many cases it would also help to perform the segment merge/rollup before moving to offline. This may require a config of its own describing what kind of processing needs to be performed before moving the segments to offline. It would be good to start from a user story on the requirements, and then translate that into a design doc.
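Purely as an illustration of what such a processing config might look like (every key below is hypothetical, not an existing Pinot API), the table owner could declare the time bucketing and per-metric aggregations once, and the mover job would apply them before publishing offline segments:

```json
{
  "segmentProcessingConfig": {
    "mergeType": "rollup",
    "bucketTimePeriod": "1d",
    "clicks.aggregationType": "sum",
    "latencyMs.aggregationType": "max"
  }
}
```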

singalravi commented 3 years ago

Use case: Most of the queries in our system are for a recent time frame (the last 7 days), but we want to retain data for a much longer period. For this, we want to deploy a tiered storage system where realtime servers have faster disks and more CPU and memory, while offline servers have slower (and bigger) disks with less CPU and memory. This way, we can keep adding more offline servers as the data volume grows. Moving segments from the realtime to the offline table will give us a cost optimization.

mangrrua commented 3 years ago

That would be great! If merge/rollup can be applied (@mayankshriv's suggestion), users gain a lot of flexibility, because realtime segments generally represent minimal aggregation; merging/rolling them up would improve query performance, allow retaining data long-term, and save other costs.

For that, the Pinot UI could have a scheduler service (jobs can be set for specified times, configs can be set, etc., also exposed via an API of course), so users can configure offline jobs that convert realtime segments to offline segments. On the backend, a job (maybe Apache Spark or classical MapReduce) can process realtime segments in parallel and produce offline segments.

xiangfu0 commented 3 years ago

> That would be great! If merge/rollup can be applied (@mayankshriv's suggestion), users gain a lot of flexibility, because realtime segments generally represent minimal aggregation; merging/rolling them up would improve query performance, allow retaining data long-term, and save other costs.
>
> For that, the Pinot UI could have a scheduler service (jobs can be set for specified times, configs can be set, etc., also exposed via an API of course), so users can configure offline jobs that convert realtime segments to offline segments. On the backend, a job (maybe Apache Spark or classical MapReduce) can process realtime segments in parallel and produce offline segments.

Right. Ideally we should have multiple built-in jobs to handle the basic data loading/re-organizing workloads, and use Hadoop/Spark for advanced/parallel workloads.

mangrrua commented 3 years ago

> That would be great! If merge/rollup can be applied (@mayankshriv's suggestion), users gain a lot of flexibility, because realtime segments generally represent minimal aggregation; merging/rolling them up would improve query performance, allow retaining data long-term, and save other costs. For that, the Pinot UI could have a scheduler service (jobs can be set for specified times, configs can be set, etc., also exposed via an API of course), so users can configure offline jobs that convert realtime segments to offline segments. On the backend, a job (maybe Apache Spark or classical MapReduce) can process realtime segments in parallel and produce offline segments.
>
> Right. Ideally we should have multiple built-in jobs to handle the basic data loading/re-organizing workloads, and use Hadoop/Spark for advanced/parallel workloads.

Exactly.

mcvsubbu commented 3 years ago

> Use case: Most of the queries in our system are for a recent time frame (the last 7 days), but we want to retain data for a much longer period. For this, we want to deploy a tiered storage system where realtime servers have faster disks and more CPU and memory, while offline servers have slower (and bigger) disks with less CPU and memory. This way, we can keep adding more offline servers as the data volume grows. Moving segments from the realtime to the offline table will give us a cost optimization.

You can do this today by setting tagOverrideConfig and moving the completed segments to any tagged host. Of course, this will move all completed segments, not just the older ones.
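A minimal sketch of the tagOverrideConfig approach, in the tenants section of the realtime table config (the tenant names are illustrative):

```json
{
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant",
    "tagOverrideConfig": {
      "realtimeConsuming": "DefaultTenant_REALTIME",
      "realtimeCompleted": "DefaultTenant_OFFLINE"
    }
  }
}
```

Consuming segments stay on servers tagged `DefaultTenant_REALTIME`; once a segment completes, it is relocated to servers tagged `DefaultTenant_OFFLINE`. Note that the segments still belong to the realtime table, which is the distinction raised next.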

kishoreg commented 3 years ago

@mcvsubbu I think that movement is only within the realtime table; we are talking about moving them to the offline table.

Jackie-Jiang commented 3 years ago

@kishoreg I'm not sure whether we should move realtime completed segments to offline servers or to another set of realtime servers. For a realtime-only table, I don't see the benefit of making it hybrid compared with just moving completed segments to another set of realtime servers; we would also need to pay the cost of the extra time-boundary filter for a hybrid table. We should treat the realtime table as a first-class citizen and support merge/rollup/backfill etc. the same way as for the offline table.

kishoreg commented 3 years ago

@Jackie-Jiang Interesting idea. I am all for removing the distinction between real-time and offline tables.

xiangfu0 commented 3 years ago

> @kishoreg I'm not sure whether we should move realtime completed segments to offline servers or to another set of realtime servers. For a realtime-only table, I don't see the benefit of making it hybrid compared with just moving completed segments to another set of realtime servers; we would also need to pay the cost of the extra time-boundary filter for a hybrid table. We should treat the realtime table as a first-class citizen and support merge/rollup/backfill etc. the same way as for the offline table.

I think the major challenge here is supporting an atomic swap for a batch of segments. Also, things like batch-backfilled segments are usually time-bounded, but realtime segments are not.

Jackie-Jiang commented 3 years ago

> I think the major challenge here is supporting an atomic swap for a batch of segments. Also, things like batch-backfilled segments are usually time-bounded, but realtime segments are not.

These problems will be addressed within the segment merge/rollup project that @snleee is working on.

npawar commented 3 years ago

Design doc for this: https://docs.google.com/document/d/1-e_9aHQB4HXS38ONtofdxNvMsGmAoYfSnc2LP88MbIc/edit#

mcvsubbu commented 3 years ago

I just realized that if we have multiple data centers, this technique will not produce the same results across the data centers. Something worth noting.

kishoreg commented 3 years ago

Why do you say that? As long as you give enough buffer time for the events from the previous time period to flow in, it should be OK, right?

mcvsubbu commented 3 years ago

I mis-worded it. The results will be the same, but the segments in each data center may not be, right? I am not sure whether the m-to-n segment reduction and the time boundary computation have exactly the same predictable results all the time. If they do, maybe we are fine, but things may change during a software upgrade, for example.

npawar commented 3 years ago

The basic feature (moving segments from realtime to offline using Minions) is complete. Let's open new issues for enhancements related to using Hadoop MapReduce/Spark, etc.

npawar commented 3 years ago

Documentation: https://docs.pinot.apache.org/operators/operating-pinot/pinot-managed-offline-flows
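For reference, a minimal sketch of enabling this managed offline flow in the realtime table config, based on the documentation above (the period values are illustrative, and the corresponding OFFLINE table must already exist):

```json
{
  "task": {
    "taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "bucketTimePeriod": "6h",
        "bufferTimePeriod": "2d"
      }
    }
  }
}
```

A Minion task then periodically picks up completed realtime segments older than the buffer period, buckets them by time, and pushes the resulting segments to the offline table.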