apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.48k stars 1.28k forks source link

Auto-tuning Pinot real-time segment size based on actual stream data consumption #12513

Open chenboat opened 8 months ago

chenboat commented 8 months ago

Currently Pinot's adaptive realtime segment sizing algorithm (as documented here makes the segment sizes converge to a target byte size based on the following assumption. It adjusts the rows of new segments based on the rows in the previous segments.

We assume that the ratio of segment size to number of rows is a constant for each table (say, R).

This assumption may not be valid for the spiky traffic uses (e.g., search log data ingestion because log data volume depends on services state and can be highly volatile). Our result using the adaptive sizing algorithm shows that segments varied a lot because the data size per row changes.

We propose to change to segment size prediction based on actual stream data consumed instead -- which is a more accurate measure than the row count. After one server replicas finishes conuming the target number of bytes, it can commit the segments and work with the rest of the replicas to either catch up to the offset reached (if they have not done so) or ask them to download and replace the finished segment.

Jackie-Jiang commented 8 months ago

Here is another related issue with this algorithm: #12509

Jackie-Jiang commented 8 months ago

The challenge here is how to estimate the segment size during consumption. Do you have a solution in mind?

chenboat commented 8 months ago

My proposal is to use number of bytes consumed in the previous segment instead of the number of rows consumed to determine the current segment consumption target.

In particular, (1) The number of bytes consumed is recorded as part of the zkMetadata per segments (2) When a new segment is created, there is a new mode/config to allow the segment to consume until the byte limit. (3) When one server replica reaches that limit, it will commit the segment. (4) The rest of the replicas will download and replace their current segments.