apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Support API for checking if segments need to be reloaded for a table #12117

Closed: dang-stripe closed this issue 1 month ago

dang-stripe commented 11 months ago

We're interested in having an API to know whether or not segments need to be reloaded for a table. The use case is to build robust automation around segment reloads after table/schema changes without having to depend directly on the table/schema change triggering the reload.

Currently, reloading segments fans out ZooKeeper (ZK) messages to every server hosting segments for the table, and with the default config each server processes them one at a time. There is an API to check the column/index information for a segment, but a scatter-gather over all segments followed by a comparison against the current table/schema config is expensive.

One idea is to have each server track, in memory, the znode versions of the table config and the schema that each segment was last loaded or reloaded with. The server would expose an API that lets the controller fetch these versions and compare them with the latest znode versions, so the controller can tell the caller whether any segments need to be reloaded.
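A minimal sketch of the version-tracking idea, assuming a very simplified model: every class and function name here (`ConfigVersions`, `ServerSegmentTracker`, `segments_needing_reload`) is hypothetical and not part of Pinot's API. Each server records the (table config, schema) znode versions a segment was loaded with, and the controller flags any segment whose recorded pair differs from the latest one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConfigVersions:
    """Hypothetical pair of ZNode versions: table config and schema."""
    table_config_version: int
    schema_version: int


@dataclass
class ServerSegmentTracker:
    """Hypothetical in-memory state on one server: for each hosted segment,
    the config versions it was last loaded or reloaded with."""
    loaded_with: dict[str, ConfigVersions] = field(default_factory=dict)

    def on_segment_loaded(self, segment: str, versions: ConfigVersions) -> None:
        # Called after the server finishes loading or reloading a segment.
        self.loaded_with[segment] = versions

    def versions_by_segment(self) -> dict[str, ConfigVersions]:
        # What a hypothetical server endpoint would return to the controller.
        return dict(self.loaded_with)


def segments_needing_reload(servers: list[ServerSegmentTracker],
                            latest: ConfigVersions) -> list[str]:
    """Hypothetical controller-side check: a segment needs a reload on some
    server if its recorded versions differ from the latest ZNode versions."""
    stale: set[str] = set()
    for server in servers:
        for segment, versions in server.versions_by_segment().items():
            if versions != latest:
                stale.add(segment)
    return sorted(stale)
```

Comparing only two integers per segment keeps the controller-side check cheap compared with a scatter-gather of full column/index metadata. The trade-off is that any config change, even one irrelevant to a segment, marks it stale.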

cc @Jackie-Jiang @jadami10

Jackie-Jiang commented 4 months ago

A more desirable solution would be:

Pros:

We can also consider combining the two approaches: when the ZNode version doesn't match, we perform the check. When the server detects that there is no need to reload the index, it updates the ZNode version within the segment without reloading it.
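The combined approach can be sketched per segment as below. This is an illustration under strong assumptions, not Pinot code: `SegmentState` and `reconcile` are hypothetical names, and the "check" is reduced to comparing a set of `(column, index_type)` pairs with the set the new config requires. A real check would also need to cover schema changes such as added columns.

```python
from dataclasses import dataclass


@dataclass
class SegmentState:
    """Hypothetical per-segment state on a server."""
    indexes: frozenset[tuple[str, str]]  # e.g. {("col_a", "inverted")}
    table_config_version: int
    schema_version: int


def _record_versions(segment: SegmentState, table_v: int, schema_v: int) -> None:
    segment.table_config_version = table_v
    segment.schema_version = schema_v


def reconcile(segment: SegmentState, required_indexes: frozenset[tuple[str, str]],
              latest_table_version: int, latest_schema_version: int) -> str:
    """Returns "up_to_date", "version_bumped" or "reloaded"."""
    if (segment.table_config_version == latest_table_version
            and segment.schema_version == latest_schema_version):
        # Cheap path: versions match, so skip the expensive check entirely.
        return "up_to_date"
    if segment.indexes == required_indexes:
        # Config changed, but not in a way that affects this segment:
        # record the new versions without reloading.
        _record_versions(segment, latest_table_version, latest_schema_version)
        return "version_bumped"
    # Stand-in for an actual segment reload that rebuilds the indexes.
    segment.indexes = required_indexes
    _record_versions(segment, latest_table_version, latest_schema_version)
    return "reloaded"
```

The version comparison acts as a fast filter, so the more expensive index comparison only runs for segments whose recorded versions are out of date. Bumping the versions after a no-op check keeps such segments from being flagged again.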