apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Update deepstore segments with schema/tableConfig changes #9360

Open vvivekiyer opened 2 years ago

vvivekiyer commented 2 years ago

Currently, we support a number of preprocessing operations for a segment in response to schema/tableConfig changes. Some of them are:

  1. Add a new column; remove or modify an auto-generated column.
  2. Add or remove an index.

Every time a server downloads and reloads a segment, it preprocesses the segment to apply these changes. However, the segment directory in the deep store is never modified to reflect them. As more preprocessing logic accumulates in the reload path, reloading a segment can take longer for tables that have had many schema/tableConfig changes applied.

The suggestion here is to also update the segment in the deep store to reflect these changes. This can be done with a background minion task.
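To make the idea concrete, here is a minimal sketch of how a background task might decide whether a deep store copy has drifted from the current config. It is not Pinot code: `SegmentDrift`, its fields, and the string keys such as `"b:range"` are hypothetical names chosen for illustration, and the sketch assumes the task can list which columns and indexes the stored copy contains and which ones the current schema/tableConfig requires.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model, not Pinot code: lists what a stored segment copy lacks
// or carries in excess relative to the current schema/tableConfig.
final class SegmentDrift {
    final Set<String> toAdd;    // columns or indexes the config wants but the copy lacks
    final Set<String> toRemove; // columns or indexes the copy has but the config dropped

    private SegmentDrift(Set<String> toAdd, Set<String> toRemove) {
        this.toAdd = toAdd;
        this.toRemove = toRemove;
    }

    // Compare what the stored copy contains against what the config requires.
    static SegmentDrift between(Set<String> inSegment, Set<String> required) {
        Set<String> toAdd = new LinkedHashSet<>(required);
        toAdd.removeAll(inSegment);
        Set<String> toRemove = new LinkedHashSet<>(inSegment);
        toRemove.removeAll(required);
        return new SegmentDrift(toAdd, toRemove);
    }

    // A copy needs a refresh only when some preprocessing work is pending.
    boolean needsRefresh() {
        return !toAdd.isEmpty() || !toRemove.isEmpty();
    }
}
```

A task built this way would skip segments whose stored copy already matches the config, so only segments with pending preprocessing work would be rewritten in the deep store.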

Jackie-Jiang commented 2 years ago

We already have an API that asks the server to upload a segment to the deep store. We could leverage the same mechanism to refresh the segments in the deep store. Currently it is used to fix realtime segments that do not have a deep store copy. See PinotLLCRealtimeSegmentManager.uploadToDeepStoreIfMissing() for more details.
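As a rough illustration of extending an "upload if missing" check to also cover stale copies, the sketch below simulates the deep store with an in-memory map. Everything in it is an assumption made for illustration: `DeepStoreRefresher` and `uploadIfMissingOrStale` are hypothetical names, and comparing checksums is only a stand-in for however staleness would actually be detected. It does not reflect the real behavior or signature of `uploadToDeepStoreIfMissing()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Pinot code: decides when a server-side copy should
// be pushed to the deep store.
final class DeepStoreRefresher {
    // Segment name -> checksum of the copy held in the simulated deep store.
    private final Map<String, Long> deepStore = new HashMap<>();

    // Uploads when the deep store has no copy, or when its checksum differs
    // from the preprocessed copy on the server. Returns true if it uploaded.
    boolean uploadIfMissingOrStale(String segment, long serverCrc) {
        Long storedCrc = deepStore.get(segment);
        if (storedCrc != null && storedCrc == serverCrc) {
            return false; // deep store already matches the server copy
        }
        deepStore.put(segment, serverCrc); // stand-in for the real upload
        return true;
    }
}
```

The missing-copy case is handled by the same branch as the stale case, which is why reusing the existing upload path looks attractive.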