apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Upsert Compaction: Schema / Index updates #13494

Closed: tibrewalpratik17 closed this issue 4 weeks ago

tibrewalpratik17 commented 1 month ago

I have not yet explored how schema or index updates affect the CRC mismatch issue in Upsert compaction. I'm adding this as a task in the parent issue so we can track it for some time. I will test updating the schema and indexes, and if everything works out fine, we can close this issue.

tibrewalpratik17 commented 1 month ago

@tarun11Mavani from Uber will help me validate this.

tarun11Mavani commented 1 month ago

Sure, @tibrewalpratik17. Please assign this to me.

tarun11Mavani commented 4 weeks ago

As part of this analysis, I found that updating the schema or an index on an upsert realtime table and then calling the reload segments API to reload all segments does not change the CRC of the existing segments. The CRC is stored in creation.meta, and that file is not rewritten when the server reloads a segment. I verified this by monitoring the creation.meta file and by comparing the segment metadata on the server and in ZooKeeper before and after the reload. Because the CRC stays the same, schema and index changes do not cause CRC mismatches, and the Upsert Compaction task can keep compacting older segments.
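To make the reasoning concrete, here is a minimal Python sketch. It is not Pinot's code, and it rests on a few assumptions. First, creation.meta is assumed to hold two big-endian 64-bit longs: the CRC followed by the creation time, which is how a Java `DataOutputStream` would write them. Second, the names `ServerSegment`, `reload_segment`, and `crc_matches` are hypothetical. The sketch shows that a reload which only changes indexes leaves creation.meta alone. As a result, the server-side CRC still matches the CRC recorded in ZooKeeper, which is the check that compaction depends on.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass, field

# ASSUMPTION: creation.meta = two big-endian signed 64-bit longs (CRC, creation time).
_CREATION_META = struct.Struct(">qq")


def write_creation_meta(crc: int, creation_time_ms: int) -> bytes:
    """Serialize CRC and creation time; in this model, done only when a segment is built."""
    return _CREATION_META.pack(crc, creation_time_ms)


def read_creation_meta(blob: bytes) -> tuple[int, int]:
    """Return (crc, creation_time_ms) decoded from a creation.meta payload."""
    return _CREATION_META.unpack(blob)


@dataclass
class ServerSegment:
    """Hypothetical server-side view of one segment on local disk."""
    name: str
    creation_meta: bytes
    indexes: set[str] = field(default_factory=set)

    @property
    def crc(self) -> int:
        # The CRC is always read back from creation.meta, never recomputed.
        return read_creation_meta(self.creation_meta)[0]


def reload_segment(segment: ServerSegment, new_indexes: set[str]) -> None:
    """Hypothetical reload: applies the new index set but does not rewrite creation.meta."""
    segment.indexes = set(new_indexes)


def crc_matches(segment: ServerSegment, zk_crc: int) -> bool:
    """Hypothetical compaction precondition: server CRC must equal the ZooKeeper CRC."""
    return segment.crc == zk_crc
```

Usage: build a segment with CRC `987654321`, reload it with an extra range index, and `crc_matches(seg, 987654321)` still returns `True`, because only the index set changed.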

tibrewalpratik17 commented 4 weeks ago

Thanks, @tarun11Mavani, for confirming this! I see that we use the CRC from creation.meta, which is only written at segment-creation time, so it protects us from schema and index updates. Thanks for helping verify this!

Closing this issue for now!