apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Segment Purger cannot purge old segments after schema evolution #10868

Closed sajjad-moradi closed 1 year ago

sajjad-moradi commented 1 year ago

If a column is added to the schema and an index is also defined for that column, older segments that were built with the previous schema/table config cannot be purged. The purge fails with the following exception:

Cannot create index for column: XYZ because it is not in schema.

This happens because SegmentColumnarIndexCreator assumes that the schema and the table config are always in sync. Apart from the old-segment scenario described above, that is a valid assumption: the validation in the table config REST endpoint ensures that any column indexed in the table config is also present in the schema.
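To make the failure mode concrete, here is a toy model in Python. It is not Pinot code: `Segment`, `validate_table_config`, `rebuild_segment_strict`, and `rebuild_segment_lenient` are hypothetical names, and the sketch assumes that the purger rebuilds an old segment against its own (older) set of columns while using the current table config. The lenient variant is only one possible mitigation, not necessarily how this issue was resolved.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a segment: only the columns it was built with."""
    name: str
    columns: set


def validate_table_config(schema_columns, indexed_columns):
    """Toy version of the REST-endpoint validation:
    every indexed column must exist in the current schema."""
    missing = set(indexed_columns) - set(schema_columns)
    if missing:
        raise ValueError(f"Indexed columns not in schema: {sorted(missing)}")


def rebuild_segment_strict(segment, indexed_columns):
    """Models the reported behavior: assumes the indexed columns are always
    present, so an old segment that lacks a newly added column fails."""
    for column in sorted(indexed_columns):
        if column not in segment.columns:
            raise ValueError(
                f"Cannot create index for column: {column} because it is not in schema."
            )
    return sorted(indexed_columns)


def rebuild_segment_lenient(segment, indexed_columns):
    """One possible mitigation (an assumption, not the confirmed fix):
    build indexes only for columns the segment actually contains."""
    return sorted(c for c in indexed_columns if c in segment.columns)
```

In this model, a table config that indexes `XYZ` passes `validate_table_config` against the evolved schema, yet `rebuild_segment_strict` still raises for a segment built before `XYZ` existed, while `rebuild_segment_lenient` simply skips the missing column.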

sajjad-moradi commented 1 year ago

This issue was masked for inverted index columns before the index SPI refactoring (https://github.com/apache/pinot/pull/10184), because the check for inverted indexes was bypassed by the generate.inverted.index.before.push flag in the table config. Since https://github.com/apache/pinot/pull/10184 was merged, the check applies to all index types, so the issue is now more common.

mcvsubbu commented 1 year ago

This is a production issue for us.