Open mayankshriv opened 3 years ago
If the schema was updated with the new columns, then the schema in the controller would have the new columns, right? Perhaps you meant the other way around (i.e., "used the schema in the controller as opposed to the schema in the segment")?
Speaking of which, I think it would be very useful to retain the schema evolution history in ZooKeeper (i.e., versioned schemas with some metadata on when each update was made). It could be used to make decisions such as those made by the segment purger. In this case, for example, the purger could have decided to backfill the missing columns with default values.
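A minimal sketch of the proposed versioned schema history, assuming an append-only list of versions where each entry records its column set, a timestamp, and a note. Everything here (`SchemaVersion`, `SchemaHistory`, `columnsAddedSince`) is hypothetical and is not an existing Pinot or ZooKeeper API; the history is kept in memory only for illustration, whereas the proposal would persist it in ZooKeeper. It also assumes a purger could learn which schema version a segment was built with, which the thread does not establish.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical: one immutable entry in the schema history, with metadata on the update.
record SchemaVersion(int version, Set<String> columns, Instant updatedAt, String note) {}

// Hypothetical in-memory stand-in for a versioned schema history stored in ZooKeeper.
class SchemaHistory {
    private final List<SchemaVersion> versions = new ArrayList<>();

    // Appends a new version; version numbers start at 1 and increase by one.
    SchemaVersion addVersion(Set<String> columns, String note) {
        SchemaVersion v = new SchemaVersion(
                versions.size() + 1,
                Collections.unmodifiableSet(new TreeSet<>(columns)),
                Instant.now(),
                note);
        versions.add(v);
        return v;
    }

    // Returns the most recent schema version.
    SchemaVersion latest() {
        if (versions.isEmpty()) {
            throw new IllegalStateException("no schema versions recorded");
        }
        return versions.get(versions.size() - 1);
    }

    // Columns present in the latest schema but absent from the given older version,
    // i.e. the columns a purger might choose to backfill with default values.
    Set<String> columnsAddedSince(int version) {
        if (version < 1 || version > versions.size()) {
            throw new IllegalArgumentException("unknown schema version: " + version);
        }
        Set<String> added = new TreeSet<>(latest().columns());
        added.removeAll(versions.get(version - 1).columns());
        return added;
    }
}
```

With such a history, a segment built against version 1 could be compared with the latest version to find exactly which columns need defaults, instead of failing on the first missing one.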
No. SegmentPurger uses the table config from the controller (to determine that it needs to build an inverted index for a column), but it uses the schema stored in the segment, which does not contain the newly added column (since neither a segment reload nor a backfill has happened); hence the error. Hope this answers your question.
We ran into an issue where SegmentPurger failed due to schema evolution as follows:
When the SegmentPurger tried to purge older segments, it failed with the following error:
java.lang.IllegalStateException: Cannot create inverted index for column: <xxx> because it is not in schema
This is likely because SegmentPurger used the schema stored in the segment rather than the schema in the controller. Ideally, SegmentPurger should handle this scenario gracefully.
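A minimal sketch of the mismatch and one possible graceful alternative, assuming the purge step receives the inverted-index columns from the table config (controller) and the column names from the segment's own schema. `InvertedIndexPlan`, `strict`, and `lenient` are hypothetical names, not Pinot's actual API. `strict` reproduces the reported failure; `lenient` builds indexes only for columns the segment actually has and records the rest, so the caller could log them or schedule a reload or backfill instead of aborting the purge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical plan of which inverted indexes a purge task can build.
class InvertedIndexPlan {
    final List<String> toBuild = new ArrayList<>();
    final List<String> skipped = new ArrayList<>();

    // Mirrors the reported behaviour: any configured column missing from the
    // segment schema aborts the whole purge with an IllegalStateException.
    static InvertedIndexPlan strict(List<String> configured, Set<String> segmentColumns) {
        InvertedIndexPlan plan = new InvertedIndexPlan();
        for (String column : configured) {
            if (!segmentColumns.contains(column)) {
                throw new IllegalStateException("Cannot create inverted index for column: "
                        + column + " because it is not in schema");
            }
            plan.toBuild.add(column);
        }
        return plan;
    }

    // Graceful alternative: index the columns the segment has and collect the
    // columns added by schema evolution so they can be reported or backfilled.
    static InvertedIndexPlan lenient(List<String> configured, Set<String> segmentColumns) {
        InvertedIndexPlan plan = new InvertedIndexPlan();
        for (String column : configured) {
            if (segmentColumns.contains(column)) {
                plan.toBuild.add(column);
            } else {
                plan.skipped.add(column);
            }
        }
        return plan;
    }
}
```

Skipping is only one option; as suggested earlier in the thread, a purger could instead backfill the skipped columns with default values before building their indexes.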