Open PrachiKhobragade opened 1 year ago
While checking the code in SegmentPurger, I found the following note:
// The time column type info is not stored in the segment metadata.
// Keep segment start/end time to properly handle time column type other than EPOCH (e.g.SIMPLE_FORMAT).
if (segmentMetadata.getTimeInterval() != null) {
  config.setTimeColumnName(_tableConfig.getValidationConfig().getTimeColumnName());
  config.setStartTime(Long.toString(segmentMetadata.getStartTime()));
  config.setEndTime(Long.toString(segmentMetadata.getEndTime()));
  config.setSegmentTimeUnit(segmentMetadata.getTimeUnit());
}
I think that when we added the purge task (in 2018), the schema might not have been available in the cluster, so we did not have the time type info (more context in #2846). Now, with #10869, we always use the schema from ZK to generate the new segment, so we can safely remove this special handling and let the segment metadata reflect the actual time range.
When segments are purged based on a record matcher, some rows may be eliminated. Once the segment is rebuilt, new metadata is generated. However, in certain scenarios, the time metadata from the previous segment is carried over to the new segment. Consequently, even if rows at the beginning of the segment are removed, the segment start time metadata still reflects the start time of the segment before the purge. The same applies to the segment end time metadata.
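To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use any Pinot classes: `PurgeTimeRangeSketch`, `purge`, and `recompute` are hypothetical names, and a segment is modeled as a plain list of epoch timestamps. It only illustrates how a carried-over time range can differ from the range of the rows that survive a purge.

```java
import java.util.List;
import java.util.function.LongPredicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names exist in Pinot.
public class PurgeTimeRangeSketch {

  // Drops every timestamp matched by the purge predicate (stand-in for a record matcher).
  public static List<Long> purge(List<Long> timestamps, LongPredicate shouldPurge) {
    return timestamps.stream()
        .filter(t -> !shouldPurge.test(t))
        .collect(Collectors.toList());
  }

  // Recomputes {start, end} from the given rows. Assumes the list is not empty.
  public static long[] recompute(List<Long> timestamps) {
    long min = timestamps.stream().mapToLong(Long::longValue).min().getAsLong();
    long max = timestamps.stream().mapToLong(Long::longValue).max().getAsLong();
    return new long[] {min, max};
  }

  public static void main(String[] args) {
    List<Long> rows = List.of(100L, 200L, 300L, 400L);

    // Time range recorded when the original segment was built.
    long[] original = recompute(rows);

    // Purge the rows at the beginning of the segment.
    List<Long> remaining = purge(rows, t -> t < 300L);

    // Carrying over the old metadata keeps the pre-purge start time.
    long[] carriedOver = original;

    // Recomputing from the surviving rows gives the actual range.
    long[] actual = recompute(remaining);

    System.out.println("carried over start: " + carriedOver[0] + ", actual start: " + actual[0]);
  }
}
```

In this sketch the carried-over start time stays at the earliest pre-purge timestamp, while the recomputed one moves to the earliest surviving row, which is the behavior the proposed change is meant to produce.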