apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

After segment purge, the segment start and end time in segment zk metadata may not reflect the correct values #10951

Open PrachiKhobragade opened 1 year ago

PrachiKhobragade commented 1 year ago

When segments are purged based on a record matcher, some rows may be removed. The segment is then rebuilt and new metadata is generated. In certain scenarios, however, the time metadata from the previous segment is carried over to the new segment. As a result, even if the earliest rows of a segment are removed, the segment start time in the metadata still reflects the start time from before the purge. The same applies to the segment end time.
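To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use any Pinot classes: `Row`, `timeRange`, and `purge` are hypothetical stand-ins for a segment's rows, the start/end time computed from the data, and a purge by record matcher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PurgeTimeRangeSketch {
  // Hypothetical stand-in for a segment row; only the time column and one key matter here.
  public static final class Row {
    final long timeMillis;
    final String userId;

    public Row(long timeMillis, String userId) {
      this.timeMillis = timeMillis;
      this.userId = userId;
    }
  }

  // Returns {start, end} computed from the rows that are actually present.
  public static long[] timeRange(List<Row> rows) {
    if (rows.isEmpty()) {
      throw new IllegalArgumentException("no rows");
    }
    long start = Long.MAX_VALUE;
    long end = Long.MIN_VALUE;
    for (Row row : rows) {
      start = Math.min(start, row.timeMillis);
      end = Math.max(end, row.timeMillis);
    }
    return new long[] {start, end};
  }

  // Hypothetical record matcher: drops every row that belongs to the given user.
  public static List<Row> purge(List<Row> rows, String userIdToPurge) {
    List<Row> kept = new ArrayList<>();
    for (Row row : rows) {
      if (!row.userId.equals(userIdToPurge)) {
        kept.add(row);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Row> original = Arrays.asList(
        new Row(1000L, "a"), new Row(2000L, "b"), new Row(3000L, "b"), new Row(4000L, "a"));
    long[] carriedOver = timeRange(original);          // what the old metadata says
    long[] actual = timeRange(purge(original, "a"));   // what the purged data says
    System.out.println("carried over: " + Arrays.toString(carriedOver));
    System.out.println("actual:       " + Arrays.toString(actual));
  }
}
```

In this sketch the first and last rows belong to the purged user, so the range computed from the remaining rows is narrower than the one copied from the old metadata, which is the discrepancy reported here.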

Jackie-Jiang commented 1 year ago

Checking the code in SegmentPurger, I found the following note:

      // The time column type info is not stored in the segment metadata.
      // Keep segment start/end time to properly handle time column type other than EPOCH (e.g.SIMPLE_FORMAT).
      if (segmentMetadata.getTimeInterval() != null) {
        config.setTimeColumnName(_tableConfig.getValidationConfig().getTimeColumnName());
        config.setStartTime(Long.toString(segmentMetadata.getStartTime()));
        config.setEndTime(Long.toString(segmentMetadata.getEndTime()));
        config.setSegmentTimeUnit(segmentMetadata.getTimeUnit());
      }

I think that when we added the purge task (in 2018), the schema might not have been available in the cluster, so we didn't have the time type info (more context in #2846). Now, with #10869, we always use the schema from ZK to generate the new segment, so we can safely remove this special handling and let the metadata reflect the actual time range.
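As a rough illustration of why the time type info matters, the sketch below derives a start/end time from SIMPLE_FORMAT values once the format pattern is known. It is not Pinot code: the class, the method names, the `yyyyMMdd` pattern, and the choice of UTC are all assumptions made for the example. The pattern stands in for what the schema would provide.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class SimpleFormatTimeRangeSketch {
  // Converts a date string to epoch millis at start of day, UTC (assumed time zone).
  public static long toEpochMillis(String value, String pattern) {
    LocalDate date = LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern));
    return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
  }

  // Returns {startMillis, endMillis} over the values that remain in the segment.
  public static long[] timeRangeMillis(List<String> values, String pattern) {
    if (values.isEmpty()) {
      throw new IllegalArgumentException("no values");
    }
    long start = Long.MAX_VALUE;
    long end = Long.MIN_VALUE;
    for (String value : values) {
      long millis = toEpochMillis(value, pattern);
      start = Math.min(start, millis);
      end = Math.max(end, millis);
    }
    return new long[] {start, end};
  }
}
```

Without the pattern, a value such as `20230102` cannot be turned into a timestamp, which is presumably why the old start/end time was copied over. With the pattern available, the range can be recomputed from whatever rows survive the purge.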