apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

SDF conversion mismatch between segment completion and broker routing #6548

Open jtao15 opened 3 years ago

jtao15 commented 3 years ago

Once a segment is completed (either converted from consuming to online for a real-time table, or uploaded for an offline table), the start/end time in the ZK metadata is updated based on the raw data. For SDF (SIMPLE_DATE_FORMAT) time columns, the start/end time is stored in millis, and the conversion is done by Joda-Time's DateTimeFormatter (default time zone: local).

DateTimeFormatter formatter = DateTimeFormat.forPattern("yyyyMMdd");
formatter.parseMillis("20170701");  // -> 1498892400000 (midnight in the local zone, here UTC-7)

For broker time boundary management, the millis are converted back to SDF to compute the time boundary. For the broker time segment pruner, the time condition in the query is parsed to millis and compared with the segment metadata. These conversions are done by DateTimeFormatSpec (default time zone: UTC).

DateTimeFormatSpec dateTimeFormatSpec = new DateTimeFormatSpec("1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd");
dateTimeFormatSpec.fromFormatToMillis("20170701");  // -> 1498867200000 (midnight UTC)

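If a Pinot cluster runs in a time zone other than UTC, the two conversions disagree by the zone offset (7 hours in the example above). The time boundary is shifted accordingly, and the time pruner may prune segments that should be queried or keep segments that should be pruned.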
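To make the mismatch concrete, here is a minimal, self-contained sketch that reproduces both numbers above. It uses `java.time` as a stand-in for Joda-Time and for Pinot's DateTimeFormatSpec, so it does not exercise the real classes. The zone `America/Los_Angeles` is an assumption inferred from the 7-hour offset, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class; java.time stands in for Joda-Time / DateTimeFormatSpec.
public class SdfZoneMismatch {
    private static final DateTimeFormatter SDF = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Parse an SDF date as midnight in the given zone and return epoch millis.
    static long toMillis(String sdf, String zoneId) {
        return LocalDate.parse(sdf, SDF)
                .atStartOfDay(ZoneId.of(zoneId))
                .toInstant()
                .toEpochMilli();
    }

    // Format epoch millis back to an SDF date as seen in the given zone.
    static String toSdf(long millis, String zoneId) {
        return Instant.ofEpochMilli(millis).atZone(ZoneId.of(zoneId)).format(SDF);
    }

    public static void main(String[] args) {
        // Segment completion path: local zone (assumed America/Los_Angeles).
        long local = toMillis("20170701", "America/Los_Angeles");
        // Broker path: UTC.
        long utc = toMillis("20170701", "UTC");

        System.out.println(local);                      // 1498892400000
        System.out.println(utc);                        // 1498867200000
        System.out.println((local - utc) / 3_600_000L); // 7 (hours)

        // East of UTC, the round trip can even land on the previous day.
        long tokyo = toMillis("20170701", "Asia/Tokyo");
        System.out.println(toSdf(tokyo, "UTC"));        // 20170630
    }
}
```

The last line shows why the time boundary can move by a whole day: midnight on 2017-07-01 in a zone ahead of UTC is still 2017-06-30 when formatted in UTC.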

We should unify the conversions to avoid this. One approach is to standardize on UTC everywhere, but that won’t be backward compatible with segment metadata already written in local time. Another approach is to keep the local time zone as the default for both conversions, which requires all Pinot instances to run in the same local time zone.

jtao15 commented 3 years ago

@snleee @Jackie-Jiang

Jackie-Jiang commented 3 years ago

We should definitely use the same time zone for both conversions. Better to use the same class to perform the conversion so that the conversions are always consistent. Since the time segment pruner is newly added, and also stateless, we should keep it the same as the conversion used in the metadata.
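As a rough illustration of the "same class" idea, the sketch below routes both the metadata write path and the query path through one converter with an explicit zone. `SharedSdfConverter` and `overlaps` are hypothetical names, not Pinot classes, and `java.time` is used so the example runs without Pinot or Joda-Time on the classpath. It only handles day-granularity patterns such as `yyyyMMdd`.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of Pinot): one converter, one explicit zone.
public final class SharedSdfConverter {
    private final DateTimeFormatter formatter;
    private final ZoneId zone;

    public SharedSdfConverter(String pattern, String zoneId) {
        this.formatter = DateTimeFormatter.ofPattern(pattern);
        this.zone = ZoneId.of(zoneId);
    }

    // SDF -> millis, interpreting the date as midnight in the configured zone.
    public long toMillis(String sdf) {
        return LocalDate.parse(sdf, formatter).atStartOfDay(zone).toInstant().toEpochMilli();
    }

    // millis -> SDF, formatted in the same configured zone.
    public String toSdf(long millis) {
        return Instant.ofEpochMilli(millis).atZone(zone).format(formatter);
    }

    // A segment is kept when [segStart, segEnd] overlaps [queryFrom, queryTo].
    public static boolean overlaps(long segStart, long segEnd, long queryFrom, long queryTo) {
        return segStart <= queryTo && queryFrom <= segEnd;
    }

    public static void main(String[] args) {
        SharedSdfConverter writer = new SharedSdfConverter("yyyyMMdd", "Asia/Tokyo");
        long segStart = writer.toMillis("20170701"); // metadata write path
        long segEnd = segStart;                      // single-day segment

        // Query path using the same converter: the segment is kept.
        long q = writer.toMillis("20170701");
        System.out.println(overlaps(segStart, segEnd, q, q));       // true

        // Query path using a different zone: the segment is wrongly pruned.
        long qUtc = new SharedSdfConverter("yyyyMMdd", "UTC").toMillis("20170701");
        System.out.println(overlaps(segStart, segEnd, qUtc, qUtc)); // false

        // The round trip is stable when both directions share a zone.
        System.out.println(writer.toSdf(segStart));                 // 20170701
    }
}
```

Which zone the shared converter uses (UTC or local) is the backward-compatibility question raised above; the sketch only shows that the choice has to be made in one place.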