apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Historical errors when using PeriodGranularity in the groupBy query #17489

Open soullkk opened 3 days ago

soullkk commented 3 days ago


Affected Version

Apache Druid 28.0.1.

Description


- Any debugging that you have already done
Suppose a segment covers the interval 2024-11-18T12:10:00.000 to 2024-11-18T12:15:00.000 (UTC).
With the PeriodGranularity used in the groupBy query (parameters listed below), granularity.bucketEnd(maxTime) returns a time earlier than minTime.

@Nullable
public static VectorCursorGranularizer create(
    final StorageAdapter storageAdapter,
    final VectorCursor cursor,
    final Granularity granularity,
    final Interval queryInterval
)
{
  final DateTime minTime = storageAdapter.getMinTime();
  final DateTime maxTime = storageAdapter.getMaxTime();

  final Interval storageAdapterInterval = new Interval(minTime, granularity.bucketEnd(maxTime));
  // ... (rest of method)
}

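PeriodGranularity.truncate(long t) is reached with these values:

period = "PT5H"
t = 2024-11-18T12:15:00.000Z
timeZone = "Asia/Singapore"
origin = -27000000 (epoch milliseconds: 1969-12-31T16:30:00.000Z, i.e. 1970-01-01T00:00 local time in Asia/Singapore, which was UTC+07:30 in 1970)

The hours branch of truncate: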

final int hours = period.getHours();
if (hours > 0) {
  if (hours > 1 || hasOrigin) {
    // align on multiples from origin
    long h = chronology.hours().getDifferenceAsLong(t, origin);
    h -= h % hours;
    long tt = chronology.hours().add(origin, h);
    // always round down to the previous period (for timestamps prior to origin)
    if (t < tt && origin > 0) {
      t = chronology.hours().add(tt, -hours);
    } else if (t > tt && origin < 0) {
      t = chronology.minuteOfHour().roundFloor(tt);
      t = chronology.minuteOfHour().set(t, 0);
    } else {
      t = tt;
    }
    return t;
  } else {
    return chronology.hourOfDay().roundFloor(t);
  }
}
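To double-check the walkthrough, the hours branch above can be replayed outside Druid. This is a minimal sketch, not Druid code: it uses java.time in place of Joda-Time, assumes that for hour-sized periods Joda's zoned hours() field reduces to plain UTC millisecond arithmetic, and assumes (consistent with the origin value listed above) that the origin is local midnight on 1970-01-01 in the query time zone. The names TruncateRepro, truncateHours and localEpochOrigin are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical replay of the quoted hours branch of PeriodGranularity.truncate.
// Assumption: java.time stands in for Joda-Time; hour arithmetic is plain UTC millis.
final class TruncateRepro {
  static final long HOUR_MS = 3_600_000L;

  // Assumed default origin: 1970-01-01T00:00 local time in the zone, as epoch millis.
  static long localEpochOrigin(ZoneId zone) {
    return LocalDateTime.of(1970, 1, 1, 0, 0).atZone(zone).toInstant().toEpochMilli();
  }

  // Mirrors the branch taken when hours > 1.
  static long truncateHours(long t, long origin, int hours, ZoneId zone) {
    long h = (t - origin) / HOUR_MS;   // chronology.hours().getDifferenceAsLong(t, origin)
    h -= h % hours;                    // align on multiples of the period
    long tt = origin + h * HOUR_MS;    // chronology.hours().add(origin, h)
    if (t < tt && origin > 0) {
      return tt - hours * HOUR_MS;     // chronology.hours().add(tt, -hours)
    } else if (t > tt && origin < 0) {
      // minuteOfHour().roundFloor(tt), then minuteOfHour().set(t, 0) in the query zone;
      // this discards the :30 offset carried by the origin
      return Instant.ofEpochMilli(tt).atZone(zone)
          .withMinute(0).withSecond(0).withNano(0)
          .toInstant().toEpochMilli();
    } else {
      return tt;
    }
  }

  public static void main(String[] args) {
    ZoneId zone = ZoneId.of("Asia/Singapore");
    long origin = localEpochOrigin(zone);                      // -27000000
    long t = Instant.parse("2024-11-18T12:15:00Z").toEpochMilli();
    long start = truncateHours(t, origin, 5, zone);
    long end = start + 5 * HOUR_MS;                             // bucketEnd: start + PT5H
    System.out.println(origin);
    System.out.println(Instant.ofEpochMilli(start));           // 2024-11-18T07:00:00Z
    System.out.println(Instant.ofEpochMilli(end));             // 2024-11-18T12:00:00Z
  }
}
```

With these inputs the sketch gives an origin of -27000000, a bucket start of 07:00Z and a bucket end of 12:00Z, matching the values reported in this issue.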

The result of "long tt = chronology.hours().add(origin, h)" is 2024-11-18T07:30:00.000Z.
The condition "t > tt && origin < 0" is true, so "t = chronology.minuteOfHour().set(t, 0)" is executed and the method returns t = 2024-11-18T07:00:00.000Z.
As a result, granularity.bucketEnd(maxTime) = 2024-11-18T12:00:00.000Z, which is earlier than 2024-11-18T12:10:00.000Z (the start of the segment interval), so new Interval(minTime, granularity.bucketEnd(maxTime)) is given an end before its start, which Joda-Time's Interval rejects.
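For comparison, a bucket start that keeps the origin's half-hour offset can be computed with plain floor division. This is a hypothetical illustration only, valid under the assumption of a fixed-length period (no calendar units and no DST shift inside a bucket); it is not Druid's implementation or a proposed patch, and OriginAlignedBucket/bucketStart are made-up names.

```java
import java.time.Instant;

// Hypothetical illustration, not Druid code: align t to fixed-length periods counted from origin.
// Math.floorDiv rounds toward negative infinity, so timestamps before origin
// fall into the preceding bucket instead of the following one.
final class OriginAlignedBucket {
  static long bucketStart(long t, long origin, long periodMs) {
    return origin + Math.floorDiv(t - origin, periodMs) * periodMs;
  }

  public static void main(String[] args) {
    long periodMs = 5 * 3_600_000L;                                  // PT5H
    long origin = -27_000_000L;                                      // 1969-12-31T16:30:00Z
    long minTime = Instant.parse("2024-11-18T12:10:00Z").toEpochMilli();
    long maxTime = Instant.parse("2024-11-18T12:15:00Z").toEpochMilli();
    long start = bucketStart(maxTime, origin, periodMs);
    long end = start + periodMs;
    System.out.println(Instant.ofEpochMilli(start));                // 2024-11-18T07:30:00Z
    System.out.println(Instant.ofEpochMilli(end));                  // 2024-11-18T12:30:00Z
    System.out.println(end >= minTime);                             // true
  }
}
```

Under this alignment maxTime = 12:15Z falls in the bucket [07:30Z, 12:30Z), so the bucket end stays after minTime = 12:10Z.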
soullkk commented 2 days ago

This issue may be related to https://github.com/apache/druid/issues/4073