apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.33k stars 1.2k forks source link

feat: Support faster multi-column grouping ( `GroupColumn`) for `Date/Time/Timestamp` types #13457

Closed jonathanc-n closed 2 days ago

jonathanc-n commented 4 days ago

Which issue does this PR close?

Closes #13263.

Rationale for this change

What changes are included in this PR?

Add group column for Date/Time/Timestamp

Are these changes tested?

Are there any user-facing changes?

jonathanc-n commented 4 days ago
Comparing main and column
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃     column ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.27ms │     0.36ms │  1.34x slower │
│ QQuery 1     │    61.02ms │    36.84ms │ +1.66x faster │
│ QQuery 2     │    84.63ms │    83.85ms │     no change │
│ QQuery 3     │    78.49ms │    70.84ms │ +1.11x faster │
│ QQuery 4     │   696.13ms │   695.54ms │     no change │
│ QQuery 5     │   765.24ms │   806.28ms │  1.05x slower │
│ QQuery 6     │    33.10ms │    31.55ms │     no change │
│ QQuery 7     │    40.46ms │    41.95ms │     no change │
│ QQuery 8     │   797.77ms │   720.54ms │ +1.11x faster │
│ QQuery 9     │  1057.00ms │   993.54ms │ +1.06x faster │
│ QQuery 10    │   248.37ms │   246.95ms │     no change │
│ QQuery 11    │   261.37ms │   282.06ms │  1.08x slower │
│ QQuery 12    │   869.76ms │   926.57ms │  1.07x slower │
│ QQuery 13    │  1338.20ms │  1382.54ms │     no change │
│ QQuery 14    │   819.75ms │   802.72ms │     no change │
│ QQuery 15    │   856.39ms │   873.74ms │     no change │
│ QQuery 16    │  1892.26ms │  1881.67ms │     no change │
│ QQuery 17    │  1798.52ms │  1794.75ms │     no change │
│ QQuery 18    │  4686.36ms │  4779.32ms │     no change │
│ QQuery 19    │    61.05ms │    66.55ms │  1.09x slower │
│ QQuery 20    │  1132.85ms │  1268.51ms │  1.12x slower │
│ QQuery 21    │  1512.94ms │  1720.92ms │  1.14x slower │
│ QQuery 22    │  4829.86ms │  4858.37ms │     no change │
│ QQuery 23    │ 11642.12ms │ 10491.15ms │ +1.11x faster │
│ QQuery 24    │   633.42ms │   570.52ms │ +1.11x faster │
│ QQuery 25    │   511.58ms │   508.50ms │     no change │
│ QQuery 26    │   586.58ms │   689.01ms │  1.17x slower │
│ QQuery 27    │  1842.41ms │  1985.72ms │  1.08x slower │
│ QQuery 28    │ 14969.19ms │ 14861.69ms │     no change │
│ QQuery 29    │   540.19ms │   525.39ms │     no change │
│ QQuery 30    │   838.59ms │   828.83ms │     no change │
│ QQuery 31    │   787.30ms │   842.87ms │  1.07x slower │
│ QQuery 32    │  4365.25ms │  3927.03ms │ +1.11x faster │
│ QQuery 33    │  9277.87ms │  6284.31ms │ +1.48x faster │
│ QQuery 34    │  7823.31ms │  6524.80ms │ +1.20x faster │
│ QQuery 35    │  1121.27ms │  1112.43ms │     no change │
│ QQuery 36    │   130.37ms │   125.53ms │     no change │
│ QQuery 37    │    90.24ms │    90.44ms │     no change │
│ QQuery 38    │    91.87ms │    92.78ms │     no change │
│ QQuery 39    │   235.79ms │   228.69ms │     no change │
│ QQuery 40    │    32.08ms │    29.45ms │ +1.09x faster │
│ QQuery 41    │    29.82ms │    29.51ms │     no change │
│ QQuery 42    │    36.08ms │    36.22ms │     no change │
└──────────────┴────────────┴────────────┴───────────────┘

Giving weird results, trying to fix. Is someone else able to run this on their computer and see what they get.

jonathanc-n commented 4 days ago

@alamb Was this the sort of implementation you were looking for? (will add tests)

alamb commented 3 days ago

Contributor

Yes, exactly this -- thank you.

Giving weird results, trying to fix. Is someone else able to run this on their computer and see what they get.

I will run it on a GCP VM and report back. Given most of those queries don't group on date/time/timestamp columns this seems somewhat suspicious

alamb commented 2 days ago
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃     column ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.33ms │     2.36ms │     no change │
│ QQuery 1     │    41.29ms │    41.87ms │     no change │
│ QQuery 2     │   101.05ms │   100.68ms │     no change │
│ QQuery 3     │   105.79ms │   106.58ms │     no change │
│ QQuery 4     │   930.89ms │   915.16ms │     no change │
│ QQuery 5     │   943.56ms │   950.84ms │     no change │
│ QQuery 6     │    35.53ms │    35.85ms │     no change │
│ QQuery 7     │    45.42ms │    44.83ms │     no change │
│ QQuery 8     │  1001.08ms │  1006.61ms │     no change │
│ QQuery 9     │  1386.68ms │  1369.77ms │     no change │
│ QQuery 10    │   287.61ms │   282.50ms │     no change │
│ QQuery 11    │   329.86ms │   331.02ms │     no change │
│ QQuery 12    │   994.13ms │   989.78ms │     no change │
│ QQuery 13    │  1445.21ms │  1418.62ms │     no change │
│ QQuery 14    │   899.19ms │   901.23ms │     no change │
│ QQuery 15    │  1105.93ms │  1098.53ms │     no change │
│ QQuery 16    │  2003.77ms │  1987.47ms │     no change │
│ QQuery 17    │  1832.90ms │  1863.44ms │     no change │
│ QQuery 18    │  4254.57ms │  4253.13ms │     no change │
│ QQuery 19    │    99.26ms │    95.39ms │     no change │
│ QQuery 20    │  1279.67ms │  1270.28ms │     no change │
│ QQuery 21    │  1542.73ms │  1490.85ms │     no change │
│ QQuery 22    │  2670.07ms │  2637.39ms │     no change │
│ QQuery 23    │  8762.07ms │  8606.28ms │     no change │
│ QQuery 24    │   521.95ms │   508.59ms │     no change │
│ QQuery 25    │   424.60ms │   432.69ms │     no change │
│ QQuery 26    │   579.98ms │   581.24ms │     no change │
│ QQuery 27    │  1869.73ms │  1848.85ms │     no change │
│ QQuery 28    │ 13569.94ms │ 12923.62ms │     no change │
│ QQuery 29    │   532.15ms │   535.49ms │     no change │
│ QQuery 30    │   909.17ms │   903.12ms │     no change │
│ QQuery 31    │   956.03ms │   959.13ms │     no change │
│ QQuery 32    │  3984.74ms │  4039.97ms │     no change │
│ QQuery 33    │  4061.77ms │  4028.34ms │     no change │
│ QQuery 34    │  4073.75ms │  4070.16ms │     no change │
│ QQuery 35    │  1345.23ms │  1352.40ms │     no change │
│ QQuery 36    │   238.30ms │   237.21ms │     no change │
│ QQuery 37    │    98.32ms │    96.66ms │     no change │
│ QQuery 38    │   141.17ms │   142.49ms │     no change │
│ QQuery 39    │   452.11ms │   455.52ms │     no change │
│ QQuery 40    │    57.03ms │    52.79ms │ +1.08x faster │
│ QQuery 41    │    47.64ms │    47.12ms │     no change │
│ QQuery 42    │    64.67ms │    64.53ms │     no change │
└──────────────┴────────────┴────────────┴───────────────┘

And

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃     column ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.69ms │     0.70ms │     no change │
│ QQuery 1     │    72.89ms │    73.30ms │     no change │
│ QQuery 2     │   128.46ms │   128.16ms │     no change │
│ QQuery 3     │   134.06ms │   137.61ms │     no change │
│ QQuery 4     │   999.53ms │   977.83ms │     no change │
│ QQuery 5     │  1086.20ms │  1064.93ms │     no change │
│ QQuery 6     │    67.60ms │    69.36ms │     no change │
│ QQuery 7     │    84.77ms │    83.90ms │     no change │
│ QQuery 8     │  1072.19ms │  1049.16ms │     no change │
│ QQuery 9     │  1399.65ms │  1390.85ms │     no change │
│ QQuery 10    │   315.84ms │   313.48ms │     no change │
│ QQuery 11    │   352.33ms │   348.52ms │     no change │
│ QQuery 12    │  1097.29ms │  1096.46ms │     no change │
│ QQuery 13    │  1517.86ms │  1510.62ms │     no change │
│ QQuery 14    │  1026.11ms │  1022.12ms │     no change │
│ QQuery 15    │  1156.09ms │  1139.70ms │     no change │
│ QQuery 16    │  2123.21ms │  2103.40ms │     no change │
│ QQuery 17    │  1977.59ms │  1958.37ms │     no change │
│ QQuery 18    │  4369.93ms │  4383.23ms │     no change │
│ QQuery 19    │   119.92ms │   127.75ms │  1.07x slower │
│ QQuery 20    │  1376.32ms │  1373.50ms │     no change │
│ QQuery 21    │  1751.10ms │  1733.87ms │     no change │
│ QQuery 22    │  4260.62ms │  4224.08ms │     no change │
│ QQuery 23    │ 10119.47ms │  9924.73ms │     no change │
│ QQuery 24    │   672.95ms │   663.89ms │     no change │
│ QQuery 25    │   589.04ms │   584.38ms │     no change │
│ QQuery 26    │   756.64ms │   773.56ms │     no change │
│ QQuery 27    │  2099.50ms │  2068.92ms │     no change │
│ QQuery 28    │ 14476.01ms │ 13301.41ms │ +1.09x faster │
│ QQuery 29    │   582.22ms │   577.63ms │     no change │
│ QQuery 30    │  1063.45ms │  1058.02ms │     no change │
│ QQuery 31    │  1125.54ms │  1107.39ms │     no change │
│ QQuery 32    │  4061.78ms │  4012.77ms │     no change │
│ QQuery 33    │  4272.80ms │  4245.70ms │     no change │
│ QQuery 34    │  4322.31ms │  4212.04ms │     no change │
│ QQuery 35    │  1389.64ms │  1372.55ms │     no change │
│ QQuery 36    │   292.61ms │   272.95ms │ +1.07x faster │
│ QQuery 37    │   173.46ms │   184.09ms │  1.06x slower │
│ QQuery 38    │   189.39ms │   194.60ms │     no change │
│ QQuery 39    │   510.54ms │   503.15ms │     no change │
│ QQuery 40    │    93.20ms │    92.77ms │     no change │
│ QQuery 41    │    80.00ms │    78.67ms │     no change │
│ QQuery 42    │    96.38ms │    79.81ms │ +1.21x faster │
└──────────────┴────────────┴────────────┴───────────────┘

My conclusion is that the variability for the very fast queries (10s - 100s of ms) is significant.

alamb commented 2 days ago

Thanks again @jonathanc-n and @jayzhan211

alamb commented 1 day ago

I think the last remaining type is Decimal -- I will file one more ticket

alamb commented 1 day ago

Filed https://github.com/apache/datafusion/issues/13505 for the last of it