Closed revans2 closed 1 day ago
build
I ran some local benchmarks to measure the performance improvement:
spark.time(spark.range(100000000000L, 120000000000L, 1, 64).selectExpr("AVG(months_between(timestamp_micros(id), timestamp_micros(10))) as mbt").show())
An A6000 GPU with 16 CPU cores completes this in about 16 seconds (after it warms up).
A Threadripper PRO 5975WX (32 cores) finishes in about 325 seconds when run on all 32 cores (no hyperthreading). That is about a 20x speedup.
This fixes #11709
The code is a little complicated, mostly because the Spark code itself does some fairly complex things.
I think there are more optimizations we could make to reduce memory use and improve performance, but I wanted to get something working out the door sooner; we can look at improving it later.
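For context on why the logic is not a simple subtraction, here is a rough sketch of how I understand Spark's CPU `months_between` to behave. It is not the code in this PR. The object and helper names are hypothetical, it handles UTC only (the real expression honors the session time zone), and the 31-day month and 8-digit rounding are my reading of the Spark behavior.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical, UTC-only sketch of months_between semantics (not the PR's GPU code).
object MonthsBetweenSketch {
  private val SecondsPerDay = 86400L
  // Assumption: fractional months are computed against a fixed 31-day month.
  private val SecondsPerMonth = 31L * SecondsPerDay

  // Convert microseconds since the epoch to a UTC date-time, truncated to whole seconds.
  private def toUtc(micros: Long): LocalDateTime =
    LocalDateTime.ofEpochSecond(Math.floorDiv(micros, 1000000L), 0, ZoneOffset.UTC)

  private def isLastDayOfMonth(t: LocalDateTime): Boolean =
    t.getDayOfMonth == t.toLocalDate.lengthOfMonth

  def monthsBetween(micros1: Long, micros2: Long, roundOff: Boolean = true): Double = {
    val t1 = toUtc(micros1)
    val t2 = toUtc(micros2)
    val monthDiff =
      ((t1.getYear * 12 + t1.getMonthValue) - (t2.getYear * 12 + t2.getMonthValue)).toDouble

    // Special case: same day of month, or both on the last day of their month.
    // The time of day is ignored and the result is a whole number of months.
    if (t1.getDayOfMonth == t2.getDayOfMonth || (isLastDayOfMonth(t1) && isLastDayOfMonth(t2))) {
      monthDiff
    } else {
      // General case: add the day and time-of-day difference as a fraction of 31 days.
      val secondsDiff = (t1.getDayOfMonth - t2.getDayOfMonth) * SecondsPerDay +
        t1.toLocalTime.toSecondOfDay - t2.toLocalTime.toSecondOfDay
      val diff = monthDiff + secondsDiff / SecondsPerMonth.toDouble
      if (roundOff) math.round(diff * 1e8) / 1e8 else diff
    }
  }
}
```

The branch on day of month and month end, plus the per-row calendar splitting, is the part that makes a columnar version more involved than plain arithmetic on the timestamps.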