NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

Add in support for months_between #11737

Closed by revans2 1 day ago

revans2 commented 2 days ago

This fixes #11709

The code is a little complicated, mostly because the Spark code does some fairly complex things.
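For readers unfamiliar with why months_between is fiddly, here is a minimal CPU-side sketch of its documented Spark SQL behavior, as I understand it: if both timestamps fall on the same day of the month, or both fall on the last day of their months, the result is a whole number of months; otherwise the fractional part assumes 31-day months and the result is rounded to 8 digits unless rounding is turned off. The object name `MonthsBetweenSketch` and the use of `LocalDateTime` (which ignores session time zones) are assumptions made for illustration. This is not the code in this PR and not the GPU implementation.

```scala
import java.time.LocalDateTime

// Hypothetical illustration of the documented months_between semantics.
// Assumption: time zones are ignored by using LocalDateTime directly.
object MonthsBetweenSketch {
  private val SecondsPerDay = 86400L

  def monthsBetween(t1: LocalDateTime, t2: LocalDateTime, roundOff: Boolean = true): Double = {
    // Whole-month difference based only on year and month fields.
    val monthDiff =
      ((t1.getYear * 12 + t1.getMonthValue) - (t2.getYear * 12 + t2.getMonthValue)).toDouble

    val lastDay1 = t1.getDayOfMonth == t1.toLocalDate.lengthOfMonth
    val lastDay2 = t2.getDayOfMonth == t2.toLocalDate.lengthOfMonth

    if (t1.getDayOfMonth == t2.getDayOfMonth || (lastDay1 && lastDay2)) {
      // Same day of month, or both month ends: time of day is ignored.
      monthDiff
    } else {
      // Otherwise the remainder is measured in seconds against a 31-day month.
      val secondsDiff = (t1.getDayOfMonth - t2.getDayOfMonth) * SecondsPerDay +
        t1.toLocalTime.toSecondOfDay - t2.toLocalTime.toSecondOfDay
      val diff = monthDiff + secondsDiff / (31.0 * SecondsPerDay)
      // Round to 8 decimal digits when roundOff is requested.
      if (roundOff) math.round(diff * 1e8) / 1e8 else diff
    }
  }
}
```

For example, `MonthsBetweenSketch.monthsBetween(LocalDateTime.parse("1997-02-28T10:30:00"), LocalDateTime.parse("1996-10-30T00:00:00"))` gives 3.94959677, which matches the example in the Spark SQL function documentation. The branching on day-of-month and month-end is the part that makes a columnar GPU version less straightforward than a plain subtraction.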

I think that there are some more optimizations that we could do to reduce memory and improve performance, but I wanted to get something working out the door sooner, and then we can look at improving it later.

revans2 commented 2 days ago

build

revans2 commented 1 day ago

build

revans2 commented 1 day ago

build

revans2 commented 1 day ago

I ran some local benchmarks to see the performance improvement:

spark.time(spark.range(100000000000L, 120000000000L, 1, 64).selectExpr("AVG(months_between(timestamp_micros(id), timestamp_micros(10))) as mbt").show())

An A6000 GPU with 16 CPU cores completes this in about 16 seconds (after it warms up).

A Threadripper PRO 5975WX (32 cores) finishes the same query on the CPU in about 325 seconds when run with all 32 cores (no hyperthreading). That is about a 20x speedup.
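As a quick sanity check on those numbers, the arithmetic can be spelled out. This is only a sketch using the approximate timings quoted above; the object name `BenchmarkMath` is hypothetical, and the row count comes from the bounds of the `spark.range` call in the benchmark query.

```scala
// Hypothetical helper that restates the benchmark numbers from this thread.
object BenchmarkMath {
  // spark.range(100000000000L, 120000000000L, 1, 64) produces end - start rows.
  val rows: Long = 120000000000L - 100000000000L // 20 billion rows

  // Approximate wall-clock times reported above, in seconds.
  val gpuSeconds: Double = 16.0
  val cpuSeconds: Double = 325.0

  // Ratio of CPU time to GPU time, roughly 20.3.
  val speedup: Double = cpuSeconds / gpuSeconds

  // Approximate throughput in rows per second for each run.
  val gpuRowsPerSecond: Double = rows / gpuSeconds // about 1.25 billion
  val cpuRowsPerSecond: Double = rows / cpuSeconds // about 61.5 million
}
```

Note that the two runs used different core counts (16 for the GPU run, 32 for the CPU run), so the ratio compares whole configurations rather than per-core performance.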