NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

Use the new host memory allocation API #11671

Closed revans2 closed 2 weeks ago

revans2 commented 3 weeks ago

This is step 2 in getting all host memory allocations to go through a single API. https://github.com/rapidsai/cudf/pull/17197 added a new raw allocation API, and this PR starts to use it. I will then put up another PR to CUDF that switches HostMemoryBuffer.allocate to always call the default allocator's allocate API, so that effectively all large host memory allocations go through our default allocator, HostAlloc.

After that I will start to clean up the code and remove the places where we pass HostAlloc into CUDF APIs as a separate allocator.
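
As a rough illustration of the "single API" idea described above, here is a minimal Java sketch. Every name in it (`HostAllocator`, `Registry`, `TrackingAllocator`, `allocateBuffer`) is hypothetical and chosen for illustration only; none of this is the actual cudf or spark-rapids code. It only shows the general pattern: a process-wide default allocator that a buffer factory always delegates to, so that a plugin can install its own allocator once instead of passing it into each call.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; these names are NOT the cudf or spark-rapids API.
public class DefaultAllocatorSketch {

    /** The single allocation entry point every caller is routed through. */
    public interface HostAllocator {
        ByteBuffer allocate(long bytes);
    }

    /** Holds the process-wide default allocator. */
    public static final class Registry {
        // Falls back to a plain direct-buffer allocator until one is installed.
        private static volatile HostAllocator current =
            bytes -> ByteBuffer.allocateDirect(Math.toIntExact(bytes));

        public static void set(HostAllocator allocator) { current = allocator; }
        public static HostAllocator get() { return current; }
    }

    /** Stand-in for a plugin-side allocator that tracks what it hands out. */
    public static final class TrackingAllocator implements HostAllocator {
        public final AtomicLong totalBytes = new AtomicLong();

        @Override
        public ByteBuffer allocate(long bytes) {
            totalBytes.addAndGet(bytes);
            return ByteBuffer.allocateDirect(Math.toIntExact(bytes));
        }
    }

    /** Stand-in for a buffer factory that always delegates to the default. */
    public static ByteBuffer allocateBuffer(long bytes) {
        return Registry.get().allocate(bytes);
    }
}
```

With this shape, callers never need to be handed an allocator explicitly: installing one through `Registry.set(...)` is enough for every later `allocateBuffer` call to go through it.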

revans2 commented 3 weeks ago

Please note that this will not compile until Oct 30th or so, when its dependency finishes going through CI.

revans2 commented 3 weeks ago

I updated the code to also fix an issue that was showing up with row to columnar exec. The integration tests were failing because we are now injecting retry OOMs in places where we were not before, since all of the processing now goes through a single API.
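
To make the failure mode above concrete, here is a minimal Java sketch of OOM injection at a single allocation chokepoint. All names (`RetryOOM`, `InjectingAllocator`, `withRetry`) are hypothetical stand-ins, not the real spark-rapids retry framework. The point is only that once every allocation passes through one API, an injected OOM can surface in code paths that were never wrapped in retry logic, and those paths then fail in tests.

```java
import java.util.function.Supplier;

// Hypothetical sketch; these names are NOT the spark-rapids retry framework.
public class RetryInjectionSketch {

    /** Stand-in for an out-of-memory signal that asks the caller to retry. */
    public static final class RetryOOM extends RuntimeException {
        public RetryOOM(String message) { super(message); }
    }

    /** Allocator that fails the first N calls to simulate test-time injection. */
    public static final class InjectingAllocator {
        private int injectionsLeft;

        public InjectingAllocator(int injections) { this.injectionsLeft = injections; }

        public byte[] allocate(int bytes) {
            if (injectionsLeft > 0) {
                injectionsLeft--;
                throw new RetryOOM("injected OOM");
            }
            return new byte[bytes];
        }
    }

    /** Re-runs the work when a RetryOOM is thrown, up to maxAttempts times. */
    public static <T> T withRetry(Supplier<T> work, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RetryOOM last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return work.get();
            } catch (RetryOOM e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

In this sketch, a call wrapped in `withRetry` survives the injected failure, while a bare `allocate` call on a freshly injecting allocator throws `RetryOOM`.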

I think there may be some more issues that I need to debug. I am also seeing an issue with the time zone DB, but it is a bit harder to reproduce, and the fix for it would likely be in spark-rapids-jni. So feel free to review this.

revans2 commented 2 weeks ago

build