jlowe opened 3 months ago
I have a silly question: is it safe to use the hostPartColumns before Cuda.DEFAULT_STREAM.sync() is called?
Liangcai already explained this to me offline. Thanks!
#11280 adds asynchronous copying of shuffle data after partitioning and synchronizes on the stream before releasing the GPU semaphore. Instead, we could release the semaphore after freeing the device data but before synchronizing on the stream, e.g. via a patch like this:
When a lot of data is copied back to the host through pinned memory during shuffle, this could release the GPU semaphore significantly earlier, which could improve query performance.
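To make the proposed reordering concrete, here is a toy Python model. It is not the patch referred to above and does not use any spark-rapids or cudf API. It assumes a CUDA stream can be modeled as a single-worker FIFO queue, so work enqueued on it runs in submission order, and `sync()` blocks until all of it has run. `Stream`, `partition_and_copy`, and `release_before_sync` are hypothetical names. The host copy is only treated as complete after `sync()` returns.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Stream:
    """Hypothetical stand-in for a CUDA stream: work runs one item at a time, in order."""

    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._last = None

    def enqueue(self, fn):
        # Asynchronous launch: returns immediately, runs later on the worker thread.
        self._last = self._executor.submit(fn)

    def sync(self):
        # Block until everything enqueued so far has run (like a stream synchronize).
        if self._last is not None:
            self._last.result()


def partition_and_copy(stream, gpu_semaphore, device_data, events, release_before_sync):
    """Toy model of one task's shuffle write; names are illustrative only."""
    gpu_semaphore.acquire()
    host_copy = []
    # Async device-to-host copy into (pretend) pinned host memory.
    stream.enqueue(lambda: host_copy.extend(device_data))
    # Stream-ordered free of the device buffer, queued after the copy.
    stream.enqueue(lambda: events.append("device buffer freed"))
    if release_before_sync:
        # Proposed order: let another task onto the GPU while the copy drains.
        gpu_semaphore.release()
        events.append("semaphore released")
        stream.sync()
        events.append("stream synced")
    else:
        # Current order: hold the semaphore until the copy has finished.
        stream.sync()
        events.append("stream synced")
        gpu_semaphore.release()
        events.append("semaphore released")
    # host_copy is only safe to read once sync() has returned.
    return host_copy
```

In both orderings the host data is read only after the sync. The only difference is whether another task can acquire the semaphore while the copy and the free are still queued on the stream.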
However, this could also hurt query performance if it leads to excessive spilling. The memory has been freed on the stream, but it only becomes available for allocation once the stream has been synchronized past the point of the frees. This change wouldn't trigger an OOM error, but it might cause extra spilling, since IIRC we exhaust spilling before calling cudaDeviceSynchronize to try to free memory.
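The spilling concern can be sketched with a simplified accounting model. It assumes, as described above, that memory freed on a stream cannot be reallocated until that stream is synchronized past the free, and that the allocator exhausts spilling before it synchronizes. `ToyDevicePool` and `allocate_with_retry` are hypothetical. The real allocator and the plugin's spill framework are more involved. For example, spills are treated here as freeing device memory instantly.

```python
class ToyDevicePool:
    """Hypothetical device memory pool; sizes are in arbitrary units."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0
        self.pending_free = 0  # freed on a stream but not yet synchronized

    def alloc(self, nbytes):
        # Pending frees still occupy the pool until the stream is synchronized.
        if self.in_use + self.pending_free + nbytes <= self.capacity:
            self.in_use += nbytes
            return True
        return False

    def free_on_stream(self, nbytes):
        self.in_use -= nbytes
        self.pending_free += nbytes

    def synchronize(self):
        # Stands in for cudaDeviceSynchronize: all pending frees become reusable.
        self.pending_free = 0


def allocate_with_retry(pool, nbytes, spillable):
    """Hypothetical retry policy: spill everything spillable first, then synchronize."""
    spilled = []
    while not pool.alloc(nbytes):
        if spillable:
            victim = spillable.pop()
            pool.in_use -= victim  # simplification: spilled buffer leaves the GPU at once
            spilled.append(victim)
        elif pool.pending_free:
            pool.synchronize()
        else:
            raise MemoryError("device OOM")
    return spilled


# Task A allocates 60 units, frees them on its stream, and releases the
# semaphore before syncing. Task B, holding a spillable 20-unit buffer,
# then asks for 30 units.
pool = ToyDevicePool(capacity=100)
pool.alloc(20)  # task B's spillable buffer
pool.alloc(60)  # task A's partitioned data
pool.free_on_stream(60)
spilled = allocate_with_retry(pool, 30, [20])  # B spills even though 60 units are pending
```

In this model, task B spills its 20-unit buffer even though the 60 units task A freed would have been enough once the stream was synchronized. If task A had synchronized before releasing the semaphore, the same request would fit with no spill. With nothing left to spill, the fallback synchronize still avoids an OOM.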