NVIDIA-Merlin / core

Core Utilities for NVIDIA Merlin
Apache License 2.0
19 stars 14 forks

[BUG] Dask-CUDA does not work with Merlin/NVTabular #363

Open rjzamora opened 11 months ago

rjzamora commented 11 months ago

As pointed out by @oliverholworthy in https://github.com/NVIDIA-Merlin/core/pull/274#discussion_r1160187901, cuda.is_available() is used in merlin.core.compat to check for CUDA support. Unfortunately, this is a known problem for dask-cuda, because the check can create a CUDA context before Dask-CUDA initializes the cluster.

This most likely means that Merlin/NVTabular has not worked properly with Dask-CUDA for more than six months now. For example, the following code will produce an OOM error on 32GB V100s:

import time
from merlin.core.utils import Distributed
if __name__ == "__main__":
    with Distributed(rmm_pool_size="24GiB"):
        time.sleep(30)

You will also see an error if you don't import any merlin/nvt code but do use the offending cuda.is_available() call:

import time
from numba import cuda # This is fine
cuda.is_available() # This is NOT
from dask_cuda import LocalCUDACluster
if __name__ == "__main__":
    with LocalCUDACluster(rmm_pool_size="24GiB") as cluster:
        time.sleep(30)

Meanwhile, the code works fine if you don't use the offending call or import code that also imports merlin.core.compat:

import time
from dask_cuda import LocalCUDACluster
if __name__ == "__main__":
    with LocalCUDACluster(rmm_pool_size="24GiB") as cluster:
        time.sleep(30)
rjzamora commented 11 months ago

cc @jperez999 @karlhigley

rjzamora commented 10 months ago

@jperez999 - Do you think this line is actually necessary? If we already have HAS_GPU from the line above, maybe we can just do:

if not HAS_GPU:
    cuda = None
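A minimal sketch of what such a guard could look like (the has_gpu helper and the NVML-based check are assumptions for illustration, not the existing merlin.core.compat code) — detecting a GPU via PyNVML avoids touching the CUDA driver at import time:

```python
def has_gpu():
    """Check for a visible GPU via NVML, without creating a CUDA context.

    Hypothetical replacement for the cuda.is_available() check; NVML
    queries are safe to run before dask-cuda sets up its workers.
    """
    try:
        import pynvml

        pynvml.nvmlInit()
        try:
            return pynvml.nvmlDeviceGetCount() > 0
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        # No pynvml, no driver, or no device visible
        return False


HAS_GPU = has_gpu()
if HAS_GPU:
    from numba import cuda
else:
    cuda = None
```

With this shape, importing merlin.core.compat on a GPU machine would still bind cuda, but without ever calling cuda.is_available().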
pentschev commented 10 months ago

Besides the above, I was looking at the code in more detail and I see the following block: https://github.com/NVIDIA-Merlin/core/blob/6e52b48140615708b59926b5f9c3601f8feeab93/merlin/core/compat/__init__.py#L102-L105

This creates a new CUDA context on a GPU only to query its memory size, and a CUDA context should never be created before Dask initializes the cluster. Also note that pynvml_mem_size contains an equivalent code block: https://github.com/NVIDIA-Merlin/core/blob/6e52b48140615708b59926b5f9c3601f8feeab93/merlin/core/compat/__init__.py#L57-L60

The PyNVML code will NOT create a CUDA context and is safe to run before Dask. Is there a reason why you're using the Numba code block to query GPU memory instead of always using PyNVML for that?
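For comparison, here is a sketch of a PyNVML-only memory query (the helper name is hypothetical; the pattern mirrors what pynvml_mem_size already does) that returns the device's total memory without ever creating a CUDA context:

```python
def device_mem_size_nvml(index=0):
    """Query total GPU memory via NVML.

    Unlike numba.cuda.current_context().get_memory_info(), this does
    not create a CUDA context, so it is safe to call before dask-cuda
    starts its workers. Returns None when NVML or a GPU is unavailable.
    """
    try:
        import pynvml
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            return pynvml.nvmlDeviceGetMemoryInfo(handle).total
        finally:
            pynvml.nvmlShutdown()
    except pynvml.NVMLError:
        return None
```

On a machine with a 32GB V100, this would report roughly 34 GB (total bytes), which Distributed/LocalCUDACluster could then use to size an RMM pool safely.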