apache / tvm

Open deep learning compiler stack for CPU, GPU and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Bug] How to use `kDLCUDAManaged` #15874

Open dusty-nv opened 1 year ago

dusty-nv commented 1 year ago

Hello, I am using an NVIDIA Jetson device, which has unified CPU/GPU memory, and I'm trying to eliminate unneeded CPU<->GPU memory copies. I noticed there are `kDLCUDAManaged` and `kDLCUDAHost` device types. But when I try to construct a device with `kDLCUDAManaged`:

```python
import tvm

device = tvm.runtime.Device(tvm.runtime.Device.kDLCUDAManaged, 0)  # tvm.runtime.cuda(0)
print(f"device={device}, name={device.device_name}, compute={device.compute_version}, max_clocks={device.max_clock_rate}, multiprocessors={device.multi_processor_count}, max_thread_dims={device.max_thread_dimensions}, api_version={device.api_version}, driver_version={device.driver_version}")
```

```
tvm.error.InternalError: Traceback (most recent call last):
  [bt] (5) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(TVMFuncCall+0x64) [0xffff5d979a3c]
  [bt] (4) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(+0x32efb14) [0xffff5d97ab14]
  [bt] (3) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::DeviceAPIManager::GetAPI(int, bool)+0x1ec) [0xffff5d97d114]
  [bt] (2) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::DeviceAPIManager::GetAPI(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool)+0x30c) [0xffff5d97cd44]
  [bt] (1) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x78) [0xffff5b69c3a0]
  [bt] (0) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x30) [0xffff5d9c6d50]
  File "/opt/mlc-llm/3rdparty/tvm/src/runtime/c_runtime_api.cc", line 133
InternalError: Check failed: (allow_missing) is false: Device API cuda_managed is not enabled.
```

What is the proper way to use `kDLCUDAManaged`?
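For reference, these device types come from DLPack's `DLDeviceType` enum in `dlpack.h`, which TVM mirrors as `tvm.runtime.Device.kDL*` class attributes. A minimal pure-Python sketch of the relevant values:

```python
# DLDeviceType values from dlpack.h (TVM exposes the same constants as
# tvm.runtime.Device.kDL* attributes).
kDLCPU = 1           # plain host memory
kDLCUDA = 2          # device memory (cudaMalloc)
kDLCUDAHost = 3      # pinned host memory (cudaMallocHost)
kDLCUDAManaged = 13  # unified memory (cudaMallocManaged)
```

So the failing call above asks TVM's runtime for a device API registered under device type 13 ("cuda_managed"), which is where the `allow_missing` check fires.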

Environment

TVM Unity 0.12, Ubuntu 20.04, JetPack 5.1.2, CUDA 11.4

tqchen commented 1 year ago

Looks like we indeed don't have proper `kDLCUDAManaged` support. I think it can be added in a similar way to https://github.com/apache/tvm/blob/main/src/runtime/cuda/cuda_device_api.cc#L256, along with allocation support.

I don't have experience with CUDA managed memory, but if it is just a matter of updating the `cudaMalloc` and copy paths, we should be able to add support.
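If that route is taken, a rough sketch of the allocation side might look like the following. This is hypothetical, not an actual TVM patch: the `AllocManaged`/`FreeManaged` names are made up for illustration and are not TVM's `DeviceAPI` interface; only `cudaMallocManaged`, `cudaMemAttachGlobal`, and `cudaFree` are real CUDA runtime calls, and building this requires the CUDA toolkit.

```cpp
// Hypothetical sketch of a managed allocation path (not the actual TVM
// change). Requires the CUDA toolkit; link against cudart.
#include <cuda_runtime.h>
#include <cstddef>

void* AllocManaged(size_t nbytes) {
  void* ptr = nullptr;
  // cudaMallocManaged returns unified memory reachable from both host and
  // device; on Jetson this avoids explicit host<->device copies.
  if (cudaMallocManaged(&ptr, nbytes, cudaMemAttachGlobal) != cudaSuccess) {
    return nullptr;
  }
  return ptr;
}

void FreeManaged(void* ptr) {
  cudaFree(ptr);  // cudaFree releases managed allocations as well
}
```

A nice property of managed memory is that the copy paths could become no-ops (or plain `memcpy`) when both source and destination are managed, which is presumably the win dusty-nv is after on Jetson.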