NVIDIA / VideoProcessingFramework

Set of Python bindings to C++ libraries that provide full HW acceleration for video decoding, encoding, and GPU-accelerated color space and pixel format conversions
Apache License 2.0

RuntimeError: CUDA error: invalid device ordinal #415

Open JunkangLiu opened 1 year ago

JunkangLiu commented 1 year ago

Describe the bug

Calling PytorchNvCodec raises the following exception: RuntimeError: CUDA error: invalid device ordinal

Traceback (most recent call last):
  File "SamplePyTorch.py", line 201, in <module>
    main(gpu_id, encFilePath, decFilePath)
  File "SamplePyTorch.py", line 164, in main
    src_tensor = surface_to_tensor(rgb_pln)
  File "SamplePyTorch.py", line 87, in surface_to_tensor
    img_tensor = pnvc.DptrToTensor(surf_plane.GpuMem(),
RuntimeError: CUDA error: invalid device ordinal

Traceback (most recent call last):
  File "SampleTorchResnet.py", line 1158, in <module>
    run_inference_on_video(gpu_id, input_video)
  File "SampleTorchResnet.py", line 1122, in run_inference_on_video
    img_tensor = pnvc.makefromDevicePtrUint8(surf_plane.GpuMem(),
RuntimeError: CUDA error: invalid device ordinal

To Reproduce

This function raises the exception:

surface_tensor = pnvc.makefromDevicePtrUint8(surfPlane.GpuMem(), surfPlane.Width(), surfPlane.Height(),
                                             surfPlane.Pitch(), surfPlane.ElemSize())

Expected behavior

Work without exception.

I don't have libtorch installed. Is it necessary to install libtorch in order to call PytorchNvCodec?

theHamsta commented 1 year ago

Can you debug which device id is used in makefromDevicePtrUint8 by printing the return value of get_device_id (https://github.com/theHamsta/VideoProcessingFramework/blob/9f97990be15ecfc4ccbbe7dab09807c86c30615d/C:\dev\VideoProcessingFramework\src\PytorchNvCodec\src\PytorchNvCodec.cpp#L50)? Probably the return value of get_device_id is not handled correctly.
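A minimal standalone sketch of that kind of check, using only the CUDA runtime API (the allocation here is just to have a pointer to inspect; it is not VPF code), could look like:

#include <cuda_runtime.h>
#include <cstdio>

// Print which device the CUDA runtime says owns a given device pointer,
// alongside the device that is current in the calling context.
static void print_owner_device(const void* dptr)
{
  cudaPointerAttributes attr{};
  cudaError_t res = cudaPointerGetAttributes(&attr, dptr);
  if (res != cudaSuccess) {
    std::printf("cudaPointerGetAttributes failed: %s\n", cudaGetErrorString(res));
    return;
  }
  int current = -1;
  cudaGetDevice(&current);
  std::printf("pointer owned by device %d, current device is %d\n", attr.device, current);
}

int main()
{
  void* p = nullptr;
  cudaSetDevice(0);
  cudaMalloc(&p, 16);
  print_owner_device(p);
  cudaFree(p);
  return 0;
}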

Is it necessary to install libtorch in order to call PytorchNvCodec?

No, this shouldn't be necessary. It will use the binaries from the torch pip package.

JunkangLiu commented 1 year ago
I added a debug throw to check the value returned by get_device_id:

torch::Tensor makefromDevicePtrUint8(CUdeviceptr ptr, uint32_t width,
                                     uint32_t height, uint32_t pitch,
                                     uint32_t elem_size, size_t str = 0U)
{
  auto cudaid = get_device_id((void*)ptr);

  // debug: report the device id that get_device_id returned
  std::stringstream ss1;
  ss1 << cudaid;
  throw std::runtime_error(ss1.str());
  // ... rest of the function unchanged
}
(py38) super@super-Precision-3650-Tower:~/installs/VideoProcessingFramework/install/bin$ python SampleTensorRTResnet.py 0 test.mp4

[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation end: CPU 1955 MiB, GPU 1240 MiB
Traceback (most recent call last):
  File "SampleTensorRTResnet.py", line 1294, in <module>
    infer_on_video(gpu_id, input_video, trt_file)
  File "SampleTensorRTResnet.py", line 1261, in infer_on_video
    img_tensor = pnvc.makefromDevicePtrUint8(surf_plane.GpuMem(),
RuntimeError: 2

I specified gpu_id=0, but get_device_id seems to return device id 2.
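That would also explain the "invalid device ordinal" message if fewer than three GPUs are visible to the process. A quick standalone check of how many devices the CUDA runtime actually sees (just the runtime API, not part of VPF) is:

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
  int count = -1;
  cudaError_t res = cudaGetDeviceCount(&count);
  if (res != cudaSuccess) {
    std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(res));
    return 1;
  }
  // An ordinal of 2 is only valid when at least three devices are visible.
  std::printf("visible CUDA devices: %d\n", count);
  return 0;
}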

Then I tried:

static int get_device_id(const void* dptr)
{
  cudaPointerAttributes attr;
  memset(&attr, 0, sizeof(attr));

  auto res = cudaPointerGetAttributes(&attr, dptr);
  if (cudaSuccess != res) {
    std::stringstream ss;
    ss << __FUNCTION__;
    ss << ": failed to get pointer attributes. CUDA error code: ";
    ss << res;

    throw std::runtime_error(ss.str());
  }

  // changed to always return device 0 instead of the id reported by the attributes
  return 0;
}

and it works:

Image type: redbone
Image type: redbone
Image type: redbone
Image type: redbone
Image type: redbone
Can not decode frame
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 5391, GPU 2692 (MiB)

Maybe get_device_id is returning an incorrect value in some cases?
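For anyone else hitting this, a less drastic variant of that workaround (my own sketch under the same assumptions, not the project's fix; the name get_device_id_checked is made up here) would keep the attribute lookup but fall back to the current device whenever the reported ordinal is out of range:

#include <cuda_runtime.h>
#include <cstring>
#include <sstream>
#include <stdexcept>

// Sketch: return the device owning dptr, but fall back to the current device
// when the ordinal reported by cudaPointerGetAttributes is not a valid one.
static int get_device_id_checked(const void* dptr)
{
  cudaPointerAttributes attr;
  std::memset(&attr, 0, sizeof(attr));

  auto res = cudaPointerGetAttributes(&attr, dptr);
  if (cudaSuccess != res) {
    std::stringstream ss;
    ss << __FUNCTION__ << ": failed to get pointer attributes. CUDA error code: " << res;
    throw std::runtime_error(ss.str());
  }

  int count = 0;
  if (cudaSuccess == cudaGetDeviceCount(&count) && attr.device >= 0 && attr.device < count)
    return attr.device;

  // Fall back to whatever device is current in the calling thread.
  int current = 0;
  cudaGetDevice(&current);
  return current;
}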