NVIDIA / cuda-samples

Samples for CUDA Developers which demonstrates features in CUDA Toolkit

Feedback/Question #145

Open rlewkowicz opened 2 years ago

rlewkowicz commented 2 years ago

I feel like you're all so brilliant that you literally don't understand what "simple" actually means (based on your "simple" examples). My complaint, across really the entire C++ space, is that you're missing comfortable mid-level APIs. I'm going to focus on DX11 and CUDA in general.

Let's talk about high- and low-level APIs and where I feel there's a gap. Take, for example, DXGI, swap chains, and getting textures out of the GPU. Chuck Walbourn is the major contributor to DirectXTK. In it, he has some utilities for saving a DX11 texture as a WIC image. This is high level: very opinionated and domain specific. Among other things, he hand-waves what turns out to be a nontrivial thing for people not familiar with the space:

given an immediate device context and swapchain backbuffer.

It's crazy how much complexity is behind such a simple phrase. I got there eventually, but this is the key highlight of my frustration. I don't need the WIC save, and I don't want to know the color format of the DXGI screen capture (unfortunately, I had to learn it). I want someone to make the swap chain for me. I don't want to know about GPU/CPU access to frames and the nuance involved. Maybe that sounds selfish, but you're not going to rewrite an OS kernel every time you want to deploy an app, are you? It's building blocks and foundations, so the next person can reasonably consume it with just the basics. This is a glaring example of the missing mid-level APIs.

That brings me to my current issue. I have an ID3D11Texture2D and I'd like to turn it into a cv::cuda::GpuMat. I think 460 is boogered up right now when it comes to GpuMats, but I just need a simple example of ID3D11Texture2D -> CUDA. I see this: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__D3D11.html

If I can get this, I'm confident I'll figure out how to get it into the mat after the fact instead of doing a direct convert. Then questions pop up on performance. If I'm not leaving the GPU, do I need any CPU access at all? I don't need these massive examples. I need simple, consumable examples (or actual mid-level APIs) with better docs that explain the ramifications of the choices I might make, such as CPU vs. GPU access flags on that DX11 frame.

starfire-lzd commented 7 months ago

> If I can get this, I'm confident I'll figure out how to get it into the mat after the fact instead of doing a direct convert. Then questions pop up on performance. If I'm not leaving the GPU, do I need any CPU access at all? I don't need these massive examples. I need simple, consumable examples (or actual mid-level APIs) with better docs that explain the ramifications of the choices I might make, such as CPU vs. GPU access flags on that DX11 frame.

Hello, may I ask if you could share your final solution? I have run into the same problem too.
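
No final solution was posted in this thread, so the following is only a minimal sketch of the ID3D11Texture2D -> CUDA step using the runtime calls from the D3D11 interop page linked above. It is not taken from cuda-samples. The `DeviceImage` struct, the `rowBytes` helper, and the `copyTextureToDevice` function are hypothetical names made up for illustration. The sketch assumes a single-mip, non-array texture in a 4-byte-per-pixel format (for example DXGI_FORMAT_B8G8R8A8_UNORM) that lives on the same adapter as the current CUDA device.

```cpp
// Sketch only: copy a D3D11 texture into pitched CUDA device memory without
// touching the CPU. Assumes 4 bytes per pixel, one mip level, no texture array.
#include <cstddef>

// Hypothetical helper: bytes in one row of a 4-byte-per-pixel image.
constexpr std::size_t kBytesPerPixel = 4;
constexpr std::size_t rowBytes(std::size_t widthPixels) {
    return widthPixels * kBytesPerPixel;
}

#ifdef _WIN32  // Direct3D 11 only exists on Windows.
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

// Hypothetical result type: a pitched device buffer owned by the caller,
// to be released with cudaFree(data).
struct DeviceImage {
    void*       data   = nullptr;  // device pointer from cudaMallocPitch
    std::size_t pitch  = 0;        // bytes per row, including padding
    int         width  = 0;        // pixels
    int         height = 0;        // pixels
};

// Registers `tex` with CUDA, maps it, and copies its contents device-to-device
// into newly allocated pitched memory. Returns the first CUDA error it hits.
cudaError_t copyTextureToDevice(ID3D11Texture2D* tex, DeviceImage* out) {
    D3D11_TEXTURE2D_DESC desc;
    tex->GetDesc(&desc);

    cudaGraphicsResource_t res = nullptr;
    cudaError_t err = cudaGraphicsD3D11RegisterResource(
        &res, tex, cudaGraphicsRegisterFlagsNone);
    if (err != cudaSuccess) return err;

    err = cudaGraphicsMapResources(1, &res, 0);
    if (err == cudaSuccess) {
        // A mapped texture is exposed to CUDA as a cudaArray, not a raw pointer.
        cudaArray_t array = nullptr;
        err = cudaGraphicsSubResourceGetMappedArray(&array, res, 0, 0);
        if (err == cudaSuccess)
            err = cudaMallocPitch(&out->data, &out->pitch,
                                  rowBytes(desc.Width), desc.Height);
        if (err == cudaSuccess)
            err = cudaMemcpy2DFromArray(out->data, out->pitch, array, 0, 0,
                                        rowBytes(desc.Width), desc.Height,
                                        cudaMemcpyDeviceToDevice);
        cudaGraphicsUnmapResources(1, &res, 0);
    }
    cudaGraphicsUnregisterResource(res);

    if (err != cudaSuccess) {
        // Do not leak the buffer if the copy failed after allocation.
        if (out->data) cudaFree(out->data);
        *out = DeviceImage{};
        return err;
    }
    out->width  = static_cast<int>(desc.Width);
    out->height = static_cast<int>(desc.Height);
    return cudaSuccess;
}
#endif  // _WIN32
```

A few notes on the choices here. Because the copy is device-to-device, the texture should not need any CPU access: a texture created with D3D11_USAGE_DEFAULT and CPUAccessFlags set to 0 is enough, and a staging texture with D3D11_CPU_ACCESS_READ only matters if you want to Map the pixels on the host. The sketch registers and unregisters on every call to stay short. In a capture loop you would likely register once and only map, copy, and unmap per frame, since registration is the expensive part. The interop documentation also lists restrictions on which resources can be registered, so a swap-chain back buffer may first need to be copied into a separate texture with ID3D11DeviceContext::CopyResource. For OpenCV, the resulting buffer should be wrappable without another copy through the GpuMat constructor that takes user-allocated data, along the lines of cv::cuda::GpuMat(img.height, img.width, CV_8UC4, img.data, img.pitch), though I have not verified that against the OpenCV build mentioned above.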