NVIDIA / cuda-python

CUDA Python Low-level Bindings
https://nvidia.github.io/cuda-python/

cuFFT support #66

Closed spec-benno closed 2 hours ago

spec-benno commented 3 weeks ago

Could you please add cuFFT functionality to the low-level cuda-python library? Or, if it is already supported, could you please add an example of how to use it?

leofang commented 3 weeks ago

Hi @spec-benno, could you share a bit more about your use case? It would help us coordinate with internal teams, as well as offer better advice to you.

FYI, nvmath-python (see GTC introduction here) is coming soon(-ish) which will offer cuFFT coverage, but as of today you can already access full cuFFT capability through CuPy.
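For readers who want to try the CuPy route, here is a minimal sketch of a windowed real FFT. It relies on `cupy.fft` mirroring the `numpy.fft` interface, and it falls back to NumPy when no usable GPU is found so the same code runs anywhere. The function name `magnitude_spectrum`, the Hann window, and the test signal are assumptions made for illustration, not something taken from this thread.

```python
import numpy as np

try:
    import cupy as cp

    # CuPy may be installed on a machine without a usable GPU; check first.
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")
    xp = cp  # GPU path: cupy.fft dispatches to cuFFT
except Exception:
    xp = np  # CPU fallback so the sketch still runs


def magnitude_spectrum(samples, sample_rate_hz):
    """Return (frequencies, magnitudes) for a real-valued 1-D signal.

    Illustrative helper (hypothetical name): applies a Hann window and
    computes a real-input FFT on whichever backend `xp` points to.
    """
    samples = xp.asarray(samples, dtype=xp.float32)
    n = samples.shape[0]
    window = xp.hanning(n).astype(xp.float32)
    spectrum = xp.fft.rfft(samples * window)
    freqs = xp.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    return freqs, xp.abs(spectrum)


if __name__ == "__main__":
    # Synthetic 1 kHz tone sampled at 8 kHz (made-up test data).
    t = np.arange(1024) / 8000.0
    tone = np.sin(2 * np.pi * 1000.0 * t)
    freqs, mags = magnitude_spectrum(tone, 8000.0)
    peak = int(xp.argmax(mags))
    print(f"backend={xp.__name__}, peak at {float(freqs[peak]):.1f} Hz")
```

On the GPU path the arrays stay on the device, so any follow-up processing can be chained before copying results back to the host with `cupy.asnumpy`.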

spec-benno commented 3 weeks ago

Hi @leofang, thank you for your fast reply!

Yes, no problem, I can do that. We (Spectrum Instrumentation) are a manufacturer of high-end digitizer and AWG PCIe cards. We and our customers use NVIDIA GPUs with CUDA to acquire and generate high-speed data. Our current flagship digitizer runs at 10 GS/s, and we use your GPUs to process the data. Typically, we do everything in C/C++ and use CUDA with RDMA for direct buffer transfers between our cards and the NVIDIA GPU. One example is to acquire data with the digitizer and send it to the GPU, which then performs an FFT and sends the result to the host PC.

Recently, we have seen a clear shift among our customers towards Python. We currently have CUDA examples for Python using CuPy and cuFFT. However, we weren't able to set up RDMA with the CuPy package and found that this is easily done with your low-level cuda-python package. We have this running and some simple kernels are working, but I haven't found a way to get cuFFT to work with the low-level package. It's not clear to me what the easiest way to get cuFFT working is. Could you help me with this?

See the current example without RDMA support and using CuPy: https://github.com/SpectrumInstrumentation/spcm/blob/master/src/examples/1_acquisition/5_acq_single-cudafft.py

spec-benno commented 3 weeks ago

P.S. The video is great! The nvmath-python package with support for cuFFT seems to be exactly what I'm looking for. Just wondering: will nvmath be built on top of cuda-python? Will it be possible to use kernels and buffers created with cuda-python in nvmath-python?

leofang commented 3 weeks ago

Hi @spec-benno, thanks a lot for sharing your use case. It allowed me to route your question to the right people. Someone from the Holoscan SDK team will respond to your RDMA + FFT question. They are the experts :) (cc: @awthomp)

nvmath-python should work with cuda-python just fine, if this is your concern.

awthomp commented 2 weeks ago

Hi @spec-benno -- great to meet you, and thanks for the ping, @leofang.

The problem you're trying to solve seems like an excellent fit for Holoscan, and really highlights why we built this sensor processing platform.

As a bit of background, I'm the creator of cuSignal, which is now fully part of CuPy as of v13. The goal of cuSignal was to provide GPU speeds for common signal processing functions (spectrum estimation, convolutions, correlations, filtering, etc.), all from Python. One of the blockers when going to production, however, was that developers traditionally struggled to connect their compute workloads to a sensor, particularly if one needs RDMA from NIC to GPU via networking or over PCIe with GPUDirect.

To this end, for networking, we have developed the Basic Network Operator (< 10 Gbps, Linux sockets) and the Advanced Network Operator (line rate, DPDK, needs NIC + GPU) within Holoscan, and Python bindings for the latter are in the works. When combining GPUDirect with an FFT, your application would essentially consist of one of these I/O operators feeding into an operator that runs an FFT on incoming buffers.

I'd be interested to see how you set up RDMA in cuda-python, however. We don't currently have a native PCIe-based GPUDirect operator, as these typically require driver- and kernel-level modifications, but this would be something interesting to investigate.

Anyway, I'd love to touch base and learn more about what you're doing and how we can help with NVIDIA tools and platforms. The goal of Holoscan is to unite real-time sensors with GPU-based computing for Python and C++ developers alike.

spec-benno commented 2 weeks ago

Hi @awthomp, thank you for reaching out.

We have been in contact with NVIDIA about your Holoscan platform, and our cards are working with the Jetson and Clara devices. The kernel drivers of our cards support RDMA and CUDA, and we have working examples for C/C++ with RDMA support in the form of an add-on option (SCAPP): https://spectrum-instrumentation.com/products/drivers_examples/scapp_cuda_interface.php. As a next step, we would like to get RDMA to work with Python.

Here is information about the Clara Holoscan devices and Spectrum Instrumentation cards on the NVIDIA website: https://developer.nvidia.com/blog/new-sensor-partners-expand-surgical-ultrasound-and-data-acquisition-capabilities-in-the-clara-holoscan-platform/?ncid=partn-372615-vt12#cid=gtcs22_partn_en-us.

If you'd like, we could set up a meeting to discuss the details of how we use cuda-python.

@spec-ubbo

awthomp commented 2 weeks ago

Hi @spec-benno -- Great to hear you all have been in contact. Let's touch base offline, and we can scope out how you want to use cuda-python. I'll include @leofang or someone on his team in the meeting as well. Please send an e-mail to adamt {at} nvidia {dot com}.

As a heads up, many folks - myself included - are out of the office until Monday, June 24th. Cheers!

leofang commented 2 hours ago

Thanks for the exchange, and glad to know someone from NVIDIA is already in touch. Happy to meet and discuss as needed. Let us move the follow-up discussion offline.

Just to close the loop:

"FYI, nvmath-python (see GTC introduction here) is coming soon(-ish) which will offer cuFFT coverage, ..."

The project is now live on GitHub: https://github.com/nvidia/nvmath-python and is pip-installable: https://pypi.org/project/nvmath-python/. The plan is to make a public announcement next week at the SciPy conference.