fynv / ThrustRTC

CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.

Possibility of doing FFT and representing Complex Numbers with ThrustRTC #16

Closed: g1nsj0h4n closed this issue 3 years ago.

g1nsj0h4n commented 3 years ago

Very useful library!

I am already doing some computation with ThrustRTC. Can we access CUDA FFT functionality from ThrustRTC itself, with minimal extra code?

An example C++ method using Thrust is shown here.

A related question: what is the best practice for representing complex numbers in a DV? I currently split them into two float vectors. I do see DVTuple, but can we make a DVTupleVector?
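For context on the two options, here is a host-side NumPy sketch (NumPy is used only for illustration; it is not part of ThrustRTC). It shows that a `complex64` array is just interleaved `(real, imag)` pairs of `float32`, so the "two float vectors" layout and a single complex vector hold the same data in different arrangements. CuPy's `complex64` device arrays and CUDA's `cuFloatComplex` (a `float2`) use the same interleaved layout.

```python
import numpy as np

# Split representation: two separate float32 arrays.
re = np.array([1.0, 3.0, 5.0], dtype=np.float32)
im = np.array([2.0, 4.0, 6.0], dtype=np.float32)

# Interleaved representation: one complex64 array (8 bytes per element).
z = np.empty(re.shape, dtype=np.complex64)
z.real = re
z.imag = im

# Reinterpret the same memory as float32 pairs: column 0 = real, column 1 = imag.
pairs = z.view(np.float32).reshape(-1, 2)

# Going back to the split layout needs a copy, because z.real is a strided view.
re_back = np.ascontiguousarray(z.real)
```

The interleaved form keeps each complex value in one element, which is why a single complex vector is usually simpler to pass to FFT libraries than two parallel float vectors.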

g1nsj0h4n commented 3 years ago

The K-means demo does show how to use DVZipped(), so that part is clear.

fynv commented 3 years ago

It looks like cuFFT comes in binary form (.dll/.so), so a separate wrapping layer (for Python) is needed. If such a wrapper already exists, maybe I can take a look and see how to make it compatible with ThrustRTC at the data storage level.

g1nsj0h4n commented 3 years ago

Pyculib and PyCUDA do have bindings for cuFFT.

Since the input and output vectors are the same size, I guess it can be applied as a Transform function. It would be nice if you could show us, or point us to, an abstracted template for adding such functionality, so that I can try to contribute some more linear algebra related functions.
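One caveat worth illustrating: a Transform maps each element independently, while each output of a discrete Fourier transform depends on every input element, even though the input and output lengths match. The sketch below is a naive O(n²) DFT in NumPy (an illustration only, not how cuFFT works internally) that makes this dependency visible.

```python
import numpy as np

def naive_dft(x):
    """O(n^2) DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n), same sign convention as np.fft.fft."""
    x = np.asarray(x, dtype=np.complex128)
    n = x.shape[0]
    j = np.arange(n)
    k = j.reshape(-1, 1)
    # Row k holds the twiddle factors for output k; every output reads every input.
    w = np.exp(-2j * np.pi * k * j / n)
    return w @ x

x = np.array([1.0, 2.0, 0.0, -1.0])
X = naive_dft(x)

x2 = x.copy()
x2[0] += 1.0                 # perturb a single input element...
delta = naive_dft(x2) - X    # ...and every output changes (by exp(0) = 1 here)
```

Because of this all-to-all dependency, an FFT is normally exposed as a whole-array operation rather than as a per-element functor.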

fynv commented 3 years ago

Pyculib uses a Numba device array as its input/output buffer. ThrustRTC, on the other hand, can also map a Numba device array to a DVVectorLike object. Therefore, I see no problem using Pyculib and ThrustRTC side by side, with Numba as their common underlying storage. The only issue, however, is that I cannot get Pyculib to work at all:

```
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    from pyculib.fft import fft
  File "D:\Miniconda3\envs\py36\lib\site-packages\pyculib\__init__.py", line 49, in <module>
    from . import blas, sparse, fft, rand, sorting
  File "D:\Miniconda3\envs\py36\lib\site-packages\pyculib\sorting\__init__.py", line 1, in <module>
    from .radixsort import RadixSort
  File "D:\Miniconda3\envs\py36\lib\site-packages\pyculib\sorting\radixsort.py", line 38, in <module>
    lib = load_lib('radixsort')
  File "D:\Miniconda3\envs\py36\lib\site-packages\pyculib\sorting\common.py", line 24, in load_lib
    libpath = os.path.join(findlib.get_lib_dir(), fullname)
AttributeError: module 'numba.findlib' has no attribute 'get_lib_dir'
```

fynv commented 3 years ago

Another option is for me to write my own cuFFT wrapper (as a separate library) that uses ThrustRTC directly for data storage. That would likely work more smoothly, but I don't have time to do that right now.

fynv commented 3 years ago

> It will be nice if you can show/point us an abstracted template for adding such functionalities, so that I can try to contribute some more Linear Algebra related functions.

That can only be done in C++. Basically, you can create another shared library that links to PyThrustRTC.so or PyThrustRTC.dll, so that you can use C++ API classes such as DVVector (defined in DVVector.h). The new library should also export some new functions that can be called from Python through cffi.
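To show the Python side of that pattern, here is a minimal cffi sketch in ABI mode, assuming a POSIX system with cffi installed. The standard C library's `abs` stands in for a function exported with `extern "C"` from the new shared library; the extension library and its path are hypothetical, and whether ThrustRTC itself uses cffi's ABI or API mode is not stated in this thread.

```python
from cffi import FFI

ffi = FFI()
# Declare the C signature of each exported function. A real extension library
# would declare its own extern "C" functions here instead of abs().
ffi.cdef("int abs(int x);")

# dlopen(None) loads the standard C library (POSIX only). A real extension would
# use something like ffi.dlopen("./libmy_extension.so"), a hypothetical path.
C = ffi.dlopen(None)

print(C.abs(-42))
```

Keeping the exported functions `extern "C"` avoids C++ name mangling, which is what lets cffi find them by their plain names.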

For bridging FFT and other linear algebra libraries, I don't think that would be necessary. You can use Numba for data storage, just as Pyculib does, so you don't need to worry about how ThrustRTC works internally.

fynv commented 3 years ago

I gave up on Pyculib and tried CuPy instead (it also has a cuFFT binding). It seems that I can get everything to work now. The only change I found I needed to make on the ThrustRTC side was better support for complex numbers, so you no longer have to use DVTuple and DVZipped. Just install the latest ThrustRTC (0.3.13) with pip and run "conda install cupy"; then you will be able to run the following code (https://github.com/fynv/ThrustRTC/blob/master/python/test/test_fft.py):

```python
import numpy as np
import cupy as cp
import ThrustRTC as trtc

# Allocate a CuPy complex64 device array and wrap it for ThrustRTC;
# darr shares cparr's device memory.
cparr = cp.empty(4, dtype = np.complex64)
darr = trtc.DVCupyVector(cparr)

# Fill every element with 1+2j using ThrustRTC.
c = complex(1.0, 2.0)
trtc.Fill(darr, trtc.DVComplex64(c))
print("input: ", cp.asnumpy(cparr))

# cuFFT via CuPy; the result is a new CuPy array.
cparr = cp.fft.fft(cparr)
print("output: ", cp.asnumpy(cparr))
```

The FFT part itself has nothing to do with ThrustRTC, but you can create a DVCupyVector whenever you need one and use it in ThrustRTC.
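To check what the CuPy/ThrustRTC snippet should print without a GPU, the same computation can be reproduced on the host with NumPy (a CPU stand-in, not part of ThrustRTC). For a constant input c of length n, the DFT is n·c at index 0 and zero elsewhere, so filling 4 elements with 1+2j should give approximately [4+8j, 0, 0, 0].

```python
import numpy as np

# Same data as the snippet above, but on the host with NumPy.
arr = np.full(4, complex(1.0, 2.0), dtype=np.complex64)
out = np.fft.fft(arr)
print("input: ", arr)
print("output: ", out)
```

Depending on the NumPy version, `out` may be `complex64` or `complex128`; the values agree up to rounding.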

g1nsj0h4n commented 3 years ago

This works like a breeze. Very happy to see complex number support added to ThrustRTC!

How can I access complex arithmetic functions, such as multiply, within functors?

Apologies about Pyculib. I guess the library was abandoned a while ago and superseded by CuPy, so CuPy is a very appropriate choice! Thanks again.

fynv commented 3 years ago

> How can I access the Complex Arithmetic functions like multiply, etc. within Functors?

I've integrated the functions in cuComplex.h. You should be able to use them in your device code.
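For reference, cuComplex.h provides `cuCaddf`, `cuCsubf`, `cuCmulf`, `cuCdivf`, `cuConjf` and `cuCabsf` for `cuFloatComplex`. This thread does not show exactly how complex values are named inside a ThrustRTC functor body, so rather than guess at that API, here is a host-side NumPy mirror of the formula `cuCmulf` computes. It is also what you would write by hand if you keep the split two-float-vector representation; the helper name `cmul_split` is hypothetical.

```python
import numpy as np

def cmul_split(ar, ai, br, bi):
    """Element-wise complex multiply on split (real, imag) arrays.

    Same formula as cuCmulf in cuComplex.h:
    (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
    """
    return ar * br - ai * bi, ar * bi + ai * br

# a = [1+2j, 0+1j, 2-1j], b = [3+4j, 0+1j, 1+1j], stored as split float32 arrays.
ar = np.array([1.0, 0.0, 2.0], dtype=np.float32)
ai = np.array([2.0, 1.0, -1.0], dtype=np.float32)
br = np.array([3.0, 0.0, 1.0], dtype=np.float32)
bi = np.array([4.0, 1.0, 1.0], dtype=np.float32)

pr, pim = cmul_split(ar, ai, br, bi)
```

With the interleaved complex type the same product is a single call per element, which is one practical advantage of the new complex support over DVTuple/DVZipped.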