Closed — g1nsj0h4n closed this issue 3 years ago
The Kmeans demo does show the DVZipped() usage. So that part is clear.
It looks like cuFFT ships in binary form (.dll/.so), so a separate wrapping layer (for Python) is needed. If such a wrapper already exists, maybe I can take a look to see how to make it compatible with ThrustRTC at the data-storage level.
Pyculib and Pycuda do have bindings for cuFFT.
Since the input and output vectors are the same size, I guess it can be applied as a Transform function. It would be nice if you could show/point us to an abstracted template for adding such functionality, so that I can try to contribute some more Linear Algebra related functions.
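As a quick CPU-side check of that size observation (plain NumPy here rather than cuFFT, but the shape semantics are the same): the FFT is a global operation, not elementwise, yet it does map an N-point vector to an N-point vector, so it fits a "same size in, same size out" pipeline.

```python
import numpy as np

# A length-8 complex input: a unit impulse.
x = np.zeros(8, dtype=np.complex64)
x[0] = 1.0 + 0.0j

# The FFT maps an N-point vector to an N-point vector.
X = np.fft.fft(x)
print(X.shape)                      # (8,)
print(np.allclose(X, np.ones(8)))  # FFT of an impulse is all ones -> True
```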
Pyculib uses Numba device arrays as its input/output buffers. On the other hand, ThrustRTC can also map a Numba device array to a DVVectorLike object, so I see no problem using Pyculib and ThrustRTC side by side, with Numba as their common underlying storage. The only issue I have, however, is that I cannot get Pyculib to work at all:
```
Traceback (most recent call last):
  File "test.py", line 3, in
```
Another way is that I write my own cufft wrapper (as another library) using ThrustRTC directly for data storage. That would likely work more smoothly. But I haven't got time to do that for now.
> It will be nice if you can show/point us an abstracted template for adding such functionalities, so that I can try to contribute some more Linear Algebra related functions.
That can only be done in C++. Basically, you can create another shared library linking to PyThrustRTC.so or PyThrustRTC.dll, so that you can use the C++ API classes like "DVVector" defined in DVVector.h. The new library should also export some new functions which can be called from Python through cffi.
For bridging FFT and other Linear Algebra libraries, I don't think that would be necessary. You can choose Numba for data storage, just like Pyculib does, so you don't need to mind how ThrustRTC works.
Gave up on Pyculib and tried CuPy (which also has a cuFFT binding). It seems that I can get everything to work now. The only change I found I needed to make on the ThrustRTC side is better support for complex numbers, so you don't have to use DVTuple and DVZipped anymore. Just install the latest ThrustRTC (0.3.13) with pip and "conda install cupy", then you will be able to run the following code (https://github.com/fynv/ThrustRTC/blob/master/python/test/test_fft.py):

```python
import numpy as np
import cupy as cp
import ThrustRTC as trtc

cparr = cp.empty(4, dtype = np.complex64)
darr = trtc.DVCupyVector(cparr)
c = complex(1.0, 2.0)
trtc.Fill(darr, trtc.DVComplex64(c))
print("input: ", cp.asnumpy(cparr))
cparr = cp.fft.fft(cparr)
print("output: ", cp.asnumpy(cparr))
```

While the FFT part has nothing to do with ThrustRTC, you can create a DVCupyVector any time you need one and use it in ThrustRTC.
This works like a breeze... Very happy to see Complex support added to ThrustRTC!
How can I access the complex arithmetic functions like multiply, etc. within Functors?
Apologies about Pyculib. I guess the library has been abandoned for a while and superseded by CuPy, hence a very appropriate choice! Thanks again...
> How can I access the Complex Arithmetic functions like multiply, etc. within Functors ?

I've integrated the functions from cuComplex.h. You should be able to use them in your device code.
Very useful library!
I am already doing some computation with ThrustRTC and would like to know if we can access CUDA FFT functionality from ThrustRTC itself, with minimal extra code?
An example C++ method using Thrust is shown here.
Also a related question: what is the best practice for representing complex numbers in a DV? I currently separate them into two Float vectors and use those. I do see DVTuple, but can we do a DVTupleVector?
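The two-float-vector representation described above can be kept lossless; here is a CPU sketch of the split and recombine round trip in NumPy (the same split applies to whatever device copies hold the two halves):

```python
import numpy as np

z = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)

# Split into two float vectors (the representation described above)...
re = z.real.copy().astype(np.float32)
im = z.imag.copy().astype(np.float32)

# ...and recombine losslessly.
z2 = (re + 1j * im).astype(np.complex64)
print(np.array_equal(z, z2))  # True
```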