Closed Nodd closed 11 years ago
FFT for a large enough problem size (256x256 in your case) needs several kernel calls, and has to allocate some temporary arrays (that's why 1D works — it needs only one kernel, no temporary arrays required). Reikna tries to pack these arrays into the same location of physical memory, if their dependencies allow it. On PyOpenCL it means reassigning the array's pointer to the physical buffer dynamically — that's what array.data = buf
was for. In PyOpenCL 2013.1 this attribute became read-only, so I fixed this part in the current development version (0.3.0).
Now for the hard part. You have two choices:
1) Fix this in your local version. In reikna/cluda/tempalloc.py
change
array.data = buf
to
array.base_data = buf
or 2) Switch to the development version. May not be the best idea, as its API is not completely stable, and you'll have to rewrite some of your code.
As a side note, I have an (other) exception if data_in is not complex. Is it a bug or a feature ?
Rather, the lack of feature. FFT
itself only does complex-to-complex transformation; if you need only real numbers on either side, you have to attach transformations:
import numpy as np
from reikna import cluda, Transformation
from reikna.cluda import dtypes
from reikna.fft import FFT
from reikna.cluda.tempalloc import TrivialManager
tr = Transformation(
inputs=['in_re'], outputs=['out_c'],
derive_o_from_is=lambda in_re: dtypes.complex_for(in_re),
snippet="${out_c.store}(COMPLEX_CTR(${out_c.ctype})(${in_re.load}, 0));")
def main():
api = cluda.ocl_api()
thr = api.Thread.create(temp_alloc=dict(cls=TrivialManager))
N = 256
M = 10000
data_in = np.random.rand(N)
data_in = data_in.astype(np.float32)
cl_data_in = thr.to_device(data_in)
cl_data_out = thr.array(data_in.shape, np.complex64)
fft = FFT(thr)
fft.connect(tr, 'input', ['input_re'])
fft.prepare_for(cl_data_out, cl_data_in, -1, axes=(0,))
if __name__ == "__main__":
main()
Closing this one too, 0.3.0 is released.
While investigating for #8, I tried to compute the FFT on a 2D array, but I get:
My code is:
It passes if I use: data_in = np.random.rand(N) + 1j*np.random.rand(N)
As a side note, I have an (other) exception if data_in is not complex. Is it a bug or a feature ?