ddemidov / vexcl

VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
http://vexcl.readthedocs.org
MIT License
702 stars 82 forks source link

Batched FFTs #36

Closed d-meiser closed 11 years ago

d-meiser commented 11 years ago

Hi,

In many applications one has to calculate FFTs in batches. A simple example would be if there are many data sets that have to be processed or if only certain dimensions in a multi-dimensional array are to be transformed. Especially for small FFT sizes it can be beneficial to execute the batched FFTs in one call.

As far as I can tell this capability is not present in vexcl/fft at the moment or at least it is not exposed in the public API. Any recommendations as to how to add this capability? I could contribute.

Cheers, Dominic

ddemidov commented 11 years ago

Hello Dominic,

The FFT functionality was contributed by @neapel, so I'll address this question to him.

The interface supports multidimensional transforms, so I think adding batch transforms should be relatively easy. But I don't have much experience with FFT, so I am not sure if this is true.

Cheers, Denis.

ddemidov commented 11 years ago

It looks like multidimensional FFT does row-wise 1D FFTs with consequent transpose. I guess batch FFT should do the same thing with transposition omitted. Should be easy to add the functionality.

neapel commented 11 years ago

Currently you can only specify one transformation direction to be used for all dimensions, I'll add a way to use a list of directions instead, so that FFT({128, 32}, {forward, none}) calculates 32 transformations of length 128, for each row, while FFT({128, 32}, forward) is the same as FFT({128, 32}, {forward, forward}) i.e. a 2D forward FFT.

d-meiser commented 11 years ago

That's fantastic. Let me know if I can help. Somewhat related: Can FFTs currently be applied to vex::multivector objects? That might be another mechanism for achieving batching.

neapel commented 11 years ago

I've added the none kind in 34f565b0edd32514392fced3ce053aa7a69b9b28, how data in the batch is ordered is determined by the position of the none, i.e. in FFT({b,n,n}, {none, forward, forward}) uses n * n * batch_index + elem_index, with FFT({n,n,b},{forward,forward,none}) data is interleaved, i.e. batch_index + b * elem_index. With interleaved data, processing one batch_index per thread should be optimal, but it's currently implemented the other way around, and using interleaved data will transpose the array twice. I'll fix this next month.

The FFT doesn't support multivector at the moment, I was planning to support it as a way to split real/imag components instead of interleaving them using cl_double2, but not for batching, I don't think that's very useful since the number of components is determined at compile-time.

I plan to implement a boost::multi_array-like type though, which should be useful for batching because it's easy to create views of sub-arrays, i.e. with ndvector<3> batch({b, n, n}); fft(batch); using batch[0] would return a ndvector<2> containing the result of the first 2D-FFT, batch[1] contains the next etc, no need to manage the offset manually.

d-meiser commented 11 years ago

Looks great. I'll give it a try. Makes sense about the vexcl::multi_vector.

ddemidov commented 11 years ago

Hi Pascal,

That does look great! I've cherry-picked 0f3074638b and 34f565b0ed (I don't want to pick summed area tables algorithm yet, as its currently nvidia-specific).

ddemidov commented 11 years ago

re: multi_array

I've tried to come up with an interface for the multi_array type in #31. What do you think of it? I don't currently have time to actually implement this, but contributions are welcome :).

ddemidov commented 11 years ago

@neapel, is vex::slicer functionality enough for your fft needs?

neapel commented 11 years ago

Looks good, I'll try to look at it (and multivectors) for the FFT next week.