bkloppenborg / liboi

OpenCL Interferometry Library

https://github.com/bkloppenborg/liboi/wiki

GNU Lesser General Public License v3.0

dft_2d.cl speedup #22

Closed bkloppenborg closed 11 years ago

bkloppenborg commented 12 years ago

Right now the local execution size is hard-coded to 128 work-items. On newer GPUs, this limit can be increased. The function CRoutine_DFT::FT should determine work-group sizes automatically by querying the OpenCL context for its capabilities. The kernel itself will need to be modified to ensure it doesn't read from or write to invalid memory locations.
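A minimal sketch of the host-side sizing arithmetic this would need, not the actual liboi code. It assumes the device limit has already been obtained, for example from clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE. The helper names `pick_local_size` and `round_up_global`, and the power-of-two policy, are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical policy: use the largest power of two that does not exceed
// the limit reported for the kernel on this device. Returns 0 if limit is 0.
std::size_t pick_local_size(std::size_t limit) {
    if (limit == 0) return 0;
    std::size_t local = 1;
    while (local * 2 <= limit) local *= 2;
    return local;
}

// Round the problem size n up to the next multiple of the local size, so
// the global size divides evenly into work-groups. The extra work-items
// (global - n) fall outside the data and must be guarded in the kernel.
std::size_t round_up_global(std::size_t n, std::size_t local) {
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}
```

With an assumed limit of 256 and 1000 data points, this gives a local size of 256 and a global size of 1024, leaving 24 padded work-items. The kernel then needs a bounds check against the real data count (compare get_global_id(0) with a count passed as a kernel argument). If the kernel uses barriers on local memory, guarding the individual loads and stores is likely safer than returning early, since every work-item in a group must reach each barrier.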

The DFT kernel occupies 77% of the GPU's time, so this should be regarded as a high-priority item.

bkloppenborg commented 11 years ago

Ignoring the algorithmic issues with this kernel (we should be using a non-uniform fast Fourier transform), the principal way to speed up the kernel is by increasing concurrency. In dae3eb82c34e37ea9f196d3d6f92496cb6e12892 we traded off using a local register for shared memory and increased the occupancy from 75% to 100%. This led to a 33% improvement in performance on my ATI HD 7630m card.
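As a rough consistency check on those numbers, assuming throughput scales linearly with occupancy (a simplification that often does not hold), the occupancy ratio alone predicts a gain of about the reported size:

```latex
\frac{\text{occupancy}_{\text{after}}}{\text{occupancy}_{\text{before}}}
  = \frac{100\%}{75\%} = \frac{4}{3} \approx 1.33
```

That is, roughly a 33% increase in throughput under this assumption.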

bkloppenborg commented 11 years ago

100% occupancy achieved for ATI card in 1d59bfdc530f89f8e8cb3174f09f911c3e84ded4. This is as good as it gets.