flatironinstitute / cufinufft

Nonuniform fast Fourier transforms of types 1 and 2, in 1D, 2D, and 3D, on the GPU

Update ways to evaluate kernel value when using kerevalmeth=0 (i.e. exp(sqrt())) in interpolation kernels #130

MelodyShih closed 2 years ago

MelodyShih commented 2 years ago

When kerevalmeth = 0, the interpolation kernels were evaluating the kernel w^d times, rather than the w*d times that the spreading kernels use. This pull request fixes that. P.S. We changed the default value of kerevalmeth at some point, and these changes should have been made then; sorry about that.
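To illustrate the difference in evaluation counts, here is a minimal CPU sketch in Python (not the cufinufft CUDA code). It assumes w is the kernel width in grid points, d is the dimension, and the "exp(sqrt())" kernel has the form exp(beta*sqrt(1 - (2x/w)^2)); the function names, the value of beta, and the test grid are all hypothetical. The naive variant calls the 1D kernel inside the loop over all w^d neighbours (d calls per neighbour here), while the separable variant precomputes w values per dimension, for w*d calls in total.

```python
import itertools
import math

def es_kernel(x, w, beta):
    """Assumed 'exp(sqrt())' kernel: exp(beta*sqrt(1 - (2x/w)^2)) for |x| <= w/2, else 0."""
    z2 = 1.0 - (2.0 * x / w) ** 2
    return math.exp(beta * math.sqrt(z2)) if z2 >= 0.0 else 0.0

def interp_naive(grid, pt, w, beta):
    """Interpolate at pt, calling the 1D kernel inside the loop over all w^d neighbours."""
    d = len(pt)
    starts = [math.ceil(p - w / 2.0) for p in pt]  # first grid index in the kernel support
    total, n_evals = 0.0, 0
    for offs in itertools.product(range(w), repeat=d):
        idx = tuple(s + o for s, o in zip(starts, offs))
        weight = 1.0
        for dim in range(d):
            weight *= es_kernel(idx[dim] - pt[dim], w, beta)
            n_evals += 1
        total += weight * grid(idx)
    return total, n_evals

def interp_separable(grid, pt, w, beta):
    """Interpolate at pt, precomputing w kernel values per dimension (w*d calls in total)."""
    d = len(pt)
    starts = [math.ceil(p - w / 2.0) for p in pt]
    ker = [[es_kernel(s + o - p, w, beta) for o in range(w)]
           for s, p in zip(starts, pt)]
    n_evals = sum(len(row) for row in ker)
    total = 0.0
    for offs in itertools.product(range(w), repeat=d):
        idx = tuple(s + o for s, o in zip(starts, offs))
        weight = 1.0
        for dim, o in enumerate(offs):
            weight *= ker[dim][o]  # table lookup instead of exp(sqrt())
        total += weight * grid(idx)
    return total, n_evals

def grid(idx):
    """Hypothetical smooth grid data, defined for any integer index."""
    return math.sin(0.3 * idx[0]) + math.cos(0.2 * idx[1]) + 0.1 * idx[2]

w, beta = 6, 2.3 * 6           # illustrative width and shape parameter (assumed values)
pt = (10.3, 20.7, 5.5)         # target point in grid units

val_naive, n_naive = interp_naive(grid, pt, w, beta)
val_sep, n_sep = interp_separable(grid, pt, w, beta)
print(n_naive, n_sep, math.isclose(val_naive, val_sep))
```

Both variants still sum over w^d grid values; only the number of exp(sqrt()) calls changes, so the gap widens with dimension, which is consistent with 3D benefiting more than 2D.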

Here are some performance results after the change, with the timings before the change in parentheses:

| GPU | 2D | 3D |
|---|---|---|
| V100 | 1.9 ms (1.8 ms) | 4.0 ms (5.5 ms) |
| RTX8000 | 3.8 ms (6.1 ms) | 24.1 ms (81.0 ms) |

Commands for the tests:

    bin/interp2d_test 1 0 1024 1024 1048576 1e-3
    bin/interp3d_test 1 0 128 128 128 2097152 1e-3

The change has a smaller impact on the V100, where 2D is essentially unchanged and 3D is about 1.4x faster. On the RTX8000, we see speedups of about 1.6x and 3.4x for the 2D and 3D test cases, respectively.
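For reference, the speedup factors can be recomputed from the timings reported above as before/after. This is just arithmetic on the posted numbers; the variable names are illustrative.

```python
# (before_ms, after_ms) taken from the timings posted in this thread
timings_ms = {
    ("V100", "2D"): (1.8, 1.9),
    ("V100", "3D"): (5.5, 4.0),
    ("RTX8000", "2D"): (6.1, 3.8),
    ("RTX8000", "3D"): (81.0, 24.1),
}

# Speedup > 1 means the new code is faster
speedups = {key: before / after for key, (before, after) in timings_ms.items()}

for (gpu, dim), s in speedups.items():
    print(f"{gpu} {dim}: {s:.2f}x")
```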

ahbarnett commented 2 years ago

Here are timings on an A100:

before: 2d: 1.4ms 3d: 4.8ms

after: 2d: 1.4ms 3d: 2.5ms

So, 2d no effect, 3d twice as fast. Pretty important. Cheers, Alex

ahbarnett commented 2 years ago

I merged, so you could do the above on another PR or just push to master. Also it would be good to get into the habit of listing improvements in the CHANGELOG...

MelodyShih commented 2 years ago

Sounds like a good idea to merge the kernel evaluation calls into one. I will experiment with it. And yes, I will remind myself to update the CHANGELOG in the future. Thank you for updating it with the recent changes.