Closed MelodyShih closed 2 years ago
Here are timings on an A100:
before: 2d: 1.4ms 3d: 4.8ms
after: 2d: 1.4ms 3d: 2.5ms
So, 2d no effect, 3d twice as fast. Pretty important. Cheers, Alex
I merged, so you could do the above on another PR or just push to master. Also it would be good to get into the habit of listing improvements in the CHANGELOG...
Sounds like a good idea to merge the kernel evaluation calls into one. I will experiment with it. And yes, I will remind myself to update the CHANGELOG in the future. Thank you for updating it with the recent changes.
When kernevalmeth = 0, the interpolation kernels were evaluating the kernel w^d times rather than w*d times, as the spreading kernels do. This pull request fixes that. P.S. We changed the default value of kernevalmeth at some point, and these changes should have been made then. Sorry about that.

Commands for the tests:
bin/interp2d_test 1 0 1024 1024 1048576 1e-3
bin/interp3d_test 1 0 128 128 128 2097152 1e-3
This seems to have a smaller impact on the V100, but on the RTX 8000 we see speedups of about 1.6x and 3.4x for the 2D and 3D test cases, respectively.
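
For readers unfamiliar with the w^d versus w*d distinction in the description, here is a minimal host-side C++ sketch of the idea, not the actual CUDA kernels touched by this PR. The names `CountingKernel`, `interp3d_naive`, and `interp3d_separable`, the kernel formula, and the parameter values are all hypothetical and chosen for illustration only. Because the sketch counts individual 1D kernel evaluations, the naive 3D loop costs 3*w^3 calls, while the separable version costs 3*w calls.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1D spreading kernel that counts its own calls.
// The formula and parameters are illustrative, not taken from the library.
struct CountingKernel {
    double beta;       // shape parameter (assumed value supplied by caller)
    double halfwidth;  // half of the kernel width w, in grid units
    mutable std::size_t calls;

    CountingKernel(double b, double hw) : beta(b), halfwidth(hw), calls(0) {
        assert(hw > 0.0);
    }

    double operator()(double x) const {
        ++calls;
        const double t = x / halfwidth;
        if (std::abs(t) >= 1.0) return 0.0;  // outside the kernel support
        return std::exp(beta * (std::sqrt(1.0 - t * t) - 1.0));
    }
};

// Naive: re-evaluate all three 1D factors at each of the w^3 patch nodes.
double interp3d_naive(const CountingKernel& ker, const std::vector<double>& f,
                      int w, double px, double py, double pz) {
    double sum = 0.0;
    for (int k = 0; k < w; ++k)
        for (int j = 0; j < w; ++j)
            for (int i = 0; i < w; ++i)
                sum += ker(i - px) * ker(j - py) * ker(k - pz)
                     * f[(static_cast<std::size_t>(k) * w + j) * w + i];
    return sum;
}

// Separable: evaluate w values per dimension once, then reuse them.
double interp3d_separable(const CountingKernel& ker, const std::vector<double>& f,
                          int w, double px, double py, double pz) {
    std::vector<double> kx(w), ky(w), kz(w);
    for (int i = 0; i < w; ++i) {
        kx[i] = ker(i - px);
        ky[i] = ker(i - py);
        kz[i] = ker(i - pz);
    }
    double sum = 0.0;
    for (int k = 0; k < w; ++k)
        for (int j = 0; j < w; ++j)
            for (int i = 0; i < w; ++i)
                sum += kx[i] * ky[j] * kz[k]
                     * f[(static_cast<std::size_t>(k) * w + j) * w + i];
    return sum;
}
```

Both functions take a w x w x w patch of grid values `f` and a target point given in patch coordinates, and they return the same interpolated value. Only the number of kernel evaluations differs, which is consistent with the 3D case benefiting far more than the 2D case in the timings above.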