Closed MelodyShih closed 2 years ago
Hi @janden, thank you for reviewing the code and for the helpful suggestions. I incorporated them in the latest commit. I removed the full CPU version of the code; I agree that keeping a single version (the CPU/GPU hybrid) is cleaner. Also, in the cases where the CPU/GPU hybrid version is slower (small nf), the fseries computation is not the bottleneck of the NUFFT.
If other places require changes, please let me know. Thanks.
Great! Sorry I dropped the ball on this. Will merge now.
Thanks for the review!
The pull request adds a GPU implementation of the second half of the function `onedim_fseries_kernel()`, along with its related test code and scripts (see `fseries_kernel_test.cu` and `fseriesperf.sh`). The timing results (tol=1e-6) on a V100 GPU and an Intel Xeon Platinum 8268 CPU show a speedup ranging from 0.8x to 27.3x.
Based on these timings, I added a heuristic in `src/cufinufft.cu` to switch between the CPU version and the GPU version based on `nf1`, `nf2` and `nf3`.

P.S. The pull request also includes minor updates to the print statements of the interpolation kernels.
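
For readers who want a picture of what a size-based CPU/GPU switch can look like, here is a minimal sketch. It is not the code from `src/cufinufft.cu`: the function name `choose_fseries_backend`, the enum `FseriesBackend`, the cutoff `kGpuFseriesMinNf`, and the rule of comparing the largest active grid size against one threshold are all assumptions made for illustration. The real cutoff would come from the timing results mentioned above.

```cpp
#include <cstdint>

// Hypothetical cutoff, NOT taken from the pull request: below this fine-grid
// size the GPU path is assumed to be no faster than the CPU path.
constexpr std::int64_t kGpuFseriesMinNf = 4096;

enum class FseriesBackend { Cpu, Gpu };

// Pick where to evaluate the kernel Fourier series.
// dim is the transform dimension (1, 2 or 3); only the first `dim` sizes
// among nf1, nf2, nf3 are considered. Assumed rule: use the GPU as soon as
// the largest active size reaches the cutoff.
inline FseriesBackend choose_fseries_backend(int dim,
                                             std::int64_t nf1,
                                             std::int64_t nf2,
                                             std::int64_t nf3) {
    std::int64_t largest = nf1;
    if (dim > 1 && nf2 > largest) largest = nf2;
    if (dim > 2 && nf3 > largest) largest = nf3;
    return largest >= kGpuFseriesMinNf ? FseriesBackend::Gpu
                                       : FseriesBackend::Cpu;
}
```

A caller would branch on the returned value before launching either path, for example `if (choose_fseries_backend(dim, nf1, nf2, nf3) == FseriesBackend::Gpu) { /* GPU kernel */ } else { /* CPU loop */ }`. Keeping the decision in one small function makes it easy to retune the cutoff when the timings are repeated on different hardware.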