Closed fstein93 closed 3 months ago
I've looked into the issue and found at least one bug in the C and Fortran interfaces of the multi-transform functions. It's ultimately an internal pointer issue when converting from C / Fortran transform pointers to C++. The fix is in #59 and has been merged into the develop branch. There could be other issues, but hopefully it will work for you.
Some general notes for your use case: It looks like you are computing dense 2D transforms. Using SpFFT for this will come with some overhead, as it really hasn't been designed for it. You might be better off writing a thin wrapper around cuFFT that handles the memory transfer to / from host memory. Also, FFTs are typically so fast that copying to / from device memory can easily outweigh any benefit you might get from using GPUs. So if your input and output are not already located in device memory, I'd expect there to be little benefit unless you have very large input sizes.
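To make the transfer argument a bit more concrete, here is a back-of-the-envelope sketch. It is only a rough model under stated assumptions: the throughput and bandwidth constants are made-up placeholders (not measurements of any real CPU, GPU or link), the FLOP count uses the usual 5 N log2(N) estimate, and launch and transfer latencies are ignored.

```python
import math

# Hypothetical hardware numbers, chosen only for illustration (not measurements).
CPU_FLOPS = 5.0e10       # assumed sustained CPU FFT throughput, FLOP/s
GPU_FLOPS = 1.0e12       # assumed sustained GPU FFT throughput, FLOP/s
LINK_BANDWIDTH = 1.0e10  # assumed host <-> device bandwidth, bytes/s
BYTES_PER_VALUE = 16     # one double-precision complex number


def fft2d_flops(n):
    """Rough FLOP estimate for a complex n x n FFT: 5 * N * log2(N), N = n * n."""
    total = n * n
    return 5.0 * total * math.log2(total)


def cpu_time(n):
    """Estimated time on the CPU, with data already in host memory."""
    return fft2d_flops(n) / CPU_FLOPS


def gpu_time(n):
    """Estimated GPU compute time plus one copy to the device and one copy back."""
    transfer = 2 * BYTES_PER_VALUE * n * n / LINK_BANDWIDTH
    return fft2d_flops(n) / GPU_FLOPS + transfer


if __name__ == "__main__":
    for n in (64, 256, 1024, 4096):
        print(f"n={n:5d}  cpu={cpu_time(n):.2e} s  gpu incl. copies={gpu_time(n):.2e} s")
```

With these placeholder numbers the two copies alone take longer than the whole CPU transform for every size in the loop. Real hardware will give different numbers, so it is worth plugging in your own values before deciding.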
Thank you. I will try it later even if it may not lead to an acceleration. Regarding our use case, I was attempting to accelerate local FFTs first. Several 1D and all 2D FFTs will become obsolete in favor of distributed 3D FFTs and sparse FFTs.
I have just tried it locally on CPU and the tests pass.
Great, thanks for reporting the issue!
In case it helps, here are some details about the multi-transform implementation: It's designed to achieve better performance by overlapping asynchronous operations, including memory transfers to / from device memory and non-blocking MPI calls. Its benefit therefore depends heavily on the MPI implementation on your system, the specific problem size and the type of memory (host, pinned or device memory). You can also use it to keep CPU and GPU busy at the same time by combining transforms with different processing units set (assuming input / output is in host memory). We've measured up to a 20% runtime reduction compared to processing each transform individually, but there are also cases where it can cause a slight performance penalty (potentially due to cache effects).
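A toy model may help to show where a gain from overlap could come from. This is only a sketch and not how SpFFT schedules work internally: it assumes each transform consists of one copy step followed by one compute step, that copies run back to back on their own engine, and that a compute step can start once its own copy and the previous compute step have finished. The function names and timings are hypothetical.

```python
def serial_time(stages):
    """Total time when each transform is copied and computed one after another."""
    return sum(copy + compute for copy, compute in stages)


def overlapped_time(stages):
    """Total time when the copy for the next transform overlaps the current compute."""
    copy_done = 0.0
    compute_done = 0.0
    for copy, compute in stages:
        # Copies are issued back to back on a separate (assumed) copy engine.
        copy_done += copy
        # Compute waits for its own data and for the previous compute step.
        compute_done = max(compute_done, copy_done) + compute
    return compute_done


if __name__ == "__main__":
    # Three identical transforms: 1 time unit of copying, 4 of computing (made-up values).
    stages = [(1.0, 4.0)] * 3
    print("serial:    ", serial_time(stages))
    print("overlapped:", overlapped_time(stages))
```

In this model only the first copy is exposed, so the saving grows with the share of time spent in transfers or communication. It does not capture the cache effects mentioned above, which is one reason a real run can end up slower.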
I'll close this for now, but feel free to reopen it if there are still related problems.
Dear developers, I am currently attempting to use SpFFT (version 1.0.6) to offload FFTs to GPUs in a Fortran code (CP2K). In our code, there is the need to perform several FFTs at once, usually 1D and 2D FFTs, which I map to 3D FFTs. I am able to employ the routines `spfft_transform_forward` and `spfft_transform_backward`.

Now, I am migrating to the `spfft_multi_transform_*` routines. Currently, I am only trying the CPU version for testing. A boiled-down version of my new code for the forward transform fails with the following error:

    LeakSanitizer:DEADLYSIGNAL
    ==358529==ERROR: LeakSanitizer: SEGV on unknown address 0x000000000060 (pc 0x0000037714d0 bp 0x7ffe17ffeba0 sp 0x7ffe17ffeb68 T0)
    ==358529==The signal is caused by a READ memory access.
    ==358529==Hint: address points to the zero page.
    LeakSanitizer:DEADLYSIGNAL
    ==358530==ERROR: LeakSanitizer: SEGV on unknown address 0x000000000060 (pc 0x0000037714d0 bp 0x7ffcd8239f90 sp 0x7ffcd8239f58 T0)
    ==358530==The signal is caused by a READ memory access.
    ==358530==Hint: address points to the zero page.
        #0 0x37714d0 in spfft::ExecutionHost::space_domain_data() /home/fstein/cp2k/cp2k/tools/toolchain/build/SpFFT-1.0.6/src/execution/execution_host.cpp:356

I am running with two MPI ranks and a single thread per rank.
The actual code is found here for the transformation creation and here for the actual transformation. The preceding commit, where I used a loop over `spfft_transform_forward`, worked in the CPU version.

I have also tried variants of the calls to the transformation routine in which I indexed the first element directly (`C_LOC(pointer(0))` instead of `C_LOC(pointer)`) or where I dropped the `C_LOC` function where possible. I have even added my own interface to the C routine where I used arrays instead of plain `TYPE(C_PTR)`.

Do you have an idea what is going on or do you need more information?