Open lewloiwc opened 1 month ago
Thanks for the comprehensive error report!
I tried building the original code that you shared, and it worked fine on my end. I was able to run the app 10 times without crashing. I was building with the MSVC compiler, so my guess is that the crash is happening because of some discrepancy between MSVC and the MINGW gcc compiler that you're using. The fact that writing the `finalize_8x8` method inline seems to fix the crash is interesting... I don't have a theory for why that would be at the moment.
Out of curiosity, could you try calling `fft_new_setup` with `use_avx_if_available` set to `false`? Just trying to see if the same crash happens with the SSE implementation.
Adding methods for the unordered FFT and convolution is a good idea as well. I'm planning to spend some time this week cleaning up and publishing my test code, so I probably won't get around to adding those methods until next week, but if you feel comfortable with your implementations, feel free to make a pull request! If you'd like to discuss those methods further, please open a separate issue, so we can keep this one focused on the crash.
I tried various things today as well, and so far these are the findings:
- Setting `use_avx_if_available` to false prevented crashes under all conditions.
- `chowdsp::fft::FFT_COMPLEX` does not crash.
- The crash occurs with the combination of MINGW gcc, `use_avx_if_available = true`, and `chowdsp::fft::FFT_REAL`.
- Writing the `real_finalize_8x8` method inline seems to fix the crash.
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ff6c53fa0a5 in chowdsp::fft::avx::pffft_real_finalize (e=0x1a662b84b00, out=0xa1e8bff1e0, in=0x1a662b81880, Ncvec=1656230616) at chowdsp_fft_impl_avx.cpp:1194
1194 pffft_real_finalize_8x8 (zero, zero, in + 1, e, out);
Thanks for the additional information!
After a bit more research, it seems that there may be a fundamental compatibility issue with MINGW. Apparently MINGW doesn't support 32-byte stack alignment, which is necessary for the AVX FFT implementation to work reliably. I was reading about it in a series of Stack Overflow issues, starting here, but I haven't been able to find very much recent information on this issue.
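To make the alignment requirement concrete, here is a minimal sketch of a probe that checks whether a stack buffer requested with `alignas (32)` really lands on a 32-byte boundary. The names `is_aligned` and `stack_buffer_is_32_byte_aligned` are hypothetical and not part of chowdsp_fft. Note that an optimiser is allowed to assume the requested alignment and fold the check to `true`, so the result is most meaningful in an unoptimised build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if `p` sits on an `alignment`-byte boundary.
// `alignment` is expected to be a non-zero power of two.
inline bool is_aligned (const void* p, std::size_t alignment)
{
    assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t> (p) % alignment == 0;
}

// Hypothetical probe: asks for a 32-byte-aligned buffer in this stack frame
// (the size of one AVX register) and reports whether the address the
// compiler chose actually honours that request.
inline bool stack_buffer_is_32_byte_aligned()
{
    alignas (32) float buffer[8] = {};
    return is_aligned (buffer, 32);
}
```

Calling `stack_buffer_is_32_byte_aligned()` from a few different call depths in a MINGW build and printing the result might help confirm whether misaligned frames are involved, although a passing probe does not prove that every frame is aligned.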
I guess I might need to add a compiler flag to block the AVX code from compiling on MINGW... Of course, I imagine you were hoping to use the AVX implementation for a performance boost, so that's probably not the answer you were hoping for :(.
If anyone has additional information on getting AVX intrinsics reliably working on MINGW, I'd be very interested to hear it!
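One possible shape for such a guard, sketched under assumptions: the predefined macro `__MINGW32__` (set by both MinGW and MinGW-w64 toolchains) is used to detect the compiler, and a hypothetical helper `allow_avx` vetoes the AVX path before the value is passed on as `use_avx_if_available`. Neither `built_with_mingw` nor `allow_avx` exists in the library, and a real fix would more likely exclude the AVX source file at build time.

```cpp
#include <cassert>

// __MINGW32__ is predefined by MinGW and MinGW-w64 (32-bit and 64-bit targets).
#if defined(__MINGW32__)
constexpr bool built_with_mingw = true;
#else
constexpr bool built_with_mingw = false;
#endif

// Hypothetical helper (not part of chowdsp_fft): returns the value that a
// caller could pass as `use_avx_if_available`, forcing it to false whenever
// the code was built with a MinGW toolchain.
constexpr bool allow_avx (bool requested)
{
    return requested && ! built_with_mingw;
}
```

With this in place, a caller would pass `allow_avx (true)` instead of `true`, so MINGW builds silently fall back to the SSE implementation while other compilers keep AVX.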
I understand. So there was a compatibility issue with MINGW.
While this is likely a very uncertain and unstable workaround, as mentioned in the Stack Overflow answer, I've managed to get it working by adding `__attribute__((always_inline))` to force inlining of the `pffft_real_finalize_8x8` method, which is currently causing the problem. I think I'll personally use this method until a proper solution is found!
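For readers who want to try the same workaround in a portable way, here is a rough sketch of a force-inline macro. `EXAMPLE_FORCE_INLINE`, `scale_and_add`, and `accumulate_scaled` are made-up names for illustration; the real `pffft_real_finalize_8x8` takes SIMD arguments and is not reproduced here. As noted above, this only removes the function call and does not change how the compiler aligns the stack.

```cpp
#include <cassert>

// Hypothetical macro: picks the strongest inlining request each compiler offers.
#if defined(_MSC_VER)
    #define EXAMPLE_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
    #define EXAMPLE_FORCE_INLINE inline __attribute__ ((always_inline))
#else
    #define EXAMPLE_FORCE_INLINE inline
#endif

// Stand-in for a small helper that would otherwise be called out of line.
static EXAMPLE_FORCE_INLINE float scale_and_add (float acc, float x, float scale)
{
    return acc + x * scale;
}

// The helper body is expanded inside this loop, so no separate stack frame
// is created for it.
inline float accumulate_scaled (const float* in, int n, float scale)
{
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum = scale_and_add (sum, in[i], scale);
    return sum;
}
```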
Hello.
I happened to find your article when I searched for "PFFFT avx", and since I was using single-precision PFFFT, I was trying to switch to chowdsp_fft. However, as a result, it started crashing about once every two times, and after a bit of investigation, it seems to be happening probably when it's chowdsp::fft::FFT_REAL.
I was testing with code like this that displays the frequency response on Windows 10:
I build chowdsp_fft by entering the following in PowerShell:
I compile the PFFFT version by entering the following in PowerShell:
I compile the chowdsp_fft version by entering the following in PowerShell:
If it starts up normally, a screen like this will be displayed:
The PFFFT version works fine no matter how many times I launch the application, but the chowdsp_fft version crashes about half the time immediately after startup. Could you reproduce this bug on your end? I'm not really knowledgeable about programming, so I might be building the library incorrectly.
Environment: Windows 10 | gcc (x86_64-win32-seh-rev0, Built by MinGW-W64 project) 12.2.0
Also, this is a feature request, but I'd like functions that don't perform internal zreorder for faster processing, and functions that perform convolution. Currently, I'm adding an argument like `int ordered = 1` to `fft_transform` myself, or using simple convolution without SIMD, but I think it would be convenient if these were included in the library from the start.
Lastly, thank you for releasing such a wonderful library!!
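For context on the convolution request, a scalar (non-SIMD) frequency-domain convolution step boils down to a complex multiply-accumulate over the spectrum bins. The sketch below assumes plain interleaved (re, im) pairs; `complex_multiply_accumulate` is a hypothetical name, not a chowdsp_fft or PFFFT function. The library's internal unordered layout is different, and packed real-transform spectra may need special handling for the first bin, so treat this only as a reference for the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Scalar reference: ab[k] += scaling * a[k] * b[k] for `n_complex` complex
// bins, each stored as an interleaved (re, im) pair of floats.
inline void complex_multiply_accumulate (const float* a,
                                         const float* b,
                                         float* ab,
                                         std::size_t n_complex,
                                         float scaling)
{
    for (std::size_t k = 0; k < n_complex; ++k)
    {
        const float ar = a[2 * k], ai = a[2 * k + 1];
        const float br = b[2 * k], bi = b[2 * k + 1];

        // (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
        ab[2 * k]     += scaling * (ar * br - ai * bi);
        ab[2 * k + 1] += scaling * (ar * bi + ai * br);
    }
}
```

A SIMD version would process several bins per iteration, which is where skipping the reorder step pays off, since both spectra can stay in whatever layout the transform produced.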
P.S.
I’ve created a simpler version of the code for testing:
I compile the PFFFT version by entering the following in PowerShell:
I compile the chowdsp_fft version by entering the following in PowerShell:
When executed successfully, it outputs like this:
The PFFFT version always runs successfully, but the chowdsp_fft version crashes about half of the time, outputting only `---- start ----`. This problem doesn't occur when using `chowdsp::fft::FFT_COMPLEX`.
I tried to fix it myself and found a strange solution, but I really don't understand why this prevents the crash. I removed all `pffft_real_finalize_8x8` and wrote it directly like this:
I think SIMD complex and unordered convolution might look something like this (I might be wrong):