anthonix / ffts

The Fastest Fourier Transform in the South
http://anthonix.com/ffts

Segfault on x86 (misalignment in generated code) #21

Open sbergen opened 10 years ago

sbergen commented 10 years ago

I ran into a segfault when using data in a C++ std::array as the input and output of a 1D real transform. Here's a test case:

#include "ffts/ffts.h"

#include <array>
#include <complex>

int main()
{
    constexpr int size = 128;

    std::array<float, size> in;
    std::array<std::complex<float>, size / 2 + 1> out;

    in.fill(0.0);

    auto plan = ffts_init_1d_real(size, NEGATIVE_SIGN);
    ffts_execute(plan, in.data(), out.data());
    ffts_free(plan);
}

After noticing that it only happens with clang, and does not happen with a regular array (e.g. float in[size]), I had a chat with Chandler Carruth on the LLVM IRC channel. The conclusion was:

12:41 <+chandlerc> if the segfault is occurring on a 'movaps' instruction, then 
                   its a common difference between clang generated code and gcc 
                   generated code on x86: gcc generates code which is *much* 
                   more tolerant of misalignment than clang does. if this 
                   library is misaligning the stack for example when calling 
                   back into C++ code, it can very easily trigger this
12:42 <+chandlerc> generated functions are fine in gdb, just 'disass' to look at 
                   the assembly
12:42 < SaBer> chandlerc: seems you are right: movaps 0x0(%rsi,%rax,4),%xmm7
12:42 <+chandlerc> hah
12:42 <+chandlerc> sorry,
12:42 <+chandlerc> =/
12:43 <+chandlerc> we've seen this in the JVM, Python, Ruby, and every other 
                   code generator so far
12:43 <+chandlerc> its a bug in ffts -- it needs to ensure the stack and 
                   variables are properly aligned according to the ABI when 
                   calling back into C/C++ code

dennisss commented 10 years ago

This is not an x86/SSE-exclusive issue. I've also seen this on an ARM NEON processor when calling the function from Java code. The problem is that the "in" and "out" arrays passed to ffts_execute are expected to be 16-byte aligned, because both the SSE and NEON extensions use 128-bit wide registers. Your code can be made to work by declaring "in" as "std::array<float, size> in __attribute__((aligned(16)));". The same thing should be done for "out". I added a check in d70b38d for logging this as a problem.
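
For anyone who wants a portable spelling of the same workaround, here is a minimal sketch that uses the standard C++11 alignas specifier in place of the GCC/clang attribute. It deliberately does not call FFTS. It only shows how the buffers from the test case could be declared and how their alignment could be checked before they are passed to ffts_execute. The names is_aligned, require_aligned, and Buffers are hypothetical helpers made up for this illustration. The 16-byte figure comes from the 128-bit register width mentioned above.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of FFTS): returns true when the address p
// is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment = 16)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical guard: fails loudly in debug builds instead of letting an
// aligned load instruction fault later on.
inline void require_aligned(const void* p)
{
    assert(is_aligned(p));
}

constexpr int size = 128;

// Same buffer types as in the test case above, with 16-byte alignment
// requested through alignas. The specifier works the same way on local
// variables, e.g. `alignas(16) std::array<float, size> in;`.
struct Buffers {
    alignas(16) std::array<float, size> in{};
    alignas(16) std::array<std::complex<float>, size / 2 + 1> out{};
};
```

With declarations like these, calling require_aligned(b.in.data()) and require_aligned(b.out.data()) on a Buffers object b before the transform should catch a misaligned buffer early. Heap allocations need separate care, since plain new or malloc only guarantees the platform's default alignment.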