damaBeugXam opened this issue 2 years ago
(a) Choosing efficiently between SIMD and scalar code requires planning with `FFTW_MEASURE` or higher (e.g. `FFTW_PATIENT`), not `FFTW_ESTIMATE`. Note that `FFTW_ESTIMATE` will still likely favor SIMD over scalar.

(b) You can use `fftw_print_plan()` (`fftwf_print_plan()` in single precision) to see the plan and check whether NEON is being used. The format is not very readable, but codelets appear as `n1_16` and similar, with the SIMD variant as a suffix, such as `n1_16_neon`. If NEON is not used, try both `FFTW_ESTIMATE` and `FFTW_MEASURE`: in measure mode the planner will skip SIMD codelets if they are not faster than scalar ones.

(c) The A53 may not run NEON code significantly faster than regular scalar code. It was designed for mobile efficiency rather than FP performance (and it's nearly a decade old by now). I don't have numbers for the A53, but on the A7, for instance, a 4-lane (Q-form) FP32 SIMD operation took 4x as long as a scalar FP32 one, which works out to the same throughput. So NEON isn't very useful on the A7 except for the extra register space.
I am trying to compile FFTW3 to run with ARM NEON (more precisely, on a Cortex-A53). The build environment is `x86_64-pokysdk-linux` and the host environment is `aarch64-poky-linux`. I am using the `aarch64-poky-linux-gcc` compiler.

I used the following command at first:

The compiler did not support the `-mfloat-abi=softfp` and `-mfpu=neon` options. It also did not let me define the path to the sysroot this way. Then I used the following command:

This command succeeded with this config log and this config.h. Then I ran `make`, then `make install`.

I then copied my shared library file into my host environment and used `fftwf_` instead of `fftw_` in my code base. The final step was to recompile the program. I ran a test and compared the times for both algorithms using `<sys/resource.h>`. I also called `fftw[f]_forget_wisdom()` for both algorithms to keep the comparison fair.

However, I am not getting a speedup. I expected a SIMD architecture (NEON in our case) to accelerate the FFTW library. I would really appreciate it if anyone could point out what I am doing wrong, so that I can try a fix and see whether I get the performance boost I am looking for.