Open waterball opened 6 years ago
I've written a speed test sample in Android Studio for R2C operations. I've also stepped through simd_neon.h in the debugger, so I'm fairly sure that FFTW uses NEON to optimize the FFT. Sample code is here; the relevant snippet is:
const int size = 32;
float *in = (float *)fftwf_malloc(size * size * sizeof(float));
fftwf_complex *out = (fftwf_complex *)fftwf_malloc(size * ((size / 2 + 1) * 2) * sizeof(float));
fftwf_plan p = fftwf_plan_dft_r2c_2d(size, size, in, out, FFTW_ESTIMATE);
timeval begin, end;
double elapse;
gettimeofday(&begin, 0);
for (int i = 0; i < 1000; ++i)
    fftwf_execute(p);
gettimeofday(&end, 0);
elapse = 1000.0 * (end.tv_sec - begin.tv_sec) + (end.tv_usec - begin.tv_usec) / 1000.0;
elapse = elapse / 1000.0;
char elapse_s[100];
sprintf(elapse_s, "Elapse: %f ms\n", elapse);
fftwf_destroy_plan(p);
fftwf_free(in);
fftwf_free(out);
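The output allocation in the snippet above packs the R2C size rule into one expression. As a minimal sketch (the helper names `r2c_2d_complex_count` and `r2c_2d_output_bytes` are hypothetical, not part of FFTW), the same arithmetic can be spelled out: an n0 x n1 real-to-complex 2D transform produces n0 * (n1/2 + 1) complex values, and each single-precision complex value occupies two floats.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex output elements of an n0 x n1 real-to-complex 2D
   transform: the last dimension is cut to n1/2 + 1 (integer division). */
size_t r2c_2d_complex_count(size_t n0, size_t n1)
{
    assert(n0 > 0 && n1 > 0);
    return n0 * (n1 / 2 + 1);
}

/* Bytes needed for that output when each complex value is stored as two
   floats (real part, imaginary part), as in the snippet's allocation. */
size_t r2c_2d_output_bytes(size_t n0, size_t n1)
{
    return r2c_2d_complex_count(n0, n1) * 2 * sizeof(float);
}
```

For size = 32 this gives 32 * 17 = 544 complex values, which matches the `size * ((size / 2 + 1) * 2) * sizeof(float)` expression used in the post.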
I tested the 2D R2C operation on a Meizu MX4 and got a weird result:
size | 32x32 | 64x64 | 80x80 | 128x128 |
---|---|---|---|---|
with neon | 0.053 ms | 0.38 ms | 1.19 ms | 2.8 ms |
without neon | 0.041 ms | 0.46 ms | 0.62 ms | 2.66 ms |
For three of the four sizes (all except 64x64), FFTW without NEON is faster than FFTW with NEON. If I'm using FFTW incorrectly, please correct me. Thanks!
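Sub-millisecond timings like these are sensitive to clock adjustments and cold caches. Below is a minimal sketch of a more defensive measurement loop, assuming a POSIX system with `clock_gettime`. It runs one untimed warm-up batch and keeps the best of several timed batches. The names `bench_fn`, `best_ms_per_call` and `dummy_work` are hypothetical, and `dummy_work` is only a placeholder for the real FFT call.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one call runs the timed operation once. */
typedef void (*bench_fn)(void *ctx);

/* Current time in milliseconds from a monotonic clock, which is not
   affected by wall-clock adjustments. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return 1000.0 * (double)ts.tv_sec + (double)ts.tv_nsec / 1.0e6;
}

/* Returns the smallest average time per call, in milliseconds, over
   `repeats` batches of `iters` calls, after one untimed warm-up batch. */
double best_ms_per_call(bench_fn fn, void *ctx, int repeats, int iters)
{
    assert(fn != NULL && repeats > 0 && iters > 0);

    for (int i = 0; i < iters; ++i)      /* warm-up, not timed */
        fn(ctx);

    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        double t0 = now_ms();
        for (int i = 0; i < iters; ++i)
            fn(ctx);
        double per_call = (now_ms() - t0) / iters;
        if (best < 0.0 || per_call < best)
            best = per_call;
    }
    return best;
}

/* Placeholder workload (an assumption for illustration only):
   adds 1.0f to an accumulator 1000 times per call. */
void dummy_work(void *ctx)
{
    volatile float *acc = (volatile float *)ctx;
    for (int i = 0; i < 1000; ++i)
        *acc += 1.0f;
}
```

To time FFTW with this, the callback would be a small wrapper that calls `fftwf_execute` on a plan passed through `ctx`. It may also be worth comparing plans created with `FFTW_MEASURE` instead of `FFTW_ESTIMATE`, since `FFTW_ESTIMATE` picks a plan heuristically without timing anything; note that `FFTW_MEASURE` overwrites the arrays during planning, so the input should be filled after the plan is created.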
I also find the neon version slower...
I ended up here because I noticed the same behaviour.
Actually, I'm a bit confused about the -mfpu option. The following appears in configure.ac:
case "${host_cpu}" in
aarch64)
;;
*)
if test "$have_neon" = "yes" -a "x$NEON_CFLAGS" = x; then
AX_CHECK_COMPILER_FLAGS(-mfpu=neon, [NEON_CFLAGS="-mfpu=neon"],
[AC_MSG_ERROR([Need a version of gcc with -mfpu=neon])])
fi
;;
esac
But the Arm Compiler (armclang) reference states that the -mfpu flag is ignored on AArch64 targets (it also notes that "-mfpu=list is rejected when targeting AArch64."; see https://developer.arm.com/documentation/100067/0608/armclang-Command-line-Options/-mfpu?lang=en).
The documentation goes on to state that -mcpu is the relevant option for AArch64.
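One way to see what a given set of compiler flags actually enabled is to look at the compiler's predefined macros. The sketch below assumes GCC or Clang, which define `__aarch64__` and `__arm__` for the target and `__ARM_NEON` (or the older `__ARM_NEON__`) when NEON code generation is on. The function names `simd_target` and `print_simd_target` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Reports, at compile time, which ARM SIMD configuration the compiler
   was targeting when this file was built. */
const char *simd_target(void)
{
#if defined(__aarch64__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    return "aarch64 with Advanced SIMD (NEON)";
#elif defined(__aarch64__)
    return "aarch64 without Advanced SIMD";
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "32-bit ARM with NEON enabled";
#elif defined(__arm__)
    return "32-bit ARM without NEON";
#else
    return "not an ARM target";
#endif
}

/* Prints the detected configuration, e.g. from a test program built
   with the same CC and CFLAGS as the library. */
void print_simd_target(void)
{
    const char *t = simd_target();
    assert(t != NULL);
    printf("SIMD target: %s\n", t);
}
```

Building this file with the same compiler and flags as the library shows whether a flag such as -mfpu=neon changed anything for that target.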
I compiled FFTW for Android. It turns out the build with NEON acceleration is somehow slower than the one without NEON.
The compile commands are as follows:
NDK_DIR="/home/meishe01/cx/kit/android-ndk-r12b"
INSTALL_DIR="`pwd`/build-android/fftw3"
export PATH="$NDK_DIR/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/:$PATH"
export SYS_ROOT="$NDK_DIR/platforms/android-16/arch-arm/"
export CC="arm-linux-androideabi-gcc --sysroot=$SYS_ROOT -march=armv7-a -mfloat-abi=softfp"
export LD="arm-linux-androideabi-ld"
export AR="arm-linux-androideabi-ar"
export RANLIB="arm-linux-androideabi-ranlib"
export STRIP="arm-linux-androideabi-strip"
export CFLAGS="-mfpu=neon -mfloat-abi=softfp"
mkdir -p $INSTALL_DIR
# configure for the NEON build:
./configure --with-slow-timer --host=arm-linux-gnueabi --prefix=$INSTALL_DIR LIBS="-lc -lgcc" --enable-neon --enable-float
# configure for the non-NEON build:
./configure --with-slow-timer --host=arm-linux-gnueabi --prefix=$INSTALL_DIR LIBS="-lc -lgcc" --enable-float
make -j4
make install
I built two versions, one with NEON and one without; the only difference is the configure command.
I ran both versions on the same phones (Meizu MX4 Pro and PLK-AL10) and measured only the time spent in the fftwf_execute calls (R2C and C2R).
Any suggestions?