DSP arm_fir_decimate_q15 too slow in library

WMXZ-EU commented 7 years ago

The arm_fir_decimate_q15 routine in libarm_cortexM4lf_math.a is about 2 times slower than the same routine compiled by the user. The other decimate routines (q31, f32) perform very similarly: the library routines are slightly faster than the application-compiled ones, most likely due to different compiler flags.

ghost commented 7 years ago

Could you please share the compiler flags used to compile the arm_fir_decimate_q15 routine?

WMXZ-EU commented 7 years ago

arm-none-eabi-gcc -mcpu=cortex-m4 -march=armv7e-m -mthumb -mlittle-endian -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O3 -ffunction-sections -fdata-sections -g -DMK66FX1M0 -DARM_MATH_CM4 -D__FPU_PRESENT -DUSB_SERIAL -DLAYOUT_US_ENGLISH -DTEENSYDUINO -DARDUINO=10600 -DF_CPU=240000000 -I../src -I"C:\Users\Walter\CMSIS\Core\Include" -I"C:\Users\Walter\CMSIS\DSP\Include" -I../uSD/src -I"C:\Users\Walter\Documents\GitHub\cores\teensy3" -I"C:\Users\Walter\Documents\Arduino\libraries\wmxzCore\src" -std=gnu11 -Wa,-adhlns="src/testFir.o.lst" -MMD -MP -MF"src/testFir.d" -MT"src/testFir.o" -c -o "src/testFir.o" "../src/testFir.c"

I compile it for an MK66FX1M0 (Teensy 3.6 by PJRC.com). BTW, the library was downloaded on 25th October.

I copied the _q15, _q31 and _f32 versions into testFir.c. For a 240 MHz clock and 256 points of data with a 129-point FIR and a decimation of 8, I got the following times in microseconds:

type   library   compiled
q15    129       66
q31    132       137
f32    101       109
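
For readers who want to reproduce the shape of this measurement off-target, here is a minimal host-side sketch using the same sizes (256 input samples, 129 taps, decimation by 8, so 32 outputs per call). It is not the CMSIS implementation: ref_fir_decimate_q15 and time_ref_fir_us are hypothetical names for a plain reference filter and a clock()-based timer, and the state layout (history first, new block appended) is an assumption made for this sketch. On the Teensy the call under test would be arm_fir_decimate_q15 itself, timed with whatever counter the board provides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BLOCK_SIZE 256                       /* input samples per call */
#define NUM_TAPS   129                       /* FIR length */
#define DECIM      8                         /* decimation factor */
#define OUT_SIZE   (BLOCK_SIZE / DECIM)      /* 32 outputs per call */
#define STATE_SIZE (NUM_TAPS - 1 + BLOCK_SIZE)

/* Convert a Q30 accumulator to Q15 with saturation.
   Assumes >> on a negative value is an arithmetic shift (true for GCC). */
static int16_t sat_q15(int64_t acc)
{
    acc >>= 15;
    if (acc > INT16_MAX) return INT16_MAX;
    if (acc < INT16_MIN) return INT16_MIN;
    return (int16_t)acc;
}

/* Hypothetical reference decimator: state holds NUM_TAPS-1 history
   samples followed by the current input block. */
static void ref_fir_decimate_q15(const int16_t *coeffs, int16_t *state,
                                 const int16_t *src, int16_t *dst)
{
    int i, n, k;

    /* Append the new block after the history. */
    for (i = 0; i < BLOCK_SIZE; i++)
        state[NUM_TAPS - 1 + i] = src[i];

    /* One output per DECIM inputs; coeffs[0] weights the newest sample. */
    for (n = 0; n < OUT_SIZE; n++) {
        const int16_t *newest = &state[NUM_TAPS - 1 + n * DECIM + (DECIM - 1)];
        int64_t acc = 0;
        for (k = 0; k < NUM_TAPS; k++)
            acc += (int32_t)coeffs[k] * newest[-k];
        dst[n] = sat_q15(acc);
    }

    /* Keep the last NUM_TAPS-1 samples as history for the next call. */
    for (i = 0; i < NUM_TAPS - 1; i++)
        state[i] = state[BLOCK_SIZE + i];
}

/* Average time per call in microseconds, measured with clock(). */
static double time_ref_fir_us(int repeats)
{
    static int16_t coeffs[NUM_TAPS], state[STATE_SIZE];
    static int16_t src[BLOCK_SIZE], dst[OUT_SIZE];
    volatile int16_t sink = 0;   /* keeps the loop from being optimized away */
    clock_t t0, t1;
    int i, r;

    assert(repeats > 0);
    memset(state, 0, sizeof state);
    for (i = 0; i < NUM_TAPS; i++)   coeffs[i] = 254;              /* ~1/129 in Q15 */
    for (i = 0; i < BLOCK_SIZE; i++) src[i] = (int16_t)(i * 100);  /* ramp input */

    t0 = clock();
    for (r = 0; r < repeats; r++) {
        ref_fir_decimate_q15(coeffs, state, src, dst);
        sink ^= dst[0];
    }
    t1 = clock();
    (void)sink;
    return 1e6 * (double)(t1 - t0) / CLOCKS_PER_SEC / repeats;
}
```

Calling time_ref_fir_us with a large repeat count from main gives an average per-call time. Host numbers say nothing about Cortex-M4 performance; the sketch only fixes the problem size and the measurement pattern.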

What are the compiler flags for generating the libraries?

ghost commented 7 years ago

The compiler flags used to build the GCC M4lf DSP library are: -mcpu=cortex-m4 -mthumb -gdwarf-2 -MD -Wall -O3 -fno-strict-aliasing -ffunction-sections -fdata-sections -mfpu=fpv4-sp-d16 -mfloat-abi=hard -ffp-contract=off -DARMCM4_FP -DARM_MATH_CM4 -DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -DUNALIGNED_SUPPORT_DISABLE -D__FPU_PRESENT="1U"

You can also check the uVision project to build the GCC DSP libraries. The project is part of CMSIS: C:\Keil\ARM\PACK\ARM\CMSIS\CMSIS\DSP_Lib\Source\GCC\arm_cortexM_math.uvprojx. You can also change the compiler settings to your needs and rebuild the used library.
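
The library build above passes several -D defines (ARM_MATH_MATRIX_CHECK, ARM_MATH_ROUNDING, UNALIGNED_SUPPORT_DISABLE) that do not appear in the command line posted earlier in the thread. When comparing two builds, it can help to confirm which of these are actually active in a translation unit. The following is a minimal sketch; report_dsp_build_flags is a hypothetical helper, not part of CMSIS, and it only reports the macros, it does not say how any DSP routine uses them.

```c
#include <stdio.h>

/* Hypothetical helper: prints which of the build-time defines from the
   thread are set for this translation unit and returns how many are set. */
static int report_dsp_build_flags(void)
{
    int count = 0;

#ifdef ARM_MATH_CM4
    puts("ARM_MATH_CM4: defined");
    count++;
#else
    puts("ARM_MATH_CM4: not defined");
#endif

#ifdef ARM_MATH_MATRIX_CHECK
    puts("ARM_MATH_MATRIX_CHECK: defined");
    count++;
#else
    puts("ARM_MATH_MATRIX_CHECK: not defined");
#endif

#ifdef ARM_MATH_ROUNDING
    puts("ARM_MATH_ROUNDING: defined");
    count++;
#else
    puts("ARM_MATH_ROUNDING: not defined");
#endif

#ifdef UNALIGNED_SUPPORT_DISABLE
    puts("UNALIGNED_SUPPORT_DISABLE: defined");
    count++;
#else
    puts("UNALIGNED_SUPPORT_DISABLE: not defined");
#endif

#ifdef __FPU_PRESENT
    puts("__FPU_PRESENT: defined");
    count++;
#else
    puts("__FPU_PRESENT: not defined");
#endif

    return count;
}
```

Compiling this once with each flag set and calling report_dsp_build_flags from main shows at a glance where the two configurations differ.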

WMXZ-EU commented 7 years ago

After adding -gdwarf-2 -fno-strict-aliasing -ffp-contract=off to the compiler flags, the execution times of my compilation are equal to those of the library, except for decimate_q15, where my compiled version takes half the execution time of the library function. Consequently, there is no need for the library. Thanks for helping. I am closing the issue.

ARM-software / CMSIS_5

DSP arm_fir_decimate_q15 too slow in library #97