Improve performance with ARM SIMD on Cortex-M4 and more powerful cores

gbmhunter / MFixedPoint

MFixedPoint is a header-only fixed-point C++ library suitable for fast arithmetic on systems that don't have a floating-point unit (FPU). This makes it suitable for computationally intensive operations on most smaller embedded systems, such as Cortex-M3, Cortex-M0, ATmega, PSoC 5, PSoC 5 LP, PSoC 4 and Arduino platforms. Common applications include BLDC motor control and image processing. It performs best on 32-bit or wider architectures, although 8-bit architectures should still be fine.
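As a rough illustration of the kind of integer arithmetic that replaces floating point in such a library, here is a minimal Q16.16 sketch (16 integer bits, 16 fractional bits). The names `to_q16`, `from_q16`, `q16_add` and `q16_mul` are hypothetical and are not MFixedPoint's API; the point is only that addition is a plain integer add and multiplication is a widening multiply followed by a shift, neither of which needs an FPU.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Q16.16 helpers illustrating the general fixed-point technique.
// This is NOT MFixedPoint's API.
constexpr int kFracBits = 16;
constexpr double kScale = static_cast<double>(int64_t{1} << kFracBits);  // 2^16

// Convert from double (only for setup/inspection; the arithmetic below is integer-only).
inline int32_t to_q16(double x) {
    assert(x >= -32768.0 && x < 32768.0);  // representable Q16.16 range
    return static_cast<int32_t>(x * kScale);
}

inline double from_q16(int32_t q) { return q / kScale; }

// Addition is a plain integer add (caller must keep the result in range).
inline int32_t q16_add(int32_t a, int32_t b) { return a + b; }

// Multiplication: widen to 64 bits so the product cannot overflow, then shift
// right by kFracBits to get back to Q16.16. The shift rounds toward negative
// infinity (arithmetic shift on mainstream compilers, guaranteed since C++20).
inline int32_t q16_mul(int32_t a, int32_t b) {
    return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> kFracBits);
}
```

For example, `q16_mul(to_q16(3.5), to_q16(-1.25))` yields the same raw value as `to_q16(-4.375)`. The 32x32-to-64-bit multiply is likely one reason 32-bit targets are the sweet spot: an 8-bit core has to build the same multiply out of many narrower instructions.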

MIT License


Issue #82 (Open), opened by qywx 7 years ago

qywx commented 7 years ago

Let's discuss how we can improve the performance of this library. I would prefer to use arm_math.h rather than hand-written assembly.

gbmhunter commented 7 years ago

Hi @sledgeh. Thanks for raising an issue! Can you please elaborate on what you are after?

I had a look into what arm_math.h is, and it seems to be a collection of signal processing functions for ARM Cortex-M devices?

https://github.com/ARM-software/CMSIS/blob/master/CMSIS/Include/arm_math.h

qywx commented 7 years ago

Yes, it is. One of the main aims of arm_math.h is to wrap the cores' SIMD capabilities. Here is the documentation for the CMSIS library.
arm_add_q31.c shows an ARM SIMD implementation. It is a simple function, but there are other, more complicated ones. As far as I understand, the ARM Cortex-M core has 4 SIMD lanes, so we would achieve roughly a 4x speed-up on array processing.
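For reference, the element-wise operation that `arm_add_q31` performs can be written as a plain scalar loop: add two q31 arrays and saturate each result to the Q31 range instead of letting it wrap. The sketch below is a portable host-side version with hypothetical names (`sat_add_q31`, `add_q31_ref`), not the CMSIS source; it only mirrors the `(srcA, srcB, dst, blockSize)` argument shape. Note that each q31 sample fills a whole 32-bit register, so for this particular function the gain on a DSP-capable core most likely comes from single-instruction saturating adds and loop unrolling; the packed multi-lane instructions apply to narrower q15 and q7 data.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using q31_t = int32_t;  // CMSIS also defines q31_t as int32_t

// Saturating Q31 add: compute in 64 bits, then clamp to [INT32_MIN, INT32_MAX].
inline q31_t sat_add_q31(q31_t a, q31_t b) {
    const int64_t sum = static_cast<int64_t>(a) + b;
    if (sum > std::numeric_limits<q31_t>::max()) return std::numeric_limits<q31_t>::max();
    if (sum < std::numeric_limits<q31_t>::min()) return std::numeric_limits<q31_t>::min();
    return static_cast<q31_t>(sum);
}

// Portable scalar reference for element-wise saturating addition
// (hypothetical helper; same argument shape as CMSIS-DSP's arm_add_q31).
inline void add_q31_ref(const q31_t* srcA, const q31_t* srcB, q31_t* dst, uint32_t blockSize) {
    assert(srcA != nullptr && srcB != nullptr && dst != nullptr);
    for (uint32_t i = 0; i < blockSize; ++i) {
        dst[i] = sat_add_q31(srcA[i], srcB[i]);
    }
}
```

A reference loop like this is also handy as a unit-test oracle when comparing an optimised target build against host results.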

This is relevant for ARM cores with SIMD capability: CM3, CM4, and maybe CM7.

gbmhunter commented 7 years ago

@sledgeh, OK, so it allows you to do parallel processing, as long as the instruction is the same.
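Concretely, on a core with the DSP extension (such as the Cortex-M4), instructions like `QADD16` treat a 32-bit register as two 16-bit lanes and perform two saturating additions at once (`QADD8` does the same with four 8-bit lanes). The sketch below emulates the `QADD16` behaviour in portable C++ so the lane semantics can be checked on a host; the helper names are hypothetical, and on real hardware the whole `qadd16_emulated` body corresponds to a single instruction.

```cpp
#include <cassert>
#include <cstdint>

// Saturate an intermediate sum to the signed 16-bit range.
inline int16_t sat16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(x);
}

// Pack two q15 values into one 32-bit word: lo in bits 0-15, hi in bits 16-31.
inline uint32_t pack2(int16_t lo, int16_t hi) {
    return static_cast<uint16_t>(lo) |
           (static_cast<uint32_t>(static_cast<uint16_t>(hi)) << 16);
}

// Extract each lane as a signed 16-bit value.
inline int16_t lo16(uint32_t w) { return static_cast<int16_t>(w & 0xFFFFu); }
inline int16_t hi16(uint32_t w) { return static_cast<int16_t>(w >> 16); }

// Portable emulation of QADD16 semantics: two independent saturating 16-bit
// additions, one per halfword, with no carry between the lanes.
inline uint32_t qadd16_emulated(uint32_t a, uint32_t b) {
    return pack2(sat16(lo16(a) + lo16(b)), sat16(hi16(a) + hi16(b)));
}
```

Because the lanes never carry into each other, `30000 + 10000` in the low lane saturates to `32767` without disturbing the high lane. It also shows why the available parallelism depends on element width: four lanes for 8-bit data, two for 16-bit, one for 32-bit.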

How do you see this as being a benefit to this library? What is a real-world example of what you want to call in code, and what would you expect in return?

qywx commented 7 years ago

Now I think it is enough to add C++ bindings for arm_math and pass Q classes to the math functions, which will convert ninja Q to arm Q. I will create a pull request once it's done. Alternatively, I could create another header-only library on top of yours and ARM's. Thank you for the discussion.
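One way such a binding could look is sketched below, assuming a fixed-point type that stores a 32-bit raw integer with `FracBits` fractional bits. `Fixed32`, `to_q31`, `from_q31` and `add_via_q31` are hypothetical names, not part of MFixedPoint or CMSIS. Converting to Q31 means shifting the raw value left by `31 - FracBits` bits; because Q31 can only represent values in [-1, 1), larger magnitudes are saturated. The kernel is passed in as a callable with the `(srcA, srcB, dst, blockSize)` shape, so on a target where arm_math.h is available one could pass `arm_add_q31` (CMSIS defines `q31_t` as `int32_t`), while on a host any reference loop works.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using q31_t = int32_t;

// Hypothetical stand-in for a fixed-point class (NOT MFixedPoint's real API).
template <int FracBits>
struct Fixed32 {
    static_assert(FracBits >= 0 && FracBits <= 31, "FracBits must fit a 32-bit word");
    int32_t raw;  // value = raw / 2^FracBits
};

// Convert to Q31 by shifting left (31 - FracBits) bits, saturating anything
// outside the Q31 range [-1, 1).
template <int FracBits>
q31_t to_q31(Fixed32<FracBits> x) {
    const int64_t wide = static_cast<int64_t>(x.raw) * (int64_t{1} << (31 - FracBits));
    if (wide > std::numeric_limits<q31_t>::max()) return std::numeric_limits<q31_t>::max();
    if (wide < std::numeric_limits<q31_t>::min()) return std::numeric_limits<q31_t>::min();
    return static_cast<q31_t>(wide);
}

// Convert back from Q31, dropping the fractional bits the target type lacks
// (arithmetic right shift on mainstream compilers, guaranteed since C++20).
template <int FracBits>
Fixed32<FracBits> from_q31(q31_t q) {
    return Fixed32<FracBits>{q >> (31 - FracBits)};
}

// Hypothetical binding: convert both inputs, run a CMSIS-style kernel with the
// (srcA, srcB, dst, blockSize) signature, then convert the result back.
template <int FracBits, typename Kernel>
std::vector<Fixed32<FracBits>> add_via_q31(const std::vector<Fixed32<FracBits>>& a,
                                           const std::vector<Fixed32<FracBits>>& b,
                                           Kernel kernel) {
    assert(a.size() == b.size());
    std::vector<q31_t> qa, qb, qd(a.size());
    qa.reserve(a.size());
    qb.reserve(b.size());
    for (const auto& x : a) qa.push_back(to_q31(x));
    for (const auto& x : b) qb.push_back(to_q31(x));
    kernel(qa.data(), qb.data(), qd.data(), static_cast<uint32_t>(qd.size()));
    std::vector<Fixed32<FracBits>> out;
    out.reserve(qd.size());
    for (q31_t q : qd) out.push_back(from_q31<FracBits>(q));
    return out;
}
```

A production binding on a microcontroller would probably use caller-provided buffers instead of `std::vector` allocations; the vectors only keep the sketch short. Values that were saturated on the way in do not round-trip.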