Portenta's M7 performance hobbled by compiler flag

Optimization	CoreMark	Code Size
-Os	1127	138640
-O2	1484	139680
-O3	1542	142912
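
To put the table in proportion, here is a minimal sketch that works out how much each flag gains in CoreMark score, and how much it costs in code size, relative to -Os. The numbers are copied from the table; the names `results` and `relative_to_os` are made up for this illustration.

```python
# CoreMark scores and code sizes copied from the table above.
results = {
    "-Os": (1127, 138640),
    "-O2": (1484, 139680),
    "-O3": (1542, 142912),
}

base_score, base_size = results["-Os"]


def relative_to_os(flag):
    """Return (score gain %, size growth %) of `flag` relative to -Os."""
    score, size = results[flag]
    score_gain = (score - base_score) / base_score * 100
    size_growth = (size - base_size) / base_size * 100
    return score_gain, size_growth


for flag in results:
    gain, growth = relative_to_os(flag)
    print(f"{flag}: CoreMark {gain:+.1f}%, code size {growth:+.2f}%")
```

By this arithmetic -O3 scores roughly 37% higher than -Os for about 3% more code in the CoreMark build.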

This is super interesting.

The -O3 build is faster, but it takes more memory. See this data from an edgeimpulse.com machine learning program. Note: the program built with -O3 would not compile on the even core split, presumably because its 806184 bytes exceed the 786432-byte maximum, but it was 20 ms faster at classifying vision objects: from 121 ms down to 101 ms. That is a tremendous speed improvement!


Using the -Os flag with the 1.0 MB M7 and 1.0 MB M4 core split:

Sketch uses 776368 bytes (98%) of program storage space. Maximum is 786432 bytes.
Global variables use 89808 bytes (17%) of dynamic memory, leaving 433816 bytes for local variables. Maximum is 523624 bytes.

run_classifier returned: 0
Predictions (DSP: 1 ms., Classification: 121 ms., Anomaly: 0 ms.): 
[0.94531, 0.05078, 0.00391, 0.00000]

Using the -O3 flag with the 1.5 MB M7 and 0.5 MB M4 core split:

Sketch uses 806184 bytes (55%) of program storage space. Maximum is 1441792 bytes.
Global variables use 89808 bytes (17%) of dynamic memory, leaving 433816 bytes for local variables. Maximum is 523624 bytes.

run_classifier returned: 0
Predictions (DSP: 1 ms., Classification: 101 ms., Anomaly: 0 ms.): 
[0.99609, 0.00000, 0.00000, 0.00000]
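
The numbers in the two logs can be checked with a few lines of arithmetic. This is only a sketch using the figures printed above; the constant and function names (`fits`, `speedup_percent`) are invented for the example and are not part of any Arduino or Edge Impulse tooling.

```python
# Figures copied from the two build logs above (all sizes in bytes).
SKETCH_OS = 776368        # sketch size with -Os
SKETCH_O3 = 806184        # sketch size with -O3
MAX_EVEN_SPLIT = 786432   # program storage maximum reported in the -Os log
MAX_M7_HEAVY = 1441792    # program storage maximum reported in the -O3 log


def fits(sketch_bytes, max_bytes):
    """True if the sketch fits in the available program storage."""
    return sketch_bytes <= max_bytes


def speedup_percent(before_ms, after_ms):
    """Reduction in run time, as a percentage of the original time."""
    return (before_ms - after_ms) / before_ms * 100


# Does the -O3 build fit in the smaller partition, and by how much does it miss?
print(fits(SKETCH_O3, MAX_EVEN_SPLIT), SKETCH_O3 - MAX_EVEN_SPLIT)

# Classification time went from 121 ms to 101 ms.
print(round(speedup_percent(121, 101), 1))
```

On these figures the -O3 sketch is 19752 bytes too large for the even split, and the 20 ms saving is about a 16.5% reduction in classification time.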

Source: arduino/ArduinoCore-mbed issue #111, "Portenta's M7 performance hobbled by compiler flag"