Open PaulStoffregen opened 3 years ago
This is super interesting.
The O3 flag is faster, but the sketch uses more program storage. See this data from an edgeimpulse.com machine learning program. Note: the program built with O3 would not compile on the even core split, but it was 20 ms faster at classifying vision objects: from 121 ms down to 101 ms. That is a tremendous speed improvement!
Using the Os flag with the 1.0 M7 and 1.0 M4 core split:
Sketch uses 776368 bytes (98%) of program storage space. Maximum is 786432 bytes.
Global variables use 89808 bytes (17%) of dynamic memory, leaving 433816 bytes for local variables. Maximum is 523624 bytes.
run_classifier returned: 0
Predictions (DSP: 1 ms., Classification: 121 ms., Anomaly: 0 ms.):
[0.94531, 0.05078, 0.00391, 0.00000]
Using the O3 flag with the 1.5 M7 and 0.5 M4 core split:
Sketch uses 806184 bytes (55%) of program storage space. Maximum is 1441792 bytes.
Global variables use 89808 bytes (17%) of dynamic memory, leaving 433816 bytes for local variables. Maximum is 523624 bytes.
run_classifier returned: 0
Predictions (DSP: 1 ms., Classification: 101 ms., Anomaly: 0 ms.):
[0.99609, 0.00000, 0.00000, 0.00000]
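To put the two builds side by side, here is a quick back-of-the-envelope calculation using only the numbers from the logs above. The variable and function names are just for illustration. It also checks the O3 sketch size against the 786432-byte limit reported for the even split, which is consistent with the O3 build not fitting there.

```python
# Numbers copied from the two build logs above.
OS_CLASSIFY_MS = 121      # -Os build, classification time
O3_CLASSIFY_MS = 101      # -O3 build, classification time
OS_FLASH_BYTES = 776368   # -Os build, program storage used
O3_FLASH_BYTES = 806184   # -O3 build, program storage used
EVEN_SPLIT_MAX = 786432   # program storage limit reported for the even split


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


latency_change = percent_change(OS_CLASSIFY_MS, O3_CLASSIFY_MS)
flash_change = percent_change(OS_FLASH_BYTES, O3_FLASH_BYTES)
overflow_bytes = O3_FLASH_BYTES - EVEN_SPLIT_MAX

print(f"Classification time change: {latency_change:.1f}%")
print(f"Program storage change:     {flash_change:+.1f}%")
print(f"O3 sketch exceeds the even-split limit by {overflow_bytes} bytes")
```

So O3 cuts classification time by roughly 16.5% for about 3.8% more flash, and the O3 sketch is 19752 bytes over the even-split maximum.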
I ran the CoreMark benchmark on Portenta. It's running significantly slower than an M7 at 480 MHz should. Most of the poor performance is due to line 14 in variants/PORTENTA_H7_M7/cflags.txt.
Here are the results from 3 runs on Portenta, 2 of them after editing line 14 in cflags.txt.
On AVR and SAMD, optimizing for size (-Os) works well. But on M4 & M7 cores, it costs quite a lot of performance. If you want to give Portenta users a substantial speed boost, just edit line 14 to use better compiler optimizations.
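For anyone who wants to try different optimization levels without hand-editing each time, here is a minimal sketch of the swap in Python. It assumes the flags file holds one compiler flag per line and that the optimization flag sits on a line by itself (for example `-Os`); the real contents of cflags.txt are not shown in this thread, so the sample text and the helper name `set_optimization_flag` are hypothetical.

```python
import re

# Matches a line that is only a GCC optimization-level flag, e.g. -Os, -O2, -O3, -Ofast.
OPT_FLAG = re.compile(r"^-O(?:[0-3sgz]|fast)?$")


def set_optimization_flag(flags_text, new_flag):
    """Return flags_text with every standalone -O<level> line replaced by new_flag."""
    lines = flags_text.splitlines()
    replaced = [new_flag if OPT_FLAG.match(line.strip()) else line for line in lines]
    return "\n".join(replaced) + "\n"


# Hypothetical sample, NOT the actual Portenta cflags.txt.
sample = "-mcpu=cortex-m7\n-Os\n-g\n"
print(set_optimization_flag(sample, "-O3"), end="")
```

Back up the original file before writing the result over it. As the build logs in this thread show, a larger sketch is the price of the faster flags.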