Division operation much slower with microlib than other libs on cortex-M0

LMESTM commented 4 years ago

Description of defect

Following investigations in #12116 , it turns out that division with microlib seems much slower than ARM lib (and possibly also other libs) on cores like cortex-M0.

Few measurement on a "divided by 1000" operation were measured and are reported here for background information:

Board	Toolchain	Division	Operation time (incl. IOs toggling)	Frequency during wake-up
L476RG	uARM	Native ( / )	440ns	48MHz
L476RG	uARM	Div1000	480ns	48MHz
L476RG	ARMC6	Native ( / )	440ns	48MHz
L476RG	ARMC6	Div1000	480ns	48MHz
L073RZ	uARM	Native ( / )	86µs	4MHz
L073RZ	uARM	Div1000	226µs	4MHz
L073RZ	uARM	Div10 (3 times)	18µs	4MHz
L073RZ	ARMC6	Native ( / )	11,6µs	4MHz
L073RZ	ARMC6	Div1000	13µs	4MHz
L073RZ	ARMC6	Div10 (3 times)	18µs	4MHz

Target(s) affected by this defect ?

ALL

Toolchain(s) (name and version) displaying this defect ?

microlib toolchain (µARM)

What version of Mbed-os are you using (tag or sha) ?

mbed-os master of Jan 2020.

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

Not relevant here.

How is this defect reproduced ?

See detailed history in #12116

0xc0170 commented 4 years ago

What would be suggestion here? From the link in the #12116, this loos like it should be known issue with microlib.

It might be worth reaching out internal team who is in charge for microlib ?

kjbracey commented 4 years ago

For division, microlib is doing what's being advertised, which is being smaller and slower. And I'm inclined to think the trade-off is okay. It's 40 bytes instead of 340 bytes. 8 times slower (in this case), 8 times smaller. 8 times slower isn't enough to be that much of a problem, except when running at like 4MHz.

I'm more concerned about the 64-bit multiply (which the div1000 was using there). That is tens of times slower, AND many times bigger. It would be better in all regards to just use the standard library routine. We should raise this with the tools team.

That only seems to be a problem for M0/M0+/M23 - the microlib library routine for M3 is also catastrophically slow, but the compiler seems to not use it - it just inlines optimal code instead.

In the interim, we could paste in the standard library routines with #if MICROLIB somewhere - that should override the currently-poor microlib definitions.

MarceloSalazar commented 4 years ago

Note we're waiting from Arm to comment and confirm how to proceed.

MarceloSalazar commented 4 years ago

Note this affects all Cortex-M0/M0+. @kjbracey-arm can you please review?

MarceloSalazar commented 4 years ago

@kjbracey-arm any update on this?

evedon commented 4 years ago

This issue seems to also affect cortex M4 (STM32F303 for example). See #12880

kjbracey commented 4 years ago

A fix for this is now expected in a future ARM Compiler 6 update.

Current expectation is that future microlib will have the same smaller+faster 64-bit multiply as the standard library. No Mbed OS change required there.

It will default to the the current division operations, but the faster+bigger versions from the standard library will be selectable by an IMPORT directive, which we'll have to add. That in turn could be put under a Mbed config option. So there will need to be a PR for that when the toolchain is available.

ciarmcom commented 4 years ago

@LMESTM thank you for raising this issue.Please take a look at the following comments:

We cannot automatically identify a release based on the version of Mbed OS that you have provided. Please provide either a single valid sha of the form #abcde12 or #3b8265d70af32261311a06e423ca33434d8d80de or a single valid release tag of the form mbed-os-x.y.z . E.g. '2020.' has not been matched as a valid tag or sha. NOTE: If there are fields which are not applicable then please just add 'n/a' or 'None'.This indicates to us that at least all the fields have been considered. Please update the issue header with the missing information, the issue will not be mirroredto our internal defect tracking system or investigated until this has been fully resolved.

adbridge commented 4 years ago

@LMESTM as this appears to be more of a compiler issue can we close this now ?

ciarmcom commented 4 years ago

@LMESTM it has been 5 days since the last reminder. Could you please update the issue header as previously requested?

0xc0170 commented 4 years ago

The latest release of Arm Compiler is 6.15 (this versin contains the fix). I reviewed the internal tickets, it should be in the release.

I'll close this now as resolved.

kjbracey commented 4 years ago

AFAIK, you will need to actually activate the fast division routines, by adding some sort of import directive somewhere. The libraries will default to slow.

(And there might be toolchain-backward-compatibility issues with that. Maybe you can make it a weak reference to not break old toolchains?)

From my comment above:

It will default to the the current division operations, but the faster+bigger versions from the standard library will be selectable by an IMPORT directive, which we'll have to add. That in turn could be put under a Mbed config option. So there will need to be a PR for that when the toolchain is available.

ciarmcom commented 4 years ago

@LMESTM it has been 5 days since the last reminder. Could you please update the issue header as previously requested?

0xc0170 commented 4 years ago

We will need to review the latest release and how to make them enabled.

ciarmcom commented 4 years ago

@LMESTM it has been 5 days since the last reminder. Could you please update the issue header as previously requested?

ciarmcom commented 4 years ago

@LMESTM it has been 5 days since the last reminder. Could you please update the issue header as previously requested?

ciarmcom commented 3 years ago

@LMESTM it has been 5 days since the last reminder. Could you please update the issue header as previously requested?

ciarmcom commented 3 years ago

@LMESTM it has been 5 days since the last reminder. Could you please update the issue header as previously requested?

ciarmcom commented 3 years ago

@LMESTM it has been 5 days since the last reminder. Could you please update the issue header as previously requested?

ciarmcom commented 3 years ago

@LMESTM it has been 5 days since the last reminder. Could you please update the issue header as previously requested?

ciarmcom commented 3 years ago

@LMESTM it has been 5 days since the last reminder. Could you please update the issue header as previously requested?

ciarmcom commented 3 years ago

@LMESTM it has been 5 days since the last reminder. Could you please update the issue header as previously requested?

ciarmcom commented 3 years ago

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers. Internal Jira reference: https://jira.arm.com/browse/IOTOSM-2945

ciarmcom commented 3 years ago

We closed this issue because it has been inactive for quite some time and we believe it to be low priority. If you think that the priority should be higher, then please reopen with your justification for increasing the priority.

ARMmbed / mbed-os