embench / embench-iot

The main Embench repository
https://www.embench.org/
GNU General Public License v3.0

Adding const function attribute to sqrt function in math.h will reduce the execution time for 'nbody' benchmark significantly #154

Open TilakGirijeswar opened 2 years ago

TilakGirijeswar commented 2 years ago

I came across a problem while running Embench for core performance with XC32 4.00 and ARM GNU GCC 10.3.

Environment to replicate the issue:

  1. MPLAB X IDE https://www.microchip.com/en-us/tools-resources/develop/mplab-x-ide#tabs
  2. ARM GNU GCC v10.3.1 https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads
  3. SAML21 Xplained PRO board https://www.microchip.com/en-us/development-tool/atsaml21-xpro-b

The attached .zip file Embench_Core_speed_nbody_issue.zip contains an MPLAB X project of Embench with 'nbody' as the source. Download the required tools to replicate the issue; a SAML21 board is required to check the code speed. If a SAML21 board is not available, the project needs to be ported to another ARM-based MCU, and __attribute__((const)) needs to be added to the sqrt function in math.h: extern double sqrt (double) __attribute__((const));

To replicate the issue with MPLAB X IDE and SAML21 board, follow these steps:

  1. Open project in MPLAB X IDE
  2. Set configuration as ‘GCC’
  3. Open the project properties, select ARM GNU GCC as the compiler toolchain and the connected SAML21 as the hardware tool, then click OK
  4. Clean and build your project with GCC 10.3.1
  5. Enter debug mode to pause at main
  6. Add a breakpoint at line 35 in main.c
  7. Run till breakpoint
  8. Add a variable ‘Time_ms’ to watch window and note the value

    'Time_ms' is the time in milliseconds, captured from the built-in timer, that the benchmark takes to execute, measured from start_trigger to stop_trigger. The value in Time_ms is '7602', which is as expected.

Go to math.h in the ARM GNU GCC 10.3.1 installation directory "\GNU Tools ARM Embedded\10 2021.10\arm-none-eabi\include" and add the const function attribute to sqrt(): extern double sqrt (double) __attribute__((const));

Clean and build the project, then repeat the steps above to capture the time taken with the const function attribute added to sqrt(). The time drops drastically to '65' ms. This is far too low, because the benchmark body is completely optimized away: the const attribute tells the compiler that the result of sqrt() depends only on its argument, so it can eliminate redundant calls to sqrt(), and that in turn lets it optimize out the whole workload.
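To illustrate the effect described above, here is a minimal sketch (not taken from the Embench sources). The names approx_sqrt and sum_of_roots are hypothetical, and approx_sqrt is only a self-contained stand-in for a sqrt() declared with the const attribute. Because the function is marked const and its argument does not change inside the loop, a compiler such as GCC is allowed to evaluate it once and reuse the result, or drop the loop entirely if the result is unused.

```c
#include <assert.h>

/* Hypothetical stand-in for a sqrt() declared with the const attribute.
   const promises the result depends only on the argument value. */
__attribute__((const))
static double approx_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double guess = x;
    /* Newton iteration; 32 steps is enough for moderate inputs. */
    for (int i = 0; i < 32; ++i)
        guess = 0.5 * (guess + x / guess);
    return guess;
}

/* Hypothetical workload: x is loop-invariant, so with the const
   attribute the optimizer may call approx_sqrt(x) only once. */
double sum_of_roots(double x, int n)
{
    assert(n >= 0);
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += approx_sqrt(x);
    return total;
}
```

The returned value is the same with or without the attribute; only the amount of work the generated code does can differ, which is why the measured time changes while the benchmark still verifies.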

XC32 4.00 declares sqrt with the const function attribute by default, and this is a valid usage.

With the const attribute on sqrt, the execution time drops from 7602 ms to 65 ms, which is not useful data for benchmarking. The 'nbody' benchmark source therefore needs to be fixed so that the workload cannot be optimized away completely.
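One common way to keep a benchmark body alive is to make its inputs and result observable to the compiler through volatile objects. The sketch below is an assumption about how such a fix could look, not the actual Embench change; input_seed, result_sink, run_workload and approx_sqrt are all hypothetical names. The volatile accesses add a small measurement overhead of their own.

```c
#include <assert.h>

/* Hypothetical const function, standing in for sqrt() from math.h. */
__attribute__((const))
static double approx_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double guess = x;
    for (int i = 0; i < 32; ++i)
        guess = 0.5 * (guess + x / guess);
    return guess;
}

/* Volatile input: the compiler must re-read it on every iteration,
   so the argument is no longer a known loop-invariant value. */
static volatile double input_seed = 4.0;

/* Volatile sink: the store below cannot be removed, so the
   computation feeding it has to be performed. */
static volatile double result_sink;

double run_workload(int iterations)
{
    assert(iterations >= 0);
    double total = 0.0;
    for (int i = 0; i < iterations; ++i) {
        double x = input_seed + (double)i;
        total += approx_sqrt(x);
    }
    result_sink = total;
    return total;
}
```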