STMicroelectronics / STM32CubeF4

STM32Cube MCU Full Package for the STM32F4 series - (HAL + LL Drivers, CMSIS Core, CMSIS Device, MW libraries plus a set of Projects running on all boards provided by ST (Nucleo, Evaluation and Discovery Kits))

Unnecessary 64-bit calculations result in 700+ bytes increase in firmware size #163

Open yakov-bakhmatov opened 1 year ago

yakov-bakhmatov commented 1 year ago

The function uint32_t HAL_RCC_GetSysClockFreq(void) calculates the expression PLL_VCO = (HSE_VALUE or HSI_VALUE / PLLM) * PLLN using 64-bit multiplication and division.

https://github.com/STMicroelectronics/STM32CubeF4/blob/d5af56388ff037735ac99de39abf2b46f9921aa3/Drivers/STM32F4xx_HAL_Driver/Src/stm32f4xx_hal_rcc.c#L905-L920

This forces the compiler (gcc in particular) to link against the external __aeabi_uldivmod function, which performs 64-bit division.
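
To illustrate where the helper comes from, here is a minimal stand-alone sketch of the same kind of expression. The function name vco_64bit and its parameters are hypothetical and not part of the HAL. On Cortex-M4, gcc typically compiles the widening multiply inline, while the 64-bit division becomes a call into the runtime library.

```c
#include <stdint.h>

/* Hypothetical stand-alone version of the 64-bit expression; not HAL code. */
uint32_t vco_64bit(uint32_t src_hz, uint32_t plln, uint32_t pllm) {
    /* 32 x 32 -> 64 multiply: typically a single UMULL instruction. */
    uint64_t product = (uint64_t)src_hz * (uint64_t)plln;
    /* 64-bit division: there is no such instruction on Cortex-M4, so gcc
       normally emits a call to __aeabi_uldivmod here. */
    return (uint32_t)(product / (uint64_t)pllm);
}
```

One way to check whether the helper ended up in a given build is to list the symbols of the ELF file, for example with arm-none-eabi-nm, and look for __aeabi_uldivmod.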

However, the expression a * b / c, where a, b and c are uint32_t and the result also fits in 32 bits, can be calculated without widening to 64 bits.

Let a = m * c + n, b = p * c + q. Then

a * b / c = (m * c + n) * (p * c + q) / c =
  (m * p * c * c + m * q * c + n * p * c + n * q) / c =
  m * p * c + m * q + n * p + n * q / c

The first three terms are exact multiples of c, so integer division truncates only the last term, n * q / c, where n < c and q < c.

Define the auxiliary function:

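/* Computes a * b / c in 32-bit arithmetic; the result must fit in 32 bits. */
static uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t m = a / c;  /* a = m * c + n */
    uint32_t n = a % c;
    uint32_t p = b / c;  /* b = p * c + q */
    uint32_t q = b % c;
    /* n * q must not overflow 32 bits, which holds while c <= 65536 */
    return m * p * c + m * q + n * p + n * q / c;
}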
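
As a sanity check, the function can be compared against the 64-bit expression on a host machine. The sketch below is only an illustration: the names muldiv_ref and muldiv_count_mismatches are hypothetical, the assertions state preconditions that the original function leaves implicit, and the loop bounds assume the 6-bit PLLM and 9-bit PLLN register fields rather than the narrower ranges the reference manual allows.

```c
#include <assert.h>
#include <stdint.h>

/* muldiv from the post above, with its implicit preconditions asserted. */
static uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c) {
    assert(c != 0u);       /* division by zero is undefined */
    assert(c <= 65536u);   /* n, q < c, so n * q stays below 2^32 */
    uint32_t m = a / c;
    uint32_t n = a % c;
    uint32_t p = b / c;
    uint32_t q = b % c;
    return m * p * c + m * q + n * p + n * q / c;
}

/* Hypothetical reference: the 64-bit form currently used in the HAL. */
static uint32_t muldiv_ref(uint32_t a, uint32_t b, uint32_t c) {
    return (uint32_t)(((uint64_t)a * (uint64_t)b) / (uint64_t)c);
}

/* Counts disagreements for one source clock over every value the PLLM
   (6 bits, zero excluded) and PLLN (9 bits) fields can hold. */
static unsigned muldiv_count_mismatches(uint32_t src_hz) {
    unsigned mismatches = 0u;
    for (uint32_t pllm = 1u; pllm <= 63u; ++pllm) {
        for (uint32_t plln = 0u; plln <= 511u; ++plln) {
            if (muldiv(src_hz, plln, pllm) != muldiv_ref(src_hz, plln, pllm)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

Calling muldiv_count_mismatches with the HSE_VALUE or HSI_VALUE of a given board should return 0 if the two forms agree for that clock source.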

The expressions on lines 911 and 916 are then changed as follows:

-        pllvco = (uint32_t) ((((uint64_t) HSE_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);
+        pllvco = muldiv(HSE_VALUE, (RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos, pllm);
-        pllvco = (uint32_t) ((((uint64_t) HSI_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);
+        pllvco = muldiv(HSI_VALUE, (RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos, pllm);

Here is how this change affects the size of the binary.

For example, create an empty Makefile project in CubeMX for the STM32F407 MCU and compile it with arm-gnu-toolchain-12.2.

arm-none-eabi-size build/stm32f407-empty.elf

   text    data     bss     dec     hex filename
   3668      20    1572    5260    148c build/stm32f407-empty.elf

Using the muldiv function:

arm-none-eabi-size build/stm32f407-empty.elf

   text    data     bss     dec     hex filename
   2892      20    1572    4484    1184 build/stm32f407-empty.elf

The difference in binary size is 776 bytes.

TOUNSTM commented 3 months ago

See also https://github.com/STMicroelectronics/stm32l0xx_hal_driver/issues/12

bmcdonnell-fb commented 3 months ago

> For example, create an empty Makefile project in CubeMX for MCU STM32F407 and compile it by arm-gnu-toolchain-12.2.
>
> arm-none-eabi-size build/stm32f407-empty.elf
>
>    text    data     bss     dec     hex filename
>    3668      20    1572    5260    148c build/stm32f407-empty.elf
>
> Using the muldiv function:
>
> arm-none-eabi-size build/stm32f407-empty.elf
>
>    text    data     bss     dec     hex filename
>    2892      20    1572    4484    1184 build/stm32f407-empty.elf
>
> The difference in binary size is 776 bytes.

What if you do some uint64_t calculation somewhere else in your application?