andysworkshop / stm32plus

The C++ library for the STM32 F0, F100, F103, F107 and F4 microcontrollers
http://www.andybrown.me.uk

Default linkage of malloc() and binary size #189

Closed kaidokert closed 8 years ago

kaidokert commented 8 years ago

This is more of a question than an issue: close if not appropriate.

I was testing how small a binary I can get out of the box, and compiled with:

scons mode=small mcu=f030 hse=8000000

The smallest example, blink.bin, is 5.8 KB by default:

ls -laSh examples/blink/build/small-f030-8000000e/blink.bin

Checking the contents with:

arm-none-eabi-nm -C -S --size-sort examples/blink/build/small-f030-8000000e/blink.elf

shows that malloc and related functionality are pulled in by default. This is triggered by crt0 calling atexit, and atexit in turn calling malloc for some reason, which I suppose might be reasonable behaviour when running under a full OS.

Just by adding this at the end of blink.cpp:

extern "C" int atexit(void (*)(void)) { return 0; }

the binary size immediately shrinks to 1.5 KB. This was tested with gcc-4.9 and gcc-5.4 from https://launchpad.net/gcc-arm-embedded.

Should a similar hook be put into mode=small builds by default, unless the heap is actually needed? I think it should be possible to do this with a linker option: redirect the symbol to a __wrap_atexit provided somewhere in the library, and compile with -Wl,-u,atexit -Wl,--wrap=atexit.
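A minimal sketch of the wrap approach, assuming GNU ld's --wrap option: with --wrap=atexit the linker resolves every undefined reference to atexit (including the one in crt0) to __wrap_atexit instead. Where such a stub would live inside stm32plus is not decided in this thread; putting it in a C file (or inside extern "C" in a C++ file) is an assumption.

```c
/* Sketch: assumes the link line contains -Wl,--wrap=atexit.
 * GNU ld then redirects undefined references to atexit to this
 * function, so newlib's atexit and the malloc it drags in are never
 * linked. If placed in a C++ file, wrap it in extern "C". */

/* Bare-metal firmware never returns from main(), so exit handlers
 * would never run anyway: register nothing and report success. */
int __wrap_atexit(void (*func)(void))
{
    (void)func;  /* handler intentionally discarded */
    return 0;    /* the C standard defines 0 as "registered OK" */
}
```

The trade-off is that any code relying on exit handlers (for example, destructors of static C++ objects registered at startup) silently loses them, which is normally harmless on a microcontroller that never exits.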

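After stubbing out atexit, the next-biggest remaining symbol is __udivsi3, the libgcc routine for 32-bit unsigned division (the Cortex-M0 has no hardware divide instruction). It also appears to have smaller alternatives. It is dragged in by this line: SysTick_Config(SystemCoreClock / 1000); SystemCoreClock is a 32-bit variable, so this becomes a 32-bit division performed in software.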

kaidokert commented 8 years ago

Just out of curiosity: after replacing __udivsi3 with a version borrowed from the RT-Thread project, here are the small-mode blinky binary sizes with different compiler versions:

gcc-4.7: 1504 bytes
gcc-4.8: 1284 bytes
gcc-4.9: 1272 bytes
gcc-5.4: 1304 bytes

On chips with 8 KB or 16 KB of flash, this could actually matter.

andysworkshop commented 8 years ago

> Just by adding this at the end of blink.cpp: extern "C" int atexit (void (*)(void)) { return 0;} The binary size immediately shrinks down to 1.5Kb.

I think we can stub atexit with a wrap on all builds, not just small. That's a very useful space saving.

andysworkshop commented 8 years ago

> Just out of curiosity, after replacing __udivsi3 with a version borrowed from RT-Thread project, here are the small binary blinky sizes with different compiler versions:

Another good find. This one only affects the F0 (Cortex-M0), as the Cortex-M3 and M4 have hardware UDIV instructions. Efficient division implementations also exist in the Chromium project. How do they compare to the routines that you found?
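To illustrate what a software division helper such as __udivsi3 has to do on a core with no divide instruction, here is the simplest form of the technique: restoring (shift-and-subtract) binary long division. This is not the RT-Thread or Chromium routine; the name soft_udiv32 and the divide-by-zero behaviour are hypothetical choices for this sketch.

```c
#include <stdint.h>

/* Hypothetical helper, illustrative only (NOT the RT-Thread or Chromium
 * code). Restoring long division: produces one quotient bit per loop
 * iteration, most significant bit first. */
uint32_t soft_udiv32(uint32_t num, uint32_t den)
{
    uint32_t quot = 0;
    uint32_t rem = 0;

    /* Assumption: return all-ones on divide-by-zero. ARM EABI helpers
     * typically call __aeabi_idiv0 instead. */
    if (den == 0)
        return UINT32_MAX;

    for (int bit = 31; bit >= 0; bit--) {
        uint32_t carry = rem >> 31;             /* bit lost by the shift */
        rem = (rem << 1) | ((num >> bit) & 1u); /* bring down next bit */
        if (carry || rem >= den) {              /* divisor fits? */
            rem -= den;                         /* wraps correctly if carry */
            quot |= 1u << bit;
        }
    }
    return quot;
}
```

The carry check matters when the divisor is above 2^31, where shifting the partial remainder would otherwise overflow 32 bits. Faster implementations typically skip leading zero bits or unroll the loop; this form favours minimal code size over speed.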

kaidokert commented 8 years ago

I adapted it from the RT-Thread source, but I haven't had a chance to test whether it works correctly. It's C, though, so there is less overhead to maintain. The same goes for a smaller version of memset(), which only gets pulled in by the initialization routines. But memset is probably not worth replacing: any project of real use will end up needing one anyway, and then you want the optimized library version for speed.
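To illustrate the size-versus-speed trade-off described above, here is a minimal byte-loop memset under the hypothetical name small_memset. It is not the version referred to in the comment, just a sketch of the idea.

```c
#include <stddef.h>

/* Hypothetical size-optimised memset: a plain byte loop. It compiles to
 * only a few instructions, but writes one byte per iteration, so it is
 * much slower on large buffers than a word-at-a-time library memset. */
void *small_memset(void *dst, int value, size_t len)
{
    unsigned char *p = dst;

    while (len--)
        *p++ = (unsigned char)value;  /* C semantics: value is stored as unsigned char */
    return dst;
}
```

If such a function were named memset itself, GCC may recognise the loop as a memset idiom and turn it back into a call to memset, recursing forever; compiling that file with -fno-tree-loop-distribute-patterns avoids this.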

I also made a quick-and-dirty local hack to GPIOWrite. It reduces the size as well, but more importantly it makes IO toggles single-instruction and inlined, thanks to compiler optimizations. See the attached patch. It's ugly as it stands, and I copied 4 lines of ST code, which I'm not sure is acceptable.

gpiowrite.diff.txt
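The contents of gpiowrite.diff.txt are not shown here, so the following is not that patch. It is a sketch of one common way to get cheap inlined pin toggles on STM32: writing the BSRR register, where a 1 in bit n sets pin n and a 1 in bit n+16 resets it, so no read-modify-write is needed. MockGpio is a hypothetical stand-in for the CMSIS GPIO_TypeDef so the sketch runs on a host; it does not reproduce the real register layout.

```c
#include <stdint.h>

/* Hypothetical stand-in for GPIO_TypeDef: only the two registers used
 * here are modelled, not the real register layout. */
typedef struct {
    volatile uint32_t ODR;   /* output data register */
    volatile uint32_t BSRR;  /* bit set/reset register */
} MockGpio;

/* static inline so that, when the port and pin mask are compile-time
 * constants, each call can reduce to little more than one store. */
static inline void gpio_set(MockGpio *port, uint16_t pins)
{
    port->BSRR = pins;                   /* low half: set these pins */
}

static inline void gpio_reset(MockGpio *port, uint16_t pins)
{
    port->BSRR = (uint32_t)pins << 16;   /* high half: reset these pins */
}
```

On real hardware the peripheral applies the BSRR write to ODR atomically; the mock struct only records the value written.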

I'm aiming for a sub-kilobyte blinky here :)

andysworkshop commented 8 years ago

The LTO options are particularly useful for dealing with inefficiencies in the standard peripheral library. For example, tiny functions like this are very common:

void GPIO_Write(GPIO_TypeDef* GPIOx, uint16_t PortVal)
{
  /* Check the parameters */
  assert_param(IS_GPIO_ALL_PERIPH(GPIOx));

  GPIOx->ODR = PortVal;
}

Without LTO this function is compiled in its own translation unit, and all that callers know about it is the function signature in the header file, so the compiler has no choice but to generate an ordinary function call. With LTO the optimiser can inline calls like this, saving considerable overhead.
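For comparison, the other way to let callers inline a function like GPIO_Write without LTO is to make its body visible to them, by defining it static inline in the header. A minimal sketch, assuming a hypothetical name GPIO_Write_inline and a hypothetical MockGpio stand-in for GPIO_TypeDef so that it runs on a host:

```c
#include <stdint.h>

/* Hypothetical stand-in for CMSIS GPIO_TypeDef (only IDR/ODR modelled,
 * not the real register layout). */
typedef struct {
    volatile uint32_t IDR;   /* input data register */
    volatile uint32_t ODR;   /* output data register */
} MockGpio;

/* Header-defined body: every translation unit that includes it can
 * inline the call, even without -flto. The SPL's assert_param() is
 * omitted; unless USE_FULL_ASSERT is defined it expands to nothing. */
static inline void GPIO_Write_inline(MockGpio *GPIOx, uint16_t PortVal)
{
    GPIOx->ODR = PortVal;    /* the entire function: one register store */
}
```

The cost is that the SPL sources themselves would have to change; LTO gets the same inlining while leaving the library code untouched.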

andysworkshop commented 8 years ago

I've integrated the Chromium implementation of the division routines into the build for the F0 family. That gives a tidy little speed and size improvement. I've tested it on the F0 and also checked that the other families are unaffected.

Here are the .hex file sizes before and after:

Before:

-rwxrwxr-x+ 1 Andy Users  21113 Nov 12 12:18 ./examples/adc_analog_watchdog/build/small-f051-8000000e/adc_analog_watchdog.hex*
-rwxrwxr-x+ 1 Andy Users  18720 Nov 12 12:18 ./examples/adc_single/build/small-f051-8000000e/adc_single.hex*
-rwxrwxr-x+ 1 Andy Users  22381 Nov 12 12:18 ./examples/adc_single_dma_multichan/build/small-f051-8000000e/adc_single_dma_multichan.hex*
-rwxrwxr-x+ 1 Andy Users  20004 Nov 12 12:18 ./examples/adc_single_interrupts/build/small-f051-8000000e/adc_single_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users  20798 Nov 12 12:18 ./examples/adc_single_timer_interrupts/build/small-f051-8000000e/adc_single_timer_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users   3534 Nov 12 12:18 ./examples/blink/build/small-f051-8000000e/blink.hex*
-rwxrwxr-x+ 1 Andy Users   4262 Nov 12 12:18 ./examples/button/build/small-f051-8000000e/button.hex*
-rwxrwxr-x+ 1 Andy Users  15030 Nov 12 12:18 ./examples/crc/build/small-f051-8000000e/crc.hex*
-rwxrwxr-x+ 1 Andy Users   9262 Nov 12 12:18 ./examples/debug_semihosting/build/small-f051-8000000e/debug_semihosting.hex*
-rwxrwxr-x+ 1 Andy Users  16621 Nov 12 12:18 ./examples/dma_copy/build/small-f051-8000000e/dma_copy.hex*
-rwxrwxr-x+ 1 Andy Users   4209 Nov 12 12:18 ./examples/dma_fill/build/small-f051-8000000e/dma_fill.hex*
-rwxrwxr-x+ 1 Andy Users  17955 Nov 12 12:18 ./examples/exti/build/small-f051-8000000e/exti.hex*
-rwxrwxr-x+ 1 Andy Users  11041 Nov 12 12:18 ./examples/flash_internal_settings/build/small-f051-8000000e/flash_internal_settings.hex*
-rwxrwxr-x+ 1 Andy Users  20250 Nov 12 12:18 ./examples/hd44780/build/small-f051-8000000e/hd44780.hex*
-rwxrwxr-x+ 1 Andy Users  17881 Nov 12 12:18 ./examples/i2c_at24c32/build/small-f051-8000000e/i2c_at24c32.hex*
-rwxrwxr-x+ 1 Andy Users  18855 Nov 12 12:18 ./examples/power/build/small-f051-8000000e/power.hex*
-rwxrwxr-x+ 1 Andy Users 128102 Nov 12 12:18 ./examples/r61523_f051/build/small-f051-8000000e/r61523_f051.hex*
-rwxrwxr-x+ 1 Andy Users  22614 Nov 12 12:18 ./examples/rtc/build/small-f051-8000000e/rtc.hex*
-rwxrwxr-x+ 1 Andy Users   7126 Nov 12 12:18 ./examples/spi_send_dma/build/small-f051-8000000e/spi_send_dma.hex*
-rwxrwxr-x+ 1 Andy Users  18961 Nov 12 12:18 ./examples/spi_send_interrupts/build/small-f051-8000000e/spi_send_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users   6181 Nov 12 12:18 ./examples/spi_send_sync/build/small-f051-8000000e/spi_send_sync.hex*
-rwxrwxr-x+ 1 Andy Users  16801 Nov 12 12:18 ./examples/timer_dma_pwm/build/small-f051-8000000e/timer_dma_pwm.hex*
-rwxrwxr-x+ 1 Andy Users   5567 Nov 12 12:18 ./examples/timer_dma_usart/build/small-f051-8000000e/timer_dma_usart.hex*
-rwxrwxr-x+ 1 Andy Users   4479 Nov 12 12:18 ./examples/timer_dual_gpio_out/build/small-f051-8000000e/timer_dual_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users   5866 Nov 12 12:18 ./examples/timer_dual_pwm_gpio_out/build/small-f051-8000000e/timer_dual_pwm_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users  24713 Nov 12 12:18 ./examples/timer_encoder/build/small-f051-8000000e/timer_encoder.hex*
-rwxrwxr-x+ 1 Andy Users   4299 Nov 12 12:18 ./examples/timer_gpio_out/build/small-f051-8000000e/timer_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users  22344 Nov 12 12:18 ./examples/timer_input_capture/build/small-f051-8000000e/timer_input_capture.hex*
-rwxrwxr-x+ 1 Andy Users  18225 Nov 12 12:18 ./examples/timer_interrupts/build/small-f051-8000000e/timer_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users   6398 Nov 12 12:18 ./examples/timer_master_slave/build/small-f051-8000000e/timer_master_slave.hex*
-rwxrwxr-x+ 1 Andy Users  19644 Nov 12 12:18 ./examples/timer_pwm_break/build/small-f051-8000000e/timer_pwm_break.hex*
-rwxrwxr-x+ 1 Andy Users   4802 Nov 12 12:18 ./examples/timer_pwm_gpio_out/build/small-f051-8000000e/timer_pwm_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users   6128 Nov 12 12:18 ./examples/usart_receive_dma/build/small-f051-8000000e/usart_receive_dma.hex*
-rwxrwxr-x+ 1 Andy Users  18691 Nov 12 12:18 ./examples/usart_receive_interrupts/build/small-f051-8000000e/usart_receive_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users  13950 Nov 12 12:18 ./examples/usart_receive_sync/build/small-f051-8000000e/usart_receive_sync.hex*
-rwxrwxr-x+ 1 Andy Users   5416 Nov 12 12:18 ./examples/usart_send_dma/build/small-f051-8000000e/usart_send_dma.hex*
-rwxrwxr-x+ 1 Andy Users  19485 Nov 12 12:18 ./examples/usart_send_dma_interrupts/build/small-f051-8000000e/usart_send_dma_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users  18241 Nov 12 12:18 ./examples/usart_send_interrupts/build/small-f051-8000000e/usart_send_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users  13238 Nov 12 12:18 ./examples/usart_send_sync/build/small-f051-8000000e/usart_send_sync.hex*

After:

-rwxrwxr-x+ 1 Andy Users  20475 Nov 12 12:14 ./examples/adc_analog_watchdog/build/small-f051-8000000e/adc_analog_watchdog.hex*
-rwxrwxr-x+ 1 Andy Users  18069 Nov 12 12:14 ./examples/adc_single/build/small-f051-8000000e/adc_single.hex*
-rwxrwxr-x+ 1 Andy Users  20520 Nov 12 12:14 ./examples/adc_single_dma_multichan/build/small-f051-8000000e/adc_single_dma_multichan.hex*
-rwxrwxr-x+ 1 Andy Users  19366 Nov 12 12:14 ./examples/adc_single_interrupts/build/small-f051-8000000e/adc_single_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users  20160 Nov 12 12:14 ./examples/adc_single_timer_interrupts/build/small-f051-8000000e/adc_single_timer_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users   3534 Nov 12 12:14 ./examples/blink/build/small-f051-8000000e/blink.hex*
-rwxrwxr-x+ 1 Andy Users   4262 Nov 12 12:14 ./examples/button/build/small-f051-8000000e/button.hex*
-rwxrwxr-x+ 1 Andy Users  14379 Nov 12 12:14 ./examples/crc/build/small-f051-8000000e/crc.hex*
-rwxrwxr-x+ 1 Andy Users   8161 Nov 12 12:14 ./examples/debug_semihosting/build/small-f051-8000000e/debug_semihosting.hex*
-rwxrwxr-x+ 1 Andy Users  16621 Nov 12 12:15 ./examples/dma_copy/build/small-f051-8000000e/dma_copy.hex*
-rwxrwxr-x+ 1 Andy Users   4209 Nov 12 12:14 ./examples/dma_fill/build/small-f051-8000000e/dma_fill.hex*
-rwxrwxr-x+ 1 Andy Users  17955 Nov 12 12:15 ./examples/exti/build/small-f051-8000000e/exti.hex*
-rwxrwxr-x+ 1 Andy Users  10403 Nov 12 12:15 ./examples/flash_internal_settings/build/small-f051-8000000e/flash_internal_settings.hex*
-rwxrwxr-x+ 1 Andy Users  20250 Nov 12 12:15 ./examples/hd44780/build/small-f051-8000000e/hd44780.hex*
-rwxrwxr-x+ 1 Andy Users  17243 Nov 12 12:15 ./examples/i2c_at24c32/build/small-f051-8000000e/i2c_at24c32.hex*
-rwxrwxr-x+ 1 Andy Users  18855 Nov 12 12:15 ./examples/power/build/small-f051-8000000e/power.hex*
-rwxrwxr-x+ 1 Andy Users 126376 Nov 12 12:15 ./examples/r61523_f051/build/small-f051-8000000e/r61523_f051.hex*
-rwxrwxr-x+ 1 Andy Users  21976 Nov 12 12:15 ./examples/rtc/build/small-f051-8000000e/rtc.hex*
-rwxrwxr-x+ 1 Andy Users   7126 Nov 12 12:15 ./examples/spi_send_dma/build/small-f051-8000000e/spi_send_dma.hex*
-rwxrwxr-x+ 1 Andy Users  18961 Nov 12 12:15 ./examples/spi_send_interrupts/build/small-f051-8000000e/spi_send_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users   6181 Nov 12 12:15 ./examples/spi_send_sync/build/small-f051-8000000e/spi_send_sync.hex*
-rwxrwxr-x+ 1 Andy Users  16163 Nov 12 12:15 ./examples/timer_dma_pwm/build/small-f051-8000000e/timer_dma_pwm.hex*
-rwxrwxr-x+ 1 Andy Users   4929 Nov 12 12:15 ./examples/timer_dma_usart/build/small-f051-8000000e/timer_dma_usart.hex*
-rwxrwxr-x+ 1 Andy Users   3841 Nov 12 12:15 ./examples/timer_dual_gpio_out/build/small-f051-8000000e/timer_dual_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users   5228 Nov 12 12:15 ./examples/timer_dual_pwm_gpio_out/build/small-f051-8000000e/timer_dual_pwm_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users  24075 Nov 12 12:15 ./examples/timer_encoder/build/small-f051-8000000e/timer_encoder.hex*
-rwxrwxr-x+ 1 Andy Users   3661 Nov 12 12:15 ./examples/timer_gpio_out/build/small-f051-8000000e/timer_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users  21706 Nov 12 12:15 ./examples/timer_input_capture/build/small-f051-8000000e/timer_input_capture.hex*
-rwxrwxr-x+ 1 Andy Users  17574 Nov 12 12:15 ./examples/timer_interrupts/build/small-f051-8000000e/timer_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users   5747 Nov 12 12:15 ./examples/timer_master_slave/build/small-f051-8000000e/timer_master_slave.hex*
-rwxrwxr-x+ 1 Andy Users  19006 Nov 12 12:15 ./examples/timer_pwm_break/build/small-f051-8000000e/timer_pwm_break.hex*
-rwxrwxr-x+ 1 Andy Users   4164 Nov 12 12:15 ./examples/timer_pwm_gpio_out/build/small-f051-8000000e/timer_pwm_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users   5477 Nov 12 12:15 ./examples/usart_receive_dma/build/small-f051-8000000e/usart_receive_dma.hex*
-rwxrwxr-x+ 1 Andy Users  18053 Nov 12 12:15 ./examples/usart_receive_interrupts/build/small-f051-8000000e/usart_receive_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users  13299 Nov 12 12:15 ./examples/usart_receive_sync/build/small-f051-8000000e/usart_receive_sync.hex*
-rwxrwxr-x+ 1 Andy Users   4778 Nov 12 12:15 ./examples/usart_send_dma/build/small-f051-8000000e/usart_send_dma.hex*
-rwxrwxr-x+ 1 Andy Users  18834 Nov 12 12:15 ./examples/usart_send_dma_interrupts/build/small-f051-8000000e/usart_send_dma_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users  17603 Nov 12 12:15 ./examples/usart_send_interrupts/build/small-f051-8000000e/usart_send_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users  12600 Nov 12 12:15 ./examples/usart_send_sync/build/small-f051-8000000e/usart_send_sync.hex*

I think these optimisations deserve a library release; I'll create one now.