kaidokert closed this issue 8 years ago
Just out of curiosity, after replacing __udivsi3 with a version borrowed from the RT-Thread project, here are the small-mode blinky binary sizes with different compiler versions:
gcc-4.7: 1504 bytes
gcc-4.8: 1284 bytes
gcc-4.9: 1272 bytes
gcc-5.4: 1304 bytes
For chips with 8 KB or 16 KB of flash this might sometimes even matter.
Just by adding this at the end of blink.cpp:
extern "C" int atexit (void (*)(void)) { return 0;}
The binary size immediately shrinks down to 1.5Kb.
I think we can stub atexit with a wrap on all builds, not just small. That's a very useful space saving.
Another good find. This one only affects the M0, as the M3 and M4 have udiv instructions. Efficient div implementations also exist here in the chromium project. How do they compare to the routines that you found?
I adapted it from here, but I haven't even had a chance to test whether it works correctly. It's C though, so less overhead to maintain. Likewise for a smaller version of memset() that only gets pulled in by initialization routines. But memset is probably not worth replacing, as any project of any usefulness will end up needing one anyway, and then you want the toolchain's built-in one for speed.
I also did a dirty local hack to GPIOWrite. It reduces the size as well, but more importantly it makes IO toggles single-instruction and inlined, thanks to compiler optimizations. See the patch attached here. It's ugly like that, and I copied 4 lines of ST code, which I'm not sure is decent.
I'm trying for a sub-kilobyte blinky here :)
The LTO options are particularly useful for dealing with inefficiencies in the standard peripheral library. For example, tiny functions like this are very common:
void GPIO_Write(GPIO_TypeDef* GPIOx, uint16_t PortVal)
{
  /* Check the parameters */
  assert_param(IS_GPIO_ALL_PERIPH(GPIOx));
  GPIOx->ODR = PortVal;
}
Without LTO this is embedded into a compilation unit, and the only thing that callers know about it is the function signature in the header file, so the compiler has no choice but to generate a standard function call. With LTO the optimiser can inline calls like this, saving a considerable overhead!
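As a rough sketch of the pattern being described, the following host-compilable C models a tiny register-write function and a caller. `FakeGpio`, `fake_gpio_write` and `toggle_twice` are hypothetical stand-ins, not library code. On a real build the callee and the caller would sit in separate .c files, and both the compile and link steps would be given GCC's -flto so the optimiser can see across them.

```c
#include <stdint.h>

/* Hypothetical stand-in for the ST GPIO_TypeDef; only the ODR register is modelled. */
typedef struct {
    volatile uint16_t ODR;
} FakeGpio;

/* Callee: in a real project this would live in its own .c file,
   with only the prototype visible to callers. */
void fake_gpio_write(FakeGpio *port, uint16_t value)
{
    port->ODR = value;
}

/* Caller: without LTO each call below is a real function call.
   With -flto on both compile and link, GCC may inline them into plain stores. */
void toggle_twice(FakeGpio *port, uint16_t mask)
{
    fake_gpio_write(port, mask); /* pins high */
    fake_gpio_write(port, 0);    /* pins low  */
}
```

Whether the calls are actually inlined depends on the optimisation level and compiler version, so it is worth checking the disassembly rather than assuming it.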
I've integrated the chromium implementation of the unsigned division routines into the build for the F0 family. That gives a tidy little speed and size improvement. I've tested it on the F0 and also ensured that the other families are unaffected.
Here are the binary sizes before and after.
Before:
-rwxrwxr-x+ 1 Andy Users 21113 Nov 12 12:18 ./examples/adc_analog_watchdog/build/small-f051-8000000e/adc_analog_watchdog.hex*
-rwxrwxr-x+ 1 Andy Users 18720 Nov 12 12:18 ./examples/adc_single/build/small-f051-8000000e/adc_single.hex*
-rwxrwxr-x+ 1 Andy Users 22381 Nov 12 12:18 ./examples/adc_single_dma_multichan/build/small-f051-8000000e/adc_single_dma_multichan.hex*
-rwxrwxr-x+ 1 Andy Users 20004 Nov 12 12:18 ./examples/adc_single_interrupts/build/small-f051-8000000e/adc_single_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 20798 Nov 12 12:18 ./examples/adc_single_timer_interrupts/build/small-f051-8000000e/adc_single_timer_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 3534 Nov 12 12:18 ./examples/blink/build/small-f051-8000000e/blink.hex*
-rwxrwxr-x+ 1 Andy Users 4262 Nov 12 12:18 ./examples/button/build/small-f051-8000000e/button.hex*
-rwxrwxr-x+ 1 Andy Users 15030 Nov 12 12:18 ./examples/crc/build/small-f051-8000000e/crc.hex*
-rwxrwxr-x+ 1 Andy Users 9262 Nov 12 12:18 ./examples/debug_semihosting/build/small-f051-8000000e/debug_semihosting.hex*
-rwxrwxr-x+ 1 Andy Users 16621 Nov 12 12:18 ./examples/dma_copy/build/small-f051-8000000e/dma_copy.hex*
-rwxrwxr-x+ 1 Andy Users 4209 Nov 12 12:18 ./examples/dma_fill/build/small-f051-8000000e/dma_fill.hex*
-rwxrwxr-x+ 1 Andy Users 17955 Nov 12 12:18 ./examples/exti/build/small-f051-8000000e/exti.hex*
-rwxrwxr-x+ 1 Andy Users 11041 Nov 12 12:18 ./examples/flash_internal_settings/build/small-f051-8000000e/flash_internal_settings.hex*
-rwxrwxr-x+ 1 Andy Users 20250 Nov 12 12:18 ./examples/hd44780/build/small-f051-8000000e/hd44780.hex*
-rwxrwxr-x+ 1 Andy Users 17881 Nov 12 12:18 ./examples/i2c_at24c32/build/small-f051-8000000e/i2c_at24c32.hex*
-rwxrwxr-x+ 1 Andy Users 18855 Nov 12 12:18 ./examples/power/build/small-f051-8000000e/power.hex*
-rwxrwxr-x+ 1 Andy Users 128102 Nov 12 12:18 ./examples/r61523_f051/build/small-f051-8000000e/r61523_f051.hex*
-rwxrwxr-x+ 1 Andy Users 22614 Nov 12 12:18 ./examples/rtc/build/small-f051-8000000e/rtc.hex*
-rwxrwxr-x+ 1 Andy Users 7126 Nov 12 12:18 ./examples/spi_send_dma/build/small-f051-8000000e/spi_send_dma.hex*
-rwxrwxr-x+ 1 Andy Users 18961 Nov 12 12:18 ./examples/spi_send_interrupts/build/small-f051-8000000e/spi_send_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 6181 Nov 12 12:18 ./examples/spi_send_sync/build/small-f051-8000000e/spi_send_sync.hex*
-rwxrwxr-x+ 1 Andy Users 16801 Nov 12 12:18 ./examples/timer_dma_pwm/build/small-f051-8000000e/timer_dma_pwm.hex*
-rwxrwxr-x+ 1 Andy Users 5567 Nov 12 12:18 ./examples/timer_dma_usart/build/small-f051-8000000e/timer_dma_usart.hex*
-rwxrwxr-x+ 1 Andy Users 4479 Nov 12 12:18 ./examples/timer_dual_gpio_out/build/small-f051-8000000e/timer_dual_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users 5866 Nov 12 12:18 ./examples/timer_dual_pwm_gpio_out/build/small-f051-8000000e/timer_dual_pwm_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users 24713 Nov 12 12:18 ./examples/timer_encoder/build/small-f051-8000000e/timer_encoder.hex*
-rwxrwxr-x+ 1 Andy Users 4299 Nov 12 12:18 ./examples/timer_gpio_out/build/small-f051-8000000e/timer_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users 22344 Nov 12 12:18 ./examples/timer_input_capture/build/small-f051-8000000e/timer_input_capture.hex*
-rwxrwxr-x+ 1 Andy Users 18225 Nov 12 12:18 ./examples/timer_interrupts/build/small-f051-8000000e/timer_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 6398 Nov 12 12:18 ./examples/timer_master_slave/build/small-f051-8000000e/timer_master_slave.hex*
-rwxrwxr-x+ 1 Andy Users 19644 Nov 12 12:18 ./examples/timer_pwm_break/build/small-f051-8000000e/timer_pwm_break.hex*
-rwxrwxr-x+ 1 Andy Users 4802 Nov 12 12:18 ./examples/timer_pwm_gpio_out/build/small-f051-8000000e/timer_pwm_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users 6128 Nov 12 12:18 ./examples/usart_receive_dma/build/small-f051-8000000e/usart_receive_dma.hex*
-rwxrwxr-x+ 1 Andy Users 18691 Nov 12 12:18 ./examples/usart_receive_interrupts/build/small-f051-8000000e/usart_receive_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 13950 Nov 12 12:18 ./examples/usart_receive_sync/build/small-f051-8000000e/usart_receive_sync.hex*
-rwxrwxr-x+ 1 Andy Users 5416 Nov 12 12:18 ./examples/usart_send_dma/build/small-f051-8000000e/usart_send_dma.hex*
-rwxrwxr-x+ 1 Andy Users 19485 Nov 12 12:18 ./examples/usart_send_dma_interrupts/build/small-f051-8000000e/usart_send_dma_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 18241 Nov 12 12:18 ./examples/usart_send_interrupts/build/small-f051-8000000e/usart_send_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 13238 Nov 12 12:18 ./examples/usart_send_sync/build/small-f051-8000000e/usart_send_sync.hex*
After:
-rwxrwxr-x+ 1 Andy Users 20475 Nov 12 12:14 ./examples/adc_analog_watchdog/build/small-f051-8000000e/adc_analog_watchdog.hex*
-rwxrwxr-x+ 1 Andy Users 18069 Nov 12 12:14 ./examples/adc_single/build/small-f051-8000000e/adc_single.hex*
-rwxrwxr-x+ 1 Andy Users 20520 Nov 12 12:14 ./examples/adc_single_dma_multichan/build/small-f051-8000000e/adc_single_dma_multichan.hex*
-rwxrwxr-x+ 1 Andy Users 19366 Nov 12 12:14 ./examples/adc_single_interrupts/build/small-f051-8000000e/adc_single_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 20160 Nov 12 12:14 ./examples/adc_single_timer_interrupts/build/small-f051-8000000e/adc_single_timer_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 3534 Nov 12 12:14 ./examples/blink/build/small-f051-8000000e/blink.hex*
-rwxrwxr-x+ 1 Andy Users 4262 Nov 12 12:14 ./examples/button/build/small-f051-8000000e/button.hex*
-rwxrwxr-x+ 1 Andy Users 14379 Nov 12 12:14 ./examples/crc/build/small-f051-8000000e/crc.hex*
-rwxrwxr-x+ 1 Andy Users 8161 Nov 12 12:14 ./examples/debug_semihosting/build/small-f051-8000000e/debug_semihosting.hex*
-rwxrwxr-x+ 1 Andy Users 16621 Nov 12 12:15 ./examples/dma_copy/build/small-f051-8000000e/dma_copy.hex*
-rwxrwxr-x+ 1 Andy Users 4209 Nov 12 12:14 ./examples/dma_fill/build/small-f051-8000000e/dma_fill.hex*
-rwxrwxr-x+ 1 Andy Users 17955 Nov 12 12:15 ./examples/exti/build/small-f051-8000000e/exti.hex*
-rwxrwxr-x+ 1 Andy Users 10403 Nov 12 12:15 ./examples/flash_internal_settings/build/small-f051-8000000e/flash_internal_settings.hex*
-rwxrwxr-x+ 1 Andy Users 20250 Nov 12 12:15 ./examples/hd44780/build/small-f051-8000000e/hd44780.hex*
-rwxrwxr-x+ 1 Andy Users 17243 Nov 12 12:15 ./examples/i2c_at24c32/build/small-f051-8000000e/i2c_at24c32.hex*
-rwxrwxr-x+ 1 Andy Users 18855 Nov 12 12:15 ./examples/power/build/small-f051-8000000e/power.hex*
-rwxrwxr-x+ 1 Andy Users 126376 Nov 12 12:15 ./examples/r61523_f051/build/small-f051-8000000e/r61523_f051.hex*
-rwxrwxr-x+ 1 Andy Users 21976 Nov 12 12:15 ./examples/rtc/build/small-f051-8000000e/rtc.hex*
-rwxrwxr-x+ 1 Andy Users 7126 Nov 12 12:15 ./examples/spi_send_dma/build/small-f051-8000000e/spi_send_dma.hex*
-rwxrwxr-x+ 1 Andy Users 18961 Nov 12 12:15 ./examples/spi_send_interrupts/build/small-f051-8000000e/spi_send_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 6181 Nov 12 12:15 ./examples/spi_send_sync/build/small-f051-8000000e/spi_send_sync.hex*
-rwxrwxr-x+ 1 Andy Users 16163 Nov 12 12:15 ./examples/timer_dma_pwm/build/small-f051-8000000e/timer_dma_pwm.hex*
-rwxrwxr-x+ 1 Andy Users 4929 Nov 12 12:15 ./examples/timer_dma_usart/build/small-f051-8000000e/timer_dma_usart.hex*
-rwxrwxr-x+ 1 Andy Users 3841 Nov 12 12:15 ./examples/timer_dual_gpio_out/build/small-f051-8000000e/timer_dual_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users 5228 Nov 12 12:15 ./examples/timer_dual_pwm_gpio_out/build/small-f051-8000000e/timer_dual_pwm_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users 24075 Nov 12 12:15 ./examples/timer_encoder/build/small-f051-8000000e/timer_encoder.hex*
-rwxrwxr-x+ 1 Andy Users 3661 Nov 12 12:15 ./examples/timer_gpio_out/build/small-f051-8000000e/timer_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users 21706 Nov 12 12:15 ./examples/timer_input_capture/build/small-f051-8000000e/timer_input_capture.hex*
-rwxrwxr-x+ 1 Andy Users 17574 Nov 12 12:15 ./examples/timer_interrupts/build/small-f051-8000000e/timer_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 5747 Nov 12 12:15 ./examples/timer_master_slave/build/small-f051-8000000e/timer_master_slave.hex*
-rwxrwxr-x+ 1 Andy Users 19006 Nov 12 12:15 ./examples/timer_pwm_break/build/small-f051-8000000e/timer_pwm_break.hex*
-rwxrwxr-x+ 1 Andy Users 4164 Nov 12 12:15 ./examples/timer_pwm_gpio_out/build/small-f051-8000000e/timer_pwm_gpio_out.hex*
-rwxrwxr-x+ 1 Andy Users 5477 Nov 12 12:15 ./examples/usart_receive_dma/build/small-f051-8000000e/usart_receive_dma.hex*
-rwxrwxr-x+ 1 Andy Users 18053 Nov 12 12:15 ./examples/usart_receive_interrupts/build/small-f051-8000000e/usart_receive_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 13299 Nov 12 12:15 ./examples/usart_receive_sync/build/small-f051-8000000e/usart_receive_sync.hex*
-rwxrwxr-x+ 1 Andy Users 4778 Nov 12 12:15 ./examples/usart_send_dma/build/small-f051-8000000e/usart_send_dma.hex*
-rwxrwxr-x+ 1 Andy Users 18834 Nov 12 12:15 ./examples/usart_send_dma_interrupts/build/small-f051-8000000e/usart_send_dma_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 17603 Nov 12 12:15 ./examples/usart_send_interrupts/build/small-f051-8000000e/usart_send_interrupts.hex*
-rwxrwxr-x+ 1 Andy Users 12600 Nov 12 12:15 ./examples/usart_send_sync/build/small-f051-8000000e/usart_send_sync.hex*
I think these optimisations deserve a library release, so I'll create one now.
This is more of a question than an issue: close if not appropriate.
I was testing how small a binary I can get out of the box, and compiled with
scons mode=small mcu=f030 hse=8000000
The smallest example, blink.bin, is 5.8K by default.
ls -laSh examples/blink/build/small-f030-8000000e/blink.bin
Checking the contents:
arm-none-eabi-nm -C -S --size-sort examples/blink/build/small-f030-8000000e/blink.elf
shows that malloc and related functionality are pulled in by default. This, in turn, is triggered by crt0 calling atexit, and atexit making a malloc call for whatever reason, which I suppose might be reasonable behaviour when running a full OS.
Just by adding this at the end of blink.cpp:
extern "C" int atexit (void (*)(void)) { return 0;}
The binary size immediately shrinks down to 1.5Kb. This was tested on gcc-4.9 and gcc-5.4 from https://launchpad.net/gcc-arm-embedded
Should a similar hook be put into mode=small builds by default, unless the heap is actually needed? I think it should be possible to do this with a linker option, by redirecting this symbol to a __wrap_atexit provided somewhere in the library, and compiling with
-Wl,-u,atexit -Wl,--wrap=atexit
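A minimal sketch of what the library-side stub could look like, assuming the link really is done with --wrap=atexit. The body is my guess at the simplest useful stub, not code from the library. If it were placed in a C++ source file it would also need extern "C", as in the blink.cpp one-liner above.

```c
/* Assumption: the link uses -Wl,--wrap=atexit, so the linker redirects
   every reference to atexit() to __wrap_atexit(). */
int __wrap_atexit(void (*func)(void))
{
    (void)func; /* handler is deliberately dropped and never run */
    return 0;   /* report success so the startup code carries on */
}
```

The trade-off is that registered exit handlers, including destructors of static C++ objects, are never called. On a microcontroller where main() does not return that is usually acceptable, but it is a behaviour change worth stating.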
After stubbing out atexit, the next biggest remaining symbol is __udivsi3, which also appears to have smaller alternatives, and is dragged in from this line:
SysTick_Config(SystemCoreClock / 1000);
which turns into 64bit div
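For illustration, the kind of compact routine under discussion can be sketched as a plain shift-and-subtract loop in C. This is a generic textbook version under my own assumptions, not the RT-Thread or chromium code. The name `udiv32_sketch` is hypothetical, and hooking it up as the toolchain's division helper is not shown.

```c
#include <stdint.h>

/* Hypothetical name; a generic restoring (shift-and-subtract) divider. */
uint32_t udiv32_sketch(uint32_t num, uint32_t den)
{
    uint32_t quot = 0;
    uint32_t rem = 0;

    if (den == 0) {
        return 0; /* assumption: division by zero simply yields 0 here */
    }

    /* Bring down one bit of the numerator per step, most significant first. */
    for (int bit = 31; bit >= 0; --bit) {
        rem = (rem << 1) | ((num >> bit) & 1u);
        if (rem >= den) {
            rem -= den;
            quot |= (uint32_t)1 << bit;
        }
    }
    return quot;
}
```

A loop like this is small but always runs 32 iterations, so it trades speed for size. Hand-tuned assembly versions typically do better on both counts.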