Samsung / ONE

On-device Neural Engine

[luci-micro] Remove C++ runtime from micro interpreter #9467

Open binarman opened 2 years ago

binarman commented 2 years ago

Goal

Motivation

C++ libraries can add significant overhead to binary size, which can be an issue for MCU-based applications.

binarman commented 2 years ago

related issue #9468

AShedko commented 2 years ago

We can reduce the target binary (built from https://github.com/BalyshevArtem/ONE/tree/luci_micro_read_from_flesh ) from 534 KiB (546172 bytes) to 221 KiB (226144 bytes), a saving of 320028 bytes:

$ bloaty ./compiler/luci-micro/mbed-os/benchmark -- ~/Proj/ONE/bench_rel
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +1.3% +6.25Ki  [ = ]       0    .heap
  -0.7%      -2  [ = ]       0    .shstrtab
 -71.4%     -20 -71.4%     -20    .got
  [ = ]       0 -83.3%     -20    .uninitialized
  -0.1%    -372  [ = ]       0    .debug_macro
 -16.2%    -632 -16.2%    -632    .data
 -64.8% -5.08Ki -64.8% -5.08Ki    .ARM.exidx
  [ = ]       0 -40.0% -5.59Ki    .bss
 -95.6% -14.1Ki -95.6% -14.1Ki    .ARM.extab
 -59.3% -23.6Ki  [ = ]       0    .debug_aranges
 -61.4% -80.4Ki  [ = ]       0    .debug_frame
 -60.9% -83.9Ki  [ = ]       0    .strtab
 -64.2% -97.3Ki  [ = ]       0    .symtab
 -54.5%  -177Ki  [ = ]       0    .debug_abbrev
  -8.9%  -282Ki  [ = ]       0    .debug_str
 -57.7%  -292Ki -57.7%  -292Ki    .text
 -78.9%  -408Ki  [ = ]       0    .debug_ranges
 -53.1%  -742Ki  [ = ]       0    .debug_line
 -75.0% -1.93Mi  [ = ]       0    .debug_loc
 -62.2% -3.07Mi  [ = ]       0    .debug_info
 -48.7% -7.16Mi -38.8%  -318Ki    TOTAL
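In this diff, .text accounts for most of the VM-size saving (-292Ki of -318Ki), and the exception-handling sections shrink sharply (.ARM.extab by 95.6%, .ARM.exidx by 64.8%). On ARM these sections hold the EHABI unwind tables, so the drop is consistent with the interpreter no longer using C++ exceptions, for example by building with -fno-exceptions. The sketch below shows one common replacement pattern, returning a status code instead of throwing. `Status`, `Tensor`, `checkSameShape`, and `prepareAdd` are hypothetical names for illustration, not luci-micro's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical error codes standing in for exception types such as std::runtime_error.
enum class Status : std::uint8_t
{
  Ok,
  InvalidShape,
  UnsupportedType,
};

// Hypothetical, simplified tensor descriptor used only in this sketch.
struct Tensor
{
  const std::int32_t *dims;
  std::size_t rank;
};

// Before: `if (a.rank != b.rank) throw std::runtime_error("shape mismatch");`
// After: the failure is returned to the caller, so no throw expression is emitted.
Status checkSameShape(const Tensor &a, const Tensor &b)
{
  assert(a.rank == 0 || a.dims != nullptr);
  assert(b.rank == 0 || b.dims != nullptr);
  if (a.rank != b.rank)
    return Status::InvalidShape;
  for (std::size_t i = 0; i < a.rank; ++i)
  {
    if (a.dims[i] != b.dims[i])
      return Status::InvalidShape;
  }
  return Status::Ok;
}

// Callers propagate errors explicitly instead of relying on stack unwinding.
Status prepareAdd(const Tensor &lhs, const Tensor &rhs)
{
  const Status s = checkSameShape(lhs, rhs);
  if (s != Status::Ok)
    return s;
  // ... kernel-specific preparation would go here ...
  return Status::Ok;
}
```

Because nothing throws, this code never references `__cxa_throw` or the rest of the exception runtime. The cost is explicit error propagation at every call site.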

The required changes are:

Size-reduced std usage:

    FILE SIZE        VM SIZE    
 --------------  -------------- 
   3.1%  21.5Ki  24.5%  3.06Ki    ../../../../../../../../newlib/libc/stdlib/mallocr.c
   2.9%  20.1Ki  12.0%  1.50Ki    ../../../../../../src/libstdc++-v3/libsupc++/vmi_class_type_info.cc
   1.2%  8.25Ki   6.1%     776    ../../../../../../../../newlib/libc/stdio/fvwrite.c
  25.0%   171Ki   5.6%     720    ../../../../../../../src/libstdc++-v3/src/c++11/string-inst.cc
   1.3%  9.16Ki   3.9%     500    ../../../../../../../../newlib/libc/stdio/findfp.c
   1.0%  7.09Ki   3.3%     420    ../../../../../../../../newlib/libc/stdio/fflush.c
   1.2%  8.11Ki   3.2%     404    ../../../../../../src/libstdc++-v3/libsupc++/eh_arm.cc
   2.4%  16.7Ki   2.9%     368    ../../../../../../src/libstdc++-v3/libsupc++/eh_alloc.cc
   2.1%  14.7Ki   2.7%     340    ../../../../../../../src/libstdc++-v3/src/c++11/snprintf_lite.cc
   1.8%  12.5Ki   2.6%     332    ../../../../../../src/libstdc++-v3/libsupc++/eh_call.cc
   1.1%  7.44Ki   2.6%     332    ../../../../../../src/libstdc++-v3/libsupc++/eh_throw.cc
   3.4%  23.6Ki   2.5%     314    ../../../../../../../src/libstdc++-v3/src/c++11/hashtable_c++0x.cc
  19.9%   136Ki   2.3%     288    ../../../../../../../src/libstdc++-v3/src/c++11/cow-string-inst.cc
   1.1%  7.86Ki   2.2%     284    ../../../../../../src/libstdc++-v3/libsupc++/si_class_type_info.cc
   2.4%  16.4Ki   2.1%     272    ../../../../../../../src/libstdc++-v3/src/c++11/functexcept.cc
   4.9%  33.4Ki   2.1%     272    ../../../../../../../src/libstdc++-v3/src/c++98/stdexcept.cc
   1.5%  10.2Ki   2.0%     260    ../../../../../../src/libstdc++-v3/libsupc++/eh_catch.cc
   1.4%  9.38Ki   2.0%     256    ../../../../../../src/libstdc++-v3/libsupc++/class_type_info.cc
   0.9%  6.33Ki   1.9%     240    ../../../../../../../../newlib/libc/stdio/makebuf.c
   0.7%  4.81Ki   1.6%     208    ../../../../../../../../newlib/libc/stdio/wsetup.c
   0.8%  5.26Ki   1.5%     192    ../../../../../../../../newlib/libc/stdio/wbuf.c
   0.9%  6.12Ki   1.4%     180    ../../../../../../../../newlib/libc/stdio/putc.c
   0.8%  5.24Ki   1.3%     160    ../../../../../../../../newlib/libc/stdio/fwalk.c
   0.8%  5.27Ki   1.2%     152    ../../../../../../../../newlib/libc/stdio/fclose.c
   0.9%  6.17Ki   1.1%     144    ../../../../../../../../newlib/libc/stdio/stdio.c
   0.7%  5.08Ki   0.9%     120    ../../../../../../../../newlib/libc/stdio/puts.c
   8.1%  55.5Ki   0.9%     120    ../../../../../../../src/libstdc++-v3/src/c++11/cow-stdexcept.cc
   1.3%  8.86Ki   0.9%     120    ../../../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc
   0.8%  5.44Ki   0.8%     104    ../../../../../../../../newlib/libc/stdio/fputc.c
   0.5%  3.53Ki   0.4%      56    ../../../../../../src/libstdc++-v3/libsupc++/bad_alloc.cc
   0.6%  4.26Ki   0.4%      52    ../../../../../../src/libstdc++-v3/libsupc++/tinfo.cc
   0.9%  6.05Ki   0.3%      40    ../../../../../../src/libstdc++-v3/libsupc++/eh_exception.cc
   0.5%  3.60Ki   0.3%      32    ../../../../../../../../newlib/libc/stdlib/malloc.c
   1.1%  7.36Ki   0.2%      20    ../../../../../../src/libstdc++-v3/libsupc++/eh_globals.cc
   0.5%  3.53Ki   0.1%      16    ../../../../../../../../newlib/libc/stdlib/abort.c
   0.9%  6.39Ki   0.1%       8    ../../../../../../src/libstdc++-v3/libsupc++/eh_term_handler.cc
   0.4%  2.51Ki   0.0%       4    ../../../../../../src/libstdc++-v3/libsupc++/eh_unex_handler.cc
 100.0%   686Ki 100.0%  12.5Ki    TOTAL
Filtering enabled (source_filter); omitted file = 6.88Mi, vm =  489Ki of entries
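Several of the entries that remain after the changes (eh_alloc.cc, eh_throw.cc, bad_alloc.cc) belong to the C++ exception runtime. One way such objects can still be pulled in is through the default global `operator new`, which reports allocation failure by throwing `std::bad_alloc`. The following is a minimal sketch of replacement allocation functions, which the C++ standard permits a program to define. It assumes that aborting on out-of-memory is acceptable for the firmware, and it has not been verified that this removes these particular objects from this build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Replacement global allocation functions.
// Assumption: running out of memory is unrecoverable on the target,
// so we abort instead of throwing std::bad_alloc.
void *operator new(std::size_t size)
{
  // malloc(0) may return nullptr; always request at least one byte.
  void *p = std::malloc(size != 0 ? size : 1);
  if (p == nullptr)
    std::abort();
  return p;
}

void *operator new[](std::size_t size) { return ::operator new(size); }

void operator delete(void *p) noexcept { std::free(p); }
void operator delete[](void *p) noexcept { std::free(p); }

// Sized variants, used when the compiler enables sized deallocation.
void operator delete(void *p, std::size_t) noexcept { std::free(p); }
void operator delete[](void *p, std::size_t) noexcept { std::free(p); }
```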

Old std usage:

bloaty -s vm  ~/Proj/ONE/bench_rel -d compileunits -n 30 --source-filter="std"
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  11.5%   716Ki  19.5%  42.4Ki    ../../../../../../../src/libstdc++-v3/src/c++11/locale-inst.cc
  12.3%   764Ki  18.8%  40.9Ki    ../../../../../../../src/libstdc++-v3/src/c++11/wlocale-inst.cc
  19.5%  1.19Mi   8.3%  18.2Ki    [88 Others]
   7.4%   458Ki   8.2%  17.8Ki    ../../../../../../../src/libstdc++-v3/src/c++11/cxx11-locale-inst.cc
   7.8%   483Ki   7.6%  16.5Ki    ../../../../../../../src/libstdc++-v3/src/c++11/cxx11-wlocale-inst.cc
   5.0%   310Ki   4.7%  10.2Ki    ../../../../../../../src/libstdc++-v3/src/c++11/cow-shim_facets.cc
   4.6%   285Ki   4.3%  9.40Ki    ../../../../../../../src/libstdc++-v3/src/c++11/cxx11-shim_facets.cc
   0.7%  46.3Ki   3.1%  6.80Ki    ../../../../../../../../newlib/libc/stdio/vfscanf.c
   0.7%  45.6Ki   2.8%  6.16Ki    ../../../../../../../../newlib/libc/stdio/vfprintf.c
   1.5%  91.2Ki   2.7%  5.86Ki    ../../../../../../../src/libstdc++-v3/src/c++98/locale_init.cc
   0.5%  32.2Ki   2.6%  5.59Ki    ../../../../../../../../newlib/libc/stdio/vfwprintf.c
   0.5%  28.4Ki   1.9%  4.15Ki    ../../../../../../../../newlib/libc/stdlib/strtod.c
   0.4%  26.7Ki   1.6%  3.47Ki    ../../../../../../../../newlib/libc/stdlib/dtoa.c
   0.3%  21.4Ki   1.4%  3.06Ki    ../../../../../../../../newlib/libc/stdlib/mallocr.c
   0.4%  27.8Ki   1.2%  2.68Ki    ../../../../../../../../newlib/libc/stdlib/mprec.c
   2.1%   133Ki   1.2%  2.64Ki    ../../../../../../../src/libstdc++-v3/src/c++11/codecvt.cc
   3.0%   183Ki   0.9%  2.02Ki    ../../../../../../../src/libstdc++-v3/src/c++11/string-inst.cc
   2.5%   156Ki   0.9%  1.89Ki    ../../../../../../../src/libstdc++-v3/src/c++11/cow-string-inst.cc
   2.4%   151Ki   0.8%  1.84Ki    ../../../../../../../src/libstdc++-v3/src/c++11/cow-wstring-inst.cc
   2.7%   170Ki   0.8%  1.77Ki    ../../../../../../../src/libstdc++-v3/src/c++11/ostream-inst.cc
   0.3%  16.6Ki   0.8%  1.75Ki    ../../../../../../../../newlib/libc/stdlib/gdtoa-gethex.c
   0.4%  26.9Ki   0.7%  1.57Ki    ../../../../../../src/libstdc++-v3/libsupc++/eh_personality.cc
   5.2%   325Ki   0.7%  1.51Ki    ../../../../../../../src/libstdc++-v3/src/c++11/sstream-inst.cc
   0.3%  20.1Ki   0.7%  1.50Ki    ../../../../../../src/libstdc++-v3/libsupc++/vmi_class_type_info.cc
   2.6%   162Ki   0.7%  1.43Ki    ../../../../../../../src/libstdc++-v3/src/c++11/wstring-inst.cc
   1.3%  79.7Ki   0.6%  1.40Ki    ../../../../../../../src/libstdc++-v3/src/c++98/locale.cc
   0.2%  10.5Ki   0.6%  1.36Ki    ../../../../../../../src/libstdc++-v3/src/c++98/globals_io.cc
   1.1%  67.4Ki   0.5%  1.11Ki    ../../../../../../../src/libstdc++-v3/src/c++11/cow-locale_init.cc
   0.6%  36.1Ki   0.5%  1.06Ki    ../../../../../../../src/libstdc++-v3/src/c++98/ios_init.cc
   1.0%  63.0Ki   0.4%     912    ../../../../../../../src/libstdc++-v3/src/c++11/streambuf-inst.cc
   1.0%  64.8Ki   0.4%     900    ../../../../../../../src/libstdc++-v3/src/c++11/ext11-inst.cc
 100.0%  6.07Mi 100.0%   217Ki    TOTAL
Filtering enabled (source_filter); omitted file = 8.64Mi, vm =  601Ki of entries
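In the old build, the largest VM contributors are locale and stream instantiations (locale-inst.cc, wlocale-inst.cc, sstream-inst.cc, ostream-inst.cc). These typically get linked in once the code uses `<iostream>` or `<sstream>`. The following is a minimal sketch of locale-free message formatting into a caller-provided buffer, as a replacement for `std::ostringstream`-style code. It also avoids `printf`, since newlib's vfprintf.c shows up in the old listing as well. `appendStr` and `appendInt` are hypothetical helpers, not part of luci-micro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Appends a NUL-terminated string to buf (capacity cap, current length *len).
// Output is truncated if it does not fit; buf always stays NUL-terminated.
void appendStr(char *buf, std::size_t cap, std::size_t *len, const char *s)
{
  assert(cap > 0 && *len < cap);
  while (*s != '\0' && *len + 1 < cap)
    buf[(*len)++] = *s++;
  buf[*len] = '\0';
}

// Appends a signed decimal integer without iostreams, printf or locales.
void appendInt(char *buf, std::size_t cap, std::size_t *len, std::int32_t value)
{
  char tmp[12]; // "-2147483648" is 11 characters
  std::size_t n = 0;
  // Take the magnitude as unsigned so that INT32_MIN does not overflow.
  std::uint32_t mag = value < 0 ? 0u - static_cast<std::uint32_t>(value)
                                : static_cast<std::uint32_t>(value);
  do
  {
    tmp[n++] = static_cast<char>('0' + mag % 10);
    mag /= 10;
  } while (mag != 0);
  if (value < 0)
    tmp[n++] = '-';
  // Digits were produced least-significant first; reverse them.
  char out[12];
  for (std::size_t i = 0; i < n; ++i)
    out[i] = tmp[n - 1 - i];
  out[n] = '\0';
  appendStr(buf, cap, len, out);
}
```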

The intermediate binary (NOT the final firmware) from which the firmware is produced shrinks from 1 052 416 bytes to 738 788 bytes (both stripped).

To reiterate: the final binary is reduced from 534 KiB (546172 bytes) to 221 KiB (226144 bytes), a saving of 320028 bytes!
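The quoted figures can be cross-checked with a few compile-time assertions, assuming 1 KiB = 1024 bytes and that the KiB values are rounded up. 546172 bytes is about 533.4 KiB, 226144 bytes is about 220.8 KiB, and the difference is exactly 320028 bytes, roughly 58.6% of the original image. The intermediate binary shrinks by 313628 bytes, roughly 29.8%. The helper names below are illustrative only.

```cpp
#include <cstdint>

// Sizes in bytes, as quoted in this thread.
constexpr std::uint64_t kFirmwareBefore = 546172;
constexpr std::uint64_t kFirmwareAfter = 226144;
constexpr std::uint64_t kIntermediateBefore = 1052416; // stripped
constexpr std::uint64_t kIntermediateAfter = 738788;   // stripped

// Rounds a byte count up to whole KiB (1 KiB = 1024 bytes).
constexpr std::uint64_t ceilKiB(std::uint64_t bytes) { return (bytes + 1023) / 1024; }

// Size reduction in per mille, rounded to the nearest integer.
constexpr std::uint64_t reductionPermille(std::uint64_t before, std::uint64_t after)
{
  return ((before - after) * 1000 + before / 2) / before;
}

static_assert(ceilKiB(kFirmwareBefore) == 534, "534 KiB before");
static_assert(ceilKiB(kFirmwareAfter) == 221, "221 KiB after");
static_assert(kFirmwareBefore - kFirmwareAfter == 320028, "saving in bytes");
static_assert(reductionPermille(kFirmwareBefore, kFirmwareAfter) == 586, "about 58.6%");
static_assert(kIntermediateBefore - kIntermediateAfter == 313628, "intermediate saving");
```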

See https://github.com/AShedko/ONE/pull/new/luci_reduce_binary_size for required changes.

chunseoklee commented 2 years ago

See https://github.com/AShedko/ONE/pull/new/luci_reduce_binary_size for required changes.

https://github.com/AShedko/ONE/tree/luci_reduce_binary_size

lemmaa commented 2 years ago

@AShedko , I have a question.

This experiment doesn't seem to completely remove the C++ dependency, right?

So what really matters is the size of the C++ standard library (stdcpp) linked into the image together with the size of the luci-micro binary, not the size of luci-micro alone. Although the changed source lines greatly reduce the size of luci-micro, it is unclear how much the linked stdcpp library will shrink as well.

~i.e. I think we need to compare (luci-micro + stdcpp) before vs. (luci-micro + stdcpp) after.~

To reiterate: The final binary is reduced from 534 KiB (= 546172) to 221 KiB (= 226144 bytes), an improvement of 320028!

(UPDATED) Oh, I misread it. It seems this already compares the two. Am I understanding correctly?

binarman commented 2 years ago

@lemmaa Andrey removed most of the C++ library from the build.

If I remember correctly, there are some leftovers that cost several KB. We can remove them too, but that requires some refactoring first.
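Among the leftovers in the reduced listing are RTTI objects (class_type_info.cc, si_class_type_info.cc, vmi_class_type_info.cc, tinfo.cc). While RTTI is enabled, the compiler emits `type_info` objects for polymorphic classes, and these objects depend on that runtime code. Building with -fno-rtti avoids them, but only once the code no longer uses `dynamic_cast` or `typeid`. One refactoring that can make this possible is replacing `dynamic_cast` with an explicit kind tag. A minimal sketch follows; `OpKind`, `Node`, `AddNode`, `ConvNode`, and `nodeCast` are hypothetical names, not luci-micro classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operator kinds; every node stores its kind explicitly.
enum class OpKind : std::uint8_t
{
  Add,
  Conv2D,
};

struct Node
{
  explicit Node(OpKind k) : kind(k) {}
  virtual ~Node() = default;
  const OpKind kind;
};

struct AddNode final : Node
{
  static constexpr OpKind kKind = OpKind::Add;
  AddNode() : Node(kKind) {}
};

struct ConvNode final : Node
{
  static constexpr OpKind kKind = OpKind::Conv2D;
  explicit ConvNode(std::int32_t s) : Node(kKind), stride(s) {}
  std::int32_t stride;
};

// Tag-checked downcast used instead of dynamic_cast<T *>(n).
// Returns nullptr when n is null or does not hold a T.
template <typename T> T *nodeCast(Node *n)
{
  return (n != nullptr && n->kind == T::kKind) ? static_cast<T *>(n) : nullptr;
}
```

This is similar in spirit to LLVM's `isa<>`/`dyn_cast<>` helpers, which also rely on a per-class kind check rather than C++ RTTI.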

lemmaa commented 2 years ago

@binarman, thank you for the explanation. One thing I want to check is whether the final binary size is measured on the actual result of compilation + linking, or whether a tool analyzes the size of the target parts in each compiled module (without an actual link) and adds them up.

I ask because, if it's the latter, I'm concerned that the dependencies between the individual parts aren't fully resolved.

AShedko commented 2 years ago

The final 221 KiB binary is flashed to the device as a standalone, statically-linked firmware. See internal discussion for a rough breakdown of the memory consumption of different components.