ETLCPP / etl

Embedded Template Library
https://www.etlcpp.com
MIT License

Large compiled binary size due to inline expansion in ETL Library #717

Closed Tagussan closed 2 months ago

Tagussan commented 1 year ago

Hi,

I'm using the ETL library in my project and I've found that the compiled binary size is huge. Upon disassembling the object file, I noticed that functions from the ETL library are inline expanded everywhere. Specifically, operations involving etl::string and etl::vector are inline expanded and consume a lot of opcodes.

Is there any way to reduce or disable this level of inline expansion? This is becoming a problem in my project, as it takes up a substantial amount of Flash on my MCU.

jwellbelove commented 1 year ago

What compiler are you using? GCC has a -fno-inline compile option. There is also an -Os option (optimize for size) that may help.

Can you give me an example of a function that is being inlined?

Tagussan commented 1 year ago

I'm using g++. I haven't tried those flags yet.

Here's an example. In the following assembly we can see the ASSERT check, the CAPACITY check, some pointer operations, and so on, none of which are written explicitly in the original code.

ETL_EXPLICIT_STRING_FROM_CHAR string(const value_type* text)
      : istring(reinterpret_cast<value_type*>(&buffer), MAX_SIZE)
    {
      this->assign(text, text + etl::char_traits<value_type>::length(text));
      40:   4ab1        ldr r2, [pc, #708]  ; (308 <SomeName::SomeName::SomeClass::setup()+0x308>)
      ETL_ASSERT(d >= 0, ETL_ERROR(string_iterator));
      42:   2900        cmp r1, #0
      44:   eb01 0502   add.w   r5, r1, r2
      48:   f2c1 80f4   blt.w   1234 <SomeName::SomeName::SomeClass::setup()+0x1234>
      while ((first != last) && (current_size != CAPACITY))
      4c:   4295        cmp r5, r2
      p_buffer[0] = 0;
      4e:   f88d 30b4   strb.w  r3, [sp, #180]  ; 0xb4
      while ((first != last) && (current_size != CAPACITY))
      52:   f001 80ed   beq.w   1230 <SomeName::SomeName::SomeClass::setup()+0x1230>
      56:   3d01        subs    r5, #1
      58:   4601        mov r1, r0
      5a:   24e6        movs    r4, #230    ; 0xe6
      5c:   e002        b.n 64 <SomeName::SomeName::SomeClass::setup()+0x64>
        p_buffer[current_size++] = *first++;
      5e:   f812 4f01   ldrb.w  r4, [r2, #1]!
      62:   992c        ldr r1, [sp, #176]  ; 0xb0
      64:   1c58        adds    r0, r3, #1
      while ((first != last) && (current_size != CAPACITY))
      66:   4295        cmp r5, r2
        p_buffer[current_size++] = *first++;
      68:   9029        str r0, [sp, #164]  ; 0xa4
      6a:   54cc        strb    r4, [r1, r3]
      while ((first != last) && (current_size != CAPACITY))
      6c:   f000 87a4   beq.w   fb8 <SomeName::SomeName::SomeClass::setup()+0xfb8>
      70:   e9dd 3129   ldrd    r3, r1, [sp, #164]  ; 0xa4
      74:   428b        cmp r3, r1
      76:   d1f2        bne.n   5e <SomeName::SomeName::SomeClass::setup()+0x5e>
      p_buffer[current_size] = 0;
      78:   9a2c        ldr r2, [sp, #176]  ; 0xb0
      7a:   2100        movs    r1, #0
      7c:   54d1        strb    r1, [r2, r3]
      value ? data |= (pattern & MASK) : data &= (~pattern & MASK);
      7e:   f89d 30ac   ldrb.w  r3, [sp, #172]  ; 0xac
      82:   f043 0301   orr.w   r3, r3, #1
      86:   f88d 30ac   strb.w  r3, [sp, #172]  ; 0xac
        this->x = _x;
      8a:   23a0        movs    r3, #160    ; 0xa0
      return &p_buffer[current_size];
      8c:   9e29        ldr r6, [sp, #164]  ; 0xa4
      8e:   f8aa 30a2   strh.w  r3, [sl, #162]  ; 0xa2
      ETL_ASSERT(d >= 0, ETL_ERROR(string_iterator));
      92:   2e00        cmp r6, #0
        this->y = _y;
      94:   f04f 0305   mov.w   r3, #5
      98:   f8aa 30a4   strh.w  r3, [sl, #164]  ; 0xa4
      9c:   f2c1 80ca   blt.w   1234 <SomeName::SomeName::SomeClass::setup()+0x1234>
      if (is_secure())
      a0:   f89a 30bc   ldrb.w  r3, [sl, #188]  ; 0xbc
      current_size = 0U;
      a4:   2100        movs    r1, #0
      return &p_buffer[0];
      a6:   9c2c        ldr r4, [sp, #176]  ; 0xb0
      if (is_secure())
      a8:   0798        lsls    r0, r3, #30
      current_size = 0U;
      aa:   f8ca 10b4   str.w   r1, [sl, #180]  ; 0xb4
      if (is_secure())
      ae:   d509        bpl.n   c4 <SomeName::SomeName::SomeClass::setup()+0xc4>
        etl::memory_clear_range(&p_buffer[current_size], &p_buffer[CAPACITY]);
      b0:   f8da 00b8   ldr.w   r0, [sl, #184]  ; 0xb8
      b4:   f8da 30c0   ldr.w   r3, [sl, #192]  ; 0xc0
      b8:   b130        cbz r0, c8 <SomeName::SomeName::SomeClass::setup()+0xc8>
      ba:   4418        add r0, r3
      *p++ = 0;
      bc:   7019        strb    r1, [r3, #0]
      be:   3301        adds    r3, #1
    while (n--)
      c0:   4298        cmp r0, r3
      c2:   d1fb        bne.n   bc <SomeName::SomeName::SomeClass::setup()+0xbc>
      p_buffer[0] = 0;
      c4:   f8da 30c0   ldr.w   r3, [sl, #192]  ; 0xc0
      c8:   2200        movs    r2, #0
      return &p_buffer[current_size];
      ca:   4426        add r6, r4
      p_buffer[0] = 0;
      cc:   701a        strb    r2, [r3, #0]
      while ((first != last) && (current_size != CAPACITY))
      ce:   42b4        cmp r4, r6
      d0:   f89a 30bc   ldrb.w  r3, [sl, #188]  ; 0xbc
      d4:   f023 0301   bic.w   r3, r3, #1
      d8:   f88a 30bc   strb.w  r3, [sl, #188]  ; 0xbc
      dc:   f001 801b   beq.w   1116 <SomeName::SomeName::SomeClass::setup()+0x1116>
      e0:   4622        mov r2, r4
      e2:   e00a        b.n fa <SomeName::SomeName::SomeClass::setup()+0xfa>
        p_buffer[current_size++] = *first++;
      e4:   f812 0b01   ldrb.w  r0, [r2], #1
      e8:   1c5d        adds    r5, r3, #1
      ea:   f8da 10c0   ldr.w   r1, [sl, #192]  ; 0xc0
      while ((first != last) && (current_size != CAPACITY))
      ee:   4296        cmp r6, r2
        p_buffer[current_size++] = *first++;
      f0:   f8ca 50b4   str.w   r5, [sl, #180]  ; 0xb4
      f4:   54c8        strb    r0, [r1, r3]
      while ((first != last) && (current_size != CAPACITY))
      f6:   f001 800e   beq.w   1116 <SomeName::SomeName::SomeClass::setup()+0x1116>
      fa:   e9da 312d   ldrd    r3, r1, [sl, #180]  ; 0xb4
      fe:   428b        cmp r3, r1
     100:   d1f0        bne.n   e4 <SomeName::SomeName::SomeClass::setup()+0xe4>
      p_buffer[current_size] = 0;
     102:   f8da 20c0   ldr.w   r2, [sl, #192]  ; 0xc0
     106:   2100        movs    r1, #0
     108:   54d1        strb    r1, [r2, r3]
     10a:   f89a 30bc   ldrb.w  r3, [sl, #188]  ; 0xbc
     10e:   f043 0301   orr.w   r3, r3, #1
     112:   f88a 30bc   strb.w  r3, [sl, #188]  ; 0xbc
      return (data & pattern) != value_type(0);
     116:   f89d 50ac   ldrb.w  r5, [sp, #172]  ; 0xac
      if (other.is_truncated())
     11a:   07ea        lsls    r2, r5, #31
     11c:   d503        bpl.n   126 <SomeName::SomeName::SomeClass::setup()+0x126>
      value ? data |= (pattern & MASK) : data &= (~pattern & MASK);
     11e:   f043 0301   orr.w   r3, r3, #1
     122:   f88a 30bc   strb.w  r3, [sl, #188]  ; 0xbc
      if (other.is_secure())
     126:   f015 0502   ands.w  r5, r5, #2
     12a:   f041 8010   bne.w   114e <SomeName::SomeName::SomeClass::setup()+0x114e>
      if (is_secure())
     12e:   079f        lsls    r7, r3, #30
     130:   f101 8011   bmi.w   1156 <SomeName::SomeName::SomeClass::setup()+0x1156>

jwellbelove commented 1 year ago

ETL_EXPLICIT_STRING_FROM_CHAR string(const value_type* text) calls void assign(TIterator first, TIterator last) which ETL_ASSERTS that the iterated distance is >=0 (if ETL_IS_DEBUG_BUILD is true), initialises the string (clearing the buffer first if ETL_HAS_STRING_CLEAR_AFTER_USE is true and the secure flag is set), and then fills the buffer with the characters, whilst ensuring that the CAPACITY is not exceeded.

    template <typename TIterator>
    void assign(TIterator first, TIterator last)
    {
#if ETL_IS_DEBUG_BUILD
      difference_type d = etl::distance(first, last);
      ETL_ASSERT(d >= 0, ETL_ERROR(string_iterator));
#endif

      initialise();

      while ((first != last) && (current_size != CAPACITY))
      {
        p_buffer[current_size++] = *first++;
      }

      p_buffer[current_size] = 0;

#if ETL_HAS_STRING_TRUNCATION_CHECKS
      set_truncated(first != last);

#if ETL_HAS_ERROR_ON_STRING_TRUNCATION
      ETL_ASSERT(flags.test<IS_TRUNCATED>() == false, ETL_ERROR(string_truncation))
#endif
#endif
    }

Your compiler appears to be inlining the initialise() function, specifically etl::memory_clear, which just contains a simple while loop.

    while (n--)
    {
      *p++ = 0;
    }

jwellbelove commented 1 year ago

Or are you saying that the compiler inlines the string(const value_type* text) constructor at every place it's called?

Tagussan commented 1 year ago

> the compiler inlines the string(const value_type* text) constructor at every place it's called?

Yes, comparing against the original code, that appears to be the case. I'm compiling with the -O2 flag.

quickshat commented 1 year ago

Hi, I am also experiencing this issue. The linked ETL library consumes roughly 100 kB (nearly 80% of ROM), and I am only using some vector, map, delegate and variant (with 2 types).

Two years ago I also used ETL, without any explicit linkage, just by including the headers, and I don't remember it consuming that much.

Also, no-inline makes things even worse, by 5 kB. I am also compiling with -Os. With optimisation disabled entirely, the build exceeds ROM by 13%.

jwellbelove commented 1 year ago

What error handling configuration are you using? Exceptions can add a ton of code. Also RTTI.

quickshat commented 1 year ago

I am using ETL_NO_CHECKS. And as far as I know, ETL only bloats due to RTTI when exceptions are enabled, right?

jwellbelove commented 1 year ago

In my experience RTTI and exceptions are independent of each other. I looked at using exceptions in a project once and the code size increased dramatically.

jwellbelove commented 1 year ago

Do you have some sample code that I can try with the embedded compilers I have installed on my machine?

quickshat commented 1 year ago

> Do you have some sample code that I can try with the embedded compilers I have installed on my machine?

I've sent you the current state of my project, which builds successfully on my machine, as a download link via the contact form on the ETL website.

jwellbelove commented 1 year ago

Thanks. I'll take a look at the weekend as I'm away in London until Friday evening.

jwellbelove commented 1 year ago

I've brought the project into STM32CubeIDE, but the build fails because the driver header in #include "stm32f1xx_hal.h" cannot be found.

quickshat commented 1 year ago

You have to download the F1 firmware package first with CubeMX. I guess you've never used an F1 before?

jwellbelove commented 1 year ago

I don't normally use STM32CubeIDE. I just have it installed for ETL cross-platform compatibility and bug testing.

quickshat commented 1 year ago

I also don't use CubeIDE. I am using CMake with standalone CubeMX.

quickshat commented 1 year ago

Do you have any updates, or should I try to bundle the STM32 F1 SDK?

jwellbelove commented 1 year ago

I'm still looking at this when I can. I've been very busy with other work recently, but I'll do what I can.

jwellbelove commented 1 year ago

I've tried making a simple project in Keil, cutting out all of the hardware-related calls, to see what the map file shows. The code sizes for the ivector member functions seem to be reasonably small.

    Exec Addr    Load Addr    Size         Type   Attr      Idx    E Section Name        Object
    0x08000fa4   0x08000fa4   0x0000001c   Code   RO         90    .text._ZN3etl7ivectorItE10initialiseEv  scase.o
    0x08000fc0   0x08000fc0   0x0000002a   Code   RO        142    .text._ZN3etl7ivectorItE11create_backEOt  scase.o
    0x08000fea   0x08000fea   0x00000002   PAD
    0x08000fec   0x08000fec   0x00000010   Code   RO         96    .text._ZN3etl7ivectorItE5clearEv  scase.o
    0x08000ffc   0x08000ffc   0x0000001e   Code   RO         38    .text._ZN3etl7ivectorItE9push_backEOt  scase.o
    0x0800101a   0x0800101a   0x00000002   PAD
    0x0800101c   0x0800101c   0x00000022   Code   RO         88    .text._ZN3etl7ivectorItEC2EPtj  scase.o
    0x0800103e   0x0800103e   0x00000002   PAD
    0x08001040   0x08001040   0x00000014   Code   RO         98    .text._ZN3etl7ivectorItED2Ev  scase.o

Which totals 176 bytes, including padding.
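
The total can be checked by summing the Size column of the map excerpt: the six code sections come to 170 bytes, and the three 2-byte PAD entries bring it to 176. A small compile-time sketch of that arithmetic follows; the array and function names are made up for this example.

```cpp
#include <cstddef>

// Sizes copied from the Size column of the map file excerpt.
constexpr std::size_t code_sizes[] = { 0x1c, 0x2a, 0x10, 0x1e, 0x22, 0x14 };
constexpr std::size_t pad_sizes[]  = { 0x02, 0x02, 0x02 };

// Recursive constexpr sum so the check also compiles as C++11.
template <std::size_t N>
constexpr std::size_t sum(const std::size_t (&values)[N], std::size_t i = 0)
{
  return (i == N) ? 0 : values[i] + sum(values, i + 1);
}

static_assert(sum(code_sizes) == 170, "code bytes");
static_assert(sum(pad_sizes) == 6, "padding bytes");
static_assert(sum(code_sizes) + sum(pad_sizes) == 176, "total bytes");
```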

quickshat commented 1 year ago

OK, that's a fair point. You can also check my latest map file; it should be in the cmake-debug-build folder I sent you. I'll take a closer look at that myself as well. Last time I checked with a visualizer tool, because the map file is so large, and I wasn't able to track down the issue.