boost-ext / sml

C++14 State Machine library
https://boost-ext.github.io/sml
Boost Software License 1.0
1.16k stars 179 forks source link

Size of generated code #308

Open wimalopaan opened 5 years ago

wimalopaan commented 5 years ago

I have this very simple example (see below) of a fsm. I wrote the same fsm with switch-stmts, and that version is much more smaller in code size. Any hints to reduce the size with sml?


#include <sml.h>

namespace sml = boost::sml;

namespace {
    struct start {};
    struct run {};
    struct off {};
    struct error {};

    using namespace sml;

    struct machine {
        inline auto operator()() const noexcept {
            return sml::make_transition_table(
                        *"off"_s + event<start> = "start"_s,
                        "start"_s + event<run>  = "running"_s,
                        "running"_s + event<off> = "off"_s,
                        "running"_s + event<error> = "error"_s
                                                     );
        }
    };
} 

volatile uint8_t r;

struct FSM {
    static inline void periodic() {
        uint8_t rr = r;
        if (rr == 1) {
            fsm.process_event(start{});
        }
        else if (rr == 2) {
            fsm.process_event(run{});
        }
        else if (rr == 3) {
            fsm.process_event(off{});
        }
        else if (rr > 100) {
            fsm.process_event(error{});
        }
    }
private:
    inline static sml::sm<machine> fsm;
};

using fsm = FSM;

int main() {
    while(true) {
        fsm::periodic();
    }
}

The version with switch:


volatile uint8_t r;

struct FSM {
    enum class State : uint8_t {Off, Start, Running, Error};

    inline static void periodic() {
        uint8_t rr = r;
        switch(mState) {
        case State::Off:
            if (rr == 1) {
                mState = State::Start;
            }
            break;
        case State::Start:
            if (rr == 2) {
                mState = State::Running;
            }
            break;
        case State::Running:
            if (rr == 3) {
                mState = State::Off;
            }
            else if (rr > 100) {
                mState = State::Error;
            }
            break;
        case State::Error:
            break;
        }
    }
private:
    inline static State mState = State::Off;
};

using fsm = FSM;

int main() {
    while(true) {
        fsm::periodic();
    }
}
madf commented 5 years ago

I compiled your examples on Linux with g++-8.3.0, here are the results:

faust@hammer /tmp/sml $ g++ -I sml/include/boost/ test.cpp -std=c++17 -O2 -o test
faust@hammer /tmp/sml $ g++ -I sml/include/boost/ test2.cpp -std=c++17 -O2 -o test2
faust@hammer /tmp/sml $ ls -l
загалом 48
drwxr-xr-x 11 faust faust   440 жов 30 13:49 sml
-rwxr-xr-x  1 faust faust 19360 жов 30 13:52 test
-rwxr-xr-x  1 faust faust 19328 жов 30 13:52 test2
-rw-r--r--  1 faust faust   872 жов 30 13:51 test2.cpp
-rw-r--r--  1 faust faust  1176 жов 30 13:50 test.cpp

After applying strip -s both binaries have identical size.

CLang-9.0.0 is slightly different:

faust@hammer /tmp/sml $ clang++ -I sml/include/boost/ test2.cpp -std=c++17 -O2 -o test2
faust@hammer /tmp/sml $ clang++ -I sml/include/boost/ test.cpp -std=c++17 -O2 -o test
faust@hammer /tmp/sml $ ls -l
загалом 52
drwxr-xr-x 11 faust faust   440 жов 30 13:49 sml
-rwxr-xr-x  1 faust faust 23944 жов 30 13:54 test
-rwxr-xr-x  1 faust faust 19128 жов 30 13:53 test2
-rw-r--r--  1 faust faust   872 жов 30 13:51 test2.cpp
-rw-r--r--  1 faust faust  1176 жов 30 13:50 test.cpp

But after striping both binaries are around 14k.

I don't see much difference in size.

wimalopaan commented 5 years ago

I found a way to reduce the size some more:

https://godbolt.org/z/danKOA

But still not full equivalent.

wimalopaan commented 5 years ago

I compiled all three versions also for the AVR plattform to see the assembler differences:

text data bss dec hex filename 322 0 2 324 144 bm60.elf 254 0 2 256 100 bm61.elf 300 0 6 306 132 bm62.elf

bm60 is the first sml-example, bm61 is the switch-code, and bm62 is the second sml example. The question here, why are there now (bm62) 6 bytes in the bss section (1 byte for the volatile, 1 byte for the state of the fsm, but 4 bytes additional for ?).

madf commented 5 years ago

Maybe alignment?

wimalopaan commented 5 years ago

I suspect it has todo with thread-safety für the guard-functions.

If I don't need thread-safety: howto disable?

madf commented 5 years ago

I don't see anything related to thread safety in disassembly.

madf commented 5 years ago

This is from your modified version:

    b0:   80 91 05 01     lds     r24, 0x0105      ; 0x800105 <r>

This is from switch version:

    90:   80 91 01 01     lds     r24, 0x0101      ; 0x800101 <r>
wimalopaan commented 5 years ago

What do you want to say with that?

madf commented 5 years ago

Okay, I think I know where the 4 bytes come from. Transition table is build by inheriting each individual transition. When they are empty (have no fields) they lay within the same address and the whole table has zero size (though sizeof() will return 1). But when you inherit from non-empty classes the resulting class can't have zero size, because you must be able to address members of your non-empty classes. So they lay with 1 byte shift:

faust@hammer ~ $ cat test.cpp
#include <iostream>

struct Z {};
struct B1 {};
struct B2 {};
struct B3 {};
struct B4 {};
struct D : B1, B2, B3, B4 {};

int main()
{
    std::cout << sizeof(D) << "\n";
    return 0;
}
faust@hammer ~ $ g++ test.cpp -o test
faust@hammer ~ $ ./test
1

vs

faust@hammer ~ $ cat test.cpp
#include <iostream>

struct Z {};
struct B1 {Z z;};
struct B2 {Z z;};
struct B3 {Z z;};
struct B4 {Z z;};
struct D : B1, B2, B3, B4 {};

int main()
{
    std::cout << sizeof(D) << "\n";
    return 0;
}
faust@hammer ~ $ g++ test.cpp -o test
faust@hammer ~ $ ./test
4

Back to your example. In the original code, with 4 events and no guards all transitions are empty, they have no fields. In the example with 1 event and guards each transition has a data field - a zero_wrapper around your guard. And the whole transition table is built by inheriting each of the transition, so it has size 4. Plus 1 byte of the state itself - 5 bytes for the machine. Plus your global r - makes 6 bytes.

wimalopaan commented 5 years ago

Ok, many thanks for that explanation!

Looks like one can't get the minimum code and data size as it is with the switch. In the following characteristics

bm60.elf : sml without guards, different events bm61.elf : switch bm62.elf : sml with guards, same event

$ avr-nm -CS -t d --size-sort bm60.elf
08388864 00000001 B r
08388865 00000001 b FSM::fsm
00000320 00000006 t _GLOBAL__sub_I_r
00000314 00000006 T main
00000326 00000012 T __tablejump2__
00000194 00000016 T __do_clear_bss
00000210 00000022 T __do_global_ctors
00000244 00000070 W FSM::periodic()
$ avr-nm -CS -t d --size-sort bm61.elf
08388864 00000001 B r
08388865 00000001 b (anonymous namespace)::FSM::mState
00000192 00000016 T __do_clear_bss
00000220 00000052 T main
$ avr-nm -CS -t d --size-sort bm62.elf
08388864 00000001 B r
08388865 00000005 b (anonymous namespace)::FSM::fsm
00000302 00000006 t _GLOBAL__sub_I_r
00000308 00000012 T __tablejump2__
00000194 00000016 T __do_clear_bss
00000210 00000022 T __do_global_ctors
00000244 00000058 T main

That's a pity, because in my experience, properly designed abstractions do not yield increasing code or data size.

wimalopaan commented 5 years ago

Well, I modified the very first example a bit:

#include <cstdint>
#include <sml.h>

namespace sml = boost::sml;

volatile uint8_t r = 0;

namespace {
    struct start {};
    struct run {};
    struct off {};
    struct error {};

    using namespace sml;

    struct machine {
        inline auto operator()() const noexcept {
            return sml::make_transition_table(
                        *"off"_s + event<start> = "start"_s,
                        "start"_s + event<run>  = "running"_s,
                        "running"_s + event<off> = "off"_s,
                        "running"_s + event<error> = "error"_s
                                                     );
        }
    };
} 

struct FSM {
    static inline void periodic() {
        toEvent(r);
    }
    static inline void toEvent(uint8_t value) {
        if (value == 1) {
            fsm.process_event(start{});
        }
        else if (value == 2) {
            fsm.process_event(run{});
        }
        else if (value == 3) {
            fsm.process_event(off{});
        }
        else if (value > 100) {
            fsm.process_event(error{});
        }
    }
private:
    inline static constinit sml::sm<machine> fsm{};
};

using fsm = FSM;

int main() {
    while(true) {
        fsm::periodic();
    }
}

And now:

$ avr-nm -CS -t d --size-sort bm60.elf
08398848 00000001 B r
08398849 00000001 b FSM::fsm
00000280 00000006 t _GLOBAL__sub_I_r
00000286 00000012 T __tablejump2__
00000174 00000016 T __do_clear_bss
00000190 00000022 T __do_global_ctors
00000224 00000056 T main

Now, we have only 1 byte for the state and 56 bytes for the code. Looks ok now.

madf commented 5 years ago

You can do even better if you remove static:

faust@hammer /tmp/sml $ avr-nm -CS -t d --size-sort test.elf 
08388864 00000001 B r
00000116 00000016 T __do_clear_bss
00000144 00000062 T main

Slightly bigger main but no additional costs from global initialization. I got 800 bytes overall size versus 836 bytes with static.

madf commented 5 years ago

Besides:

faust@hammer /tmp/sml $ avr-size test2.elf test4.elf 
   text    data     bss     dec     hex filename
    210       0       2     212      d4 test2.elf
    210       0       1     211      d3 test4.elf

test2.elf - your switch version. And if you get rid of static in the switch version:

faust@hammer /tmp/sml $ avr-size test2.elf test4.elf 
   text    data     bss     dec     hex filename
    194       0       1     195      c3 test2.elf
    210       0       1     211      d3 test4.elf