Open wimalopaan opened 5 years ago
I compiled your examples on Linux with g++-8.3.0, here are the results:
faust@hammer /tmp/sml $ g++ -I sml/include/boost/ test.cpp -std=c++17 -O2 -o test
faust@hammer /tmp/sml $ g++ -I sml/include/boost/ test2.cpp -std=c++17 -O2 -o test2
faust@hammer /tmp/sml $ ls -l
загалом 48
drwxr-xr-x 11 faust faust 440 жов 30 13:49 sml
-rwxr-xr-x 1 faust faust 19360 жов 30 13:52 test
-rwxr-xr-x 1 faust faust 19328 жов 30 13:52 test2
-rw-r--r-- 1 faust faust 872 жов 30 13:51 test2.cpp
-rw-r--r-- 1 faust faust 1176 жов 30 13:50 test.cpp
After applying strip -s
both binaries have identical size.
CLang-9.0.0 is slightly different:
faust@hammer /tmp/sml $ clang++ -I sml/include/boost/ test2.cpp -std=c++17 -O2 -o test2
faust@hammer /tmp/sml $ clang++ -I sml/include/boost/ test.cpp -std=c++17 -O2 -o test
faust@hammer /tmp/sml $ ls -l
загалом 52
drwxr-xr-x 11 faust faust 440 жов 30 13:49 sml
-rwxr-xr-x 1 faust faust 23944 жов 30 13:54 test
-rwxr-xr-x 1 faust faust 19128 жов 30 13:53 test2
-rw-r--r-- 1 faust faust 872 жов 30 13:51 test2.cpp
-rw-r--r-- 1 faust faust 1176 жов 30 13:50 test.cpp
But after striping both binaries are around 14k.
I don't see much difference in size.
I found a way to reduce the size some more:
But still not full equivalent.
I compiled all three versions also for the AVR plattform to see the assembler differences:
text data bss dec hex filename 322 0 2 324 144 bm60.elf 254 0 2 256 100 bm61.elf 300 0 6 306 132 bm62.elf
bm60 is the first sml-example, bm61 is the switch-code, and bm62 is the second sml example. The question here, why are there now (bm62) 6 bytes in the bss section (1 byte for the volatile, 1 byte for the state of the fsm, but 4 bytes additional for ?).
Maybe alignment?
I suspect it has todo with thread-safety für the guard-functions.
If I don't need thread-safety: howto disable?
I don't see anything related to thread safety in disassembly.
This is from your modified version:
b0: 80 91 05 01 lds r24, 0x0105 ; 0x800105 <r>
This is from switch version:
90: 80 91 01 01 lds r24, 0x0101 ; 0x800101 <r>
What do you want to say with that?
Okay, I think I know where the 4 bytes come from. Transition table is build by inheriting each individual transition. When they are empty (have no fields) they lay within the same address and the whole table has zero size (though sizeof() will return 1). But when you inherit from non-empty classes the resulting class can't have zero size, because you must be able to address members of your non-empty classes. So they lay with 1 byte shift:
faust@hammer ~ $ cat test.cpp
#include <iostream>
struct Z {};
struct B1 {};
struct B2 {};
struct B3 {};
struct B4 {};
struct D : B1, B2, B3, B4 {};
int main()
{
std::cout << sizeof(D) << "\n";
return 0;
}
faust@hammer ~ $ g++ test.cpp -o test
faust@hammer ~ $ ./test
1
vs
faust@hammer ~ $ cat test.cpp
#include <iostream>
struct Z {};
struct B1 {Z z;};
struct B2 {Z z;};
struct B3 {Z z;};
struct B4 {Z z;};
struct D : B1, B2, B3, B4 {};
int main()
{
std::cout << sizeof(D) << "\n";
return 0;
}
faust@hammer ~ $ g++ test.cpp -o test
faust@hammer ~ $ ./test
4
Back to your example. In the original code, with 4 events and no guards all transitions are empty, they have no fields. In the example with 1 event and guards each transition has a data field - a zero_wrapper
around your guard. And the whole transition table is built by inheriting each of the transition, so it has size 4. Plus 1 byte of the state itself - 5 bytes for the machine. Plus your global r
- makes 6 bytes.
Ok, many thanks for that explanation!
Looks like one can't get the minimum code and data size as it is with the switch. In the following characteristics
bm60.elf : sml without guards, different events bm61.elf : switch bm62.elf : sml with guards, same event
$ avr-nm -CS -t d --size-sort bm60.elf
08388864 00000001 B r
08388865 00000001 b FSM::fsm
00000320 00000006 t _GLOBAL__sub_I_r
00000314 00000006 T main
00000326 00000012 T __tablejump2__
00000194 00000016 T __do_clear_bss
00000210 00000022 T __do_global_ctors
00000244 00000070 W FSM::periodic()
$ avr-nm -CS -t d --size-sort bm61.elf
08388864 00000001 B r
08388865 00000001 b (anonymous namespace)::FSM::mState
00000192 00000016 T __do_clear_bss
00000220 00000052 T main
$ avr-nm -CS -t d --size-sort bm62.elf
08388864 00000001 B r
08388865 00000005 b (anonymous namespace)::FSM::fsm
00000302 00000006 t _GLOBAL__sub_I_r
00000308 00000012 T __tablejump2__
00000194 00000016 T __do_clear_bss
00000210 00000022 T __do_global_ctors
00000244 00000058 T main
That's a pity, because in my experience, properly designed abstractions do not yield increasing code or data size.
Well, I modified the very first example a bit:
#include <cstdint>
#include <sml.h>
namespace sml = boost::sml;
volatile uint8_t r = 0;
namespace {
struct start {};
struct run {};
struct off {};
struct error {};
using namespace sml;
struct machine {
inline auto operator()() const noexcept {
return sml::make_transition_table(
*"off"_s + event<start> = "start"_s,
"start"_s + event<run> = "running"_s,
"running"_s + event<off> = "off"_s,
"running"_s + event<error> = "error"_s
);
}
};
}
struct FSM {
static inline void periodic() {
toEvent(r);
}
static inline void toEvent(uint8_t value) {
if (value == 1) {
fsm.process_event(start{});
}
else if (value == 2) {
fsm.process_event(run{});
}
else if (value == 3) {
fsm.process_event(off{});
}
else if (value > 100) {
fsm.process_event(error{});
}
}
private:
inline static constinit sml::sm<machine> fsm{};
};
using fsm = FSM;
int main() {
while(true) {
fsm::periodic();
}
}
And now:
$ avr-nm -CS -t d --size-sort bm60.elf
08398848 00000001 B r
08398849 00000001 b FSM::fsm
00000280 00000006 t _GLOBAL__sub_I_r
00000286 00000012 T __tablejump2__
00000174 00000016 T __do_clear_bss
00000190 00000022 T __do_global_ctors
00000224 00000056 T main
Now, we have only 1 byte for the state and 56 bytes for the code. Looks ok now.
You can do even better if you remove static
:
faust@hammer /tmp/sml $ avr-nm -CS -t d --size-sort test.elf
08388864 00000001 B r
00000116 00000016 T __do_clear_bss
00000144 00000062 T main
Slightly bigger main
but no additional costs from global initialization. I got 800 bytes overall size versus 836 bytes with static
.
Besides:
faust@hammer /tmp/sml $ avr-size test2.elf test4.elf
text data bss dec hex filename
210 0 2 212 d4 test2.elf
210 0 1 211 d3 test4.elf
test2.elf - your switch version.
And if you get rid of static
in the switch version:
faust@hammer /tmp/sml $ avr-size test2.elf test4.elf
text data bss dec hex filename
194 0 1 195 c3 test2.elf
210 0 1 211 d3 test4.elf
I have this very simple example (see below) of a fsm. I wrote the same fsm with switch-stmts, and that version is much more smaller in code size. Any hints to reduce the size with sml?
The version with switch: