esp-rs / esp-flasher-stub

Rust implementation of flasher stub located in esptool
Apache License 2.0

Investigate how we can shrink the final binary size of the flasher stub #12

Closed jessebraham closed 11 months ago

jessebraham commented 1 year ago

We already take care of some low-hanging fruit in the Cargo manifest by setting various options for the release profile; however, there is more we can do. #8 is an example of one additional step we can take.

min-sized-rust is a reasonably complete resource for this endeavour. For instance, panic_immediate_abort should shave off an additional few kB.

DNedic commented 1 year ago

esp-flasher-stub binary size analysis

Test setup

For testing, the ESP32-C3 port was used, built with rustc 1.70 on the stable channel and rustc 1.72 on the nightly channel.

cargo-bloat and manual parsing of the map file were used to obtain the data from the ELF. The final binary size is measured with symbols stripped.
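The category rows in the tables below group many individual symbols together. The exact grouping rules are not stated here, so the following is only a minimal sketch, assuming the per-symbol sizes have already been extracted (for example from cargo-bloat output) as (name, bytes) pairs, and assuming HAL symbols start with esp_hal or esp32c3_hal. The helper names, sample symbols, and sizes are hypothetical.

```rust
use std::collections::BTreeMap;

// Assumed crate-name prefixes for HAL symbols; adjust to what cargo-bloat actually reports.
const HAL_PREFIXES: [&str; 2] = ["esp_hal", "esp32c3_hal"];

/// Map a demangled symbol name to one of the categories used in the tables.
fn category(symbol: &str) -> &'static str {
    if symbol.starts_with("core::fmt") {
        "core::fmt"
    } else if HAL_PREFIXES.iter().any(|p| symbol.starts_with(*p)) {
        "hal"
    } else if symbol.starts_with("md5::compress") {
        "md5::compress"
    } else if symbol == "main" || symbol.ends_with("::main") {
        "main"
    } else {
        "other"
    }
}

/// Sum per-symbol sizes (in bytes) into per-category totals.
fn summarize(symbols: &[(&str, u64)]) -> BTreeMap<&'static str, u64> {
    let mut totals = BTreeMap::new();
    for &(name, size) in symbols {
        *totals.entry(category(name)).or_insert(0) += size;
    }
    totals
}

fn main() {
    // Hypothetical input: (symbol, size in bytes) pairs.
    let symbols: [(&str, u64); 5] = [
        ("core::fmt::write", 1200),
        ("core::fmt::Formatter::pad", 800),
        ("esp_hal::uart::Uart::new", 600),
        ("md5::compress::compress", 2700),
        ("esp_flasher_stub::main", 300),
    ];
    for (cat, bytes) in summarize(&symbols) {
        println!("{cat:<14} {bytes:>6} B");
    }
}
```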

Initial tests

ESP32-C3, Os, no LTO

category        size [KB]
total           47.5
text            28.7
core::fmt       8.4
hal             5.7
main            0.3
md5::compress   2.7

ESP32-C3, Os, LTO

category        size [KB]
total           44.5
text            25.8
core::fmt       7.9
hal             4.2
main            4.9
md5::compress   2.7

Sizes over 40 KB are very large for something that is supposed to be uploaded over UART. Code (text) makes up a big chunk of the size, and a few items clearly dominate it:

  1. core::fmt formatting machinery: entirely unnecessary, as the project should either report errors as responses to the host or be debugged via JTAG.
  2. The esp-hal dependency: the stub only uses serial and clock initialization, so this size is rather puzzling.
  3. main: this is expected; when building with LTO, code gets inlined into main (0.3 KB without LTO vs 4.9 KB with it).
  4. md5::compress: judging by the fact that the C version of the stub achieves final binary sizes smaller than this, md5 shouldn't take up this much space either.
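Item 1 argues that errors should be reported to the host rather than formatted on the device. A minimal sketch of that idea, assuming a response that ends in a status byte followed by an error-code byte; the StubError enum, its variants, and its numeric codes are hypothetical and not taken from esptool or the stub:

```rust
/// Hypothetical error type; variant names and codes are illustrative only.
#[derive(Clone, Copy)]
#[repr(u8)]
enum StubError {
    InvalidCommand = 0x01,
    BadChecksum = 0x02,
    FlashWriteFailed = 0x03,
}

/// Encode a command result as (status, error) bytes for the host to decode.
/// Assumed layout: status 0 = success, 1 = failure; the second byte carries the code.
/// No string is built on the device, so no core::fmt code is needed here.
fn status_bytes(result: Result<(), StubError>) -> [u8; 2] {
    match result {
        Ok(()) => [0, 0],
        Err(e) => [1, e as u8],
    }
}

fn main() {
    let results = [
        Ok(()),
        Err(StubError::InvalidCommand),
        Err(StubError::BadChecksum),
        Err(StubError::FlashWriteFailed),
    ];
    for r in results {
        // Host-side printing only, to show the encoded bytes; the device would send them raw.
        println!("{:?}", status_bytes(r));
    }
}
```

The host side then owns all human-readable messages, which keeps the formatting machinery out of the stub entirely.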

Toolchain and feature based approaches

At the moment, the project is already using pretty optimal build options for reducing the binary size:

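[profile.release]
strip = false         # keep symbols in the ELF
opt-level = "s" # temporary because of https://github.com/llvm/llvm-project/issues/57988
codegen-units = 1     # one codegen unit lets LLVM optimize across the whole crate
lto = true            # link-time optimization across all crates
panic = "abort"       # abort on panic instead of unwinding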

However, there are a few more things that can be attempted on the compiler side.

build-std is an unstable Cargo feature that lets us rebuild core, std, and other standard library crates with our own compiler and linker flags. Combined with LTO, this should enable more inlining and let the linker discard more unused code.

ESP32-C3, build-std, release, LTO

category        size [KB]
total           41.2
text            25.1
core::fmt       6.6
hal             4.2
main            4.5
md5::compress   2.7

Unfortunately, the decrease (from 44.5 KB to 41.2 KB total) is not large enough to make a real dent in the binary size.

Another option is to enable the allow-opt-level-z feature of the hal and build the stub with Oz as the optimization level:

ESP32-C3, Oz, LTO

category        size [KB]
total           46.3
text            26.1
core::fmt       8.1
hal             3.8
main            3.5
md5::compress   2.7

In this case, we get a regression: the total grows from 44.5 KB to 46.3 KB, even though hal and main shrink.

Another thing to try is to use the ufmt feature of esp-hal:

ESP32-C3, release, LTO, esp-hal ufmt feature

category        size [KB]
total           44.5
text            25.8
core::fmt       7.9
hal             4.2
main            4.9
md5::compress   2.7

As the sizes are identical to the plain LTO build, it appears that this feature is not yet implemented.
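ufmt exists to replace core::fmt with something much smaller. To illustrate the kind of code that takes its place, here is a minimal hand-rolled decimal conversion that uses no core::fmt at all. It is not ufmt's API, and u32_to_decimal is a hypothetical helper; on the device the resulting bytes would be written straight to the UART.

```rust
/// Write `n` as ASCII decimal digits into `buf` and return the used part.
/// A u32 has at most 10 decimal digits, so the buffer can never overflow.
fn u32_to_decimal(mut n: u32, buf: &mut [u8; 10]) -> &[u8] {
    let mut i = buf.len();
    // Emit digits from least to most significant, filling the buffer from the end.
    loop {
        i -= 1;
        buf[i] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    &buf[i..]
}

fn main() {
    let mut buf = [0u8; 10];
    let digits = u32_to_decimal(115_200, &mut buf);
    // Host-side demo only: on the stub these raw bytes would go to the serial port.
    println!("{}", std::str::from_utf8(digits).unwrap());
}
```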

Removing core::fmt and .rodata strings

At the moment, we use esp-backtrace for panic handling, which relies on println!. However, there are panic implementations that do no string formatting at all, the most popular being panic_abort.

It is important to note that building with panic_abort requires using the nightly channel as it uses intrinsics which haven't yet been stabilized.

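Cargo.toml:

[dependencies]
panic-abort = "0.3.2"

main.rs:

// Referencing the crate makes sure it is linked, so its panic handler (which aborts without formatting anything) is used
extern crate panic_abort;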

ESP32-C3, Os, LTO, panic-abort

category        size [KB]
total           25.7
text            15.6
core::fmt       1.4
hal             3.6
main            4.3
md5::compress   2.7

This brings a huge reduction in overall size: the total drops from 44.5 KB to 25.7 KB (about 42%), and core::fmt shrinks from 7.9 KB to 1.4 KB.

As mentioned at the top of this issue, another approach is to use the panic_immediate_abort feature via build-std, which results in the following sizes:

ESP32-C3, Os, LTO, build-std panic_immediate_abort

category        size [KB]
total           30.0
text            19.3
core::fmt       4.3
hal             3.6
main            4.3
md5::compress   3.1

This still leaves a lot of core::fmt machinery: 4.3 KB, compared to 1.4 KB with panic-abort.
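To compare the configurations side by side, the totals from the tables above can be expressed as a percentage change relative to the "Os, LTO" build (44.5 KB). This is just arithmetic on the reported numbers; BASELINE_KB and change_percent are illustrative names.

```rust
/// Total of the "ESP32-C3, Os, LTO" build, used as the baseline (KB).
const BASELINE_KB: f64 = 44.5;

/// Percentage change of a configuration's total size relative to the baseline.
/// Negative values mean a smaller binary.
fn change_percent(total_kb: f64) -> f64 {
    (total_kb - BASELINE_KB) / BASELINE_KB * 100.0
}

fn main() {
    // Totals (KB) copied from the tables in this issue.
    let configs = [
        ("build-std", 41.2),
        ("Oz", 46.3),
        ("ufmt feature", 44.5),
        ("panic-abort", 25.7),
        ("panic_immediate_abort", 30.0),
    ];
    for (name, total) in configs {
        println!("{name:<22} {total:>5.1} KB {:>+6.1}%", change_percent(total));
    }
}
```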

RFC on future approaches

Based on this analysis, some other approaches worth trying are:

  1. Removing the esp-hal dependency entirely and using just the PAC, since configuring clocks and serial is all we need at the moment and for the foreseeable future
  2. Dropping the md5 dependency in favor of the same royalty-free code used in the C version of the stub, linked in via the C FFI
  3. Making use of the ROM printing machinery if we really need printing (not sure if this is possible)

jessebraham commented 1 year ago

#31 reduced the size of the stub by ~5 KB.

MabezDev commented 11 months ago

I think the only way to get the stub smaller at this point is to use even less Rust and rely on the ROM code for UART and USB-Serial-JTAG. Whilst this might be something we do in the future, I think using Rust code for now is fine. The stubs are currently ~15 KB in size, which in the grand scheme of flashing a whole program is very small.

Closing