LibreSolar / bms-firmware

Firmware for LibreSolar BMS boards based on bq769x0, bq769x2 or ISL94202
https://libre.solar/bms-firmware/
Apache License 2.0
154 stars 70 forks source link

Upgrade to Zephyr v3.7 LTS #78

Closed martinjaeger closed 1 month ago

martinjaeger commented 2 months ago

Tested with BMS C1.

With --sysbuild and MCUboot as the 2nd-stage bootloader, the boot takes several seconds and I don't have an explanation for this, yet. Opening as draft PR until the reason is found and fixed.

pasrom commented 1 month ago

I found two possible solutions, but I'm not sure if either is the best one.

Solution 1: Change Console to UART

The first issue I noticed is that bms-c1 uses zephyr,console = &usb_serial; (source), while Zephyr examples use zephyr,console = &uart0; (source).

When I build the Hello World! example with zephyr,console = &usb_serial;, I see the same behavior you reported. However, changing the overlay in the sysbuild (sysbuild/mcuboot/boards/bms_c1_0_4_0.overlay) to:

/ {
    chosen {
        zephyr,console = &uart0;
    };
};

Results in significantly better boot time.

Solution 2: Set MCUBOOT_LOG_LEVEL

The underlying problem is that in the Espressif HAL, the correct log level (MCUBOOT_LOG_LEVEL) isn't propagated from Zephyr (source):

#ifndef MCUBOOT_LOG_LEVEL
#define MCUBOOT_LOG_LEVEL MCUBOOT_LOG_LEVEL_INFO
#endif

MCUBOOT_LOG_LEVEL is never defined, as mentioned in the open issue (zephyrproject-rtos/zephyr#73679).

As a workaround, you can add the following line in the root CMakeLists.txt:

zephyr_compile_definitions(MCUBOOT_LOG_LEVEL=0)

However, the drawback of this approach is that the following information won't be printed:

[esp32c3] [INF] Image index: 0, Swap type: none
[esp32c3] [INF] Loading image 0 - slot 0 from flash, area id: 1
[esp32c3] [INF] Application start=40387342h
[esp32c3] [INF] DRAM segment: paddr=00021f10h, vaddr=3fc91ee0h, size=027b0h ( 10160) load
[esp32c3] [INF] IRAM segment: paddr=00010040h, vaddr=40380000h, size=11ed0h ( 73424) load
[esp32c3] [INF] DROM segment: paddr=00070000h, vaddr=3c040000h, size=0A5C0h ( 42432) map
[esp32c3] [INF] IROM segment: paddr=00030000h, vaddr=42000000h, size=321BCh (205244) map

Conclusion

Both solutions work, but each has its own drawback. The first solution (using zephyr,console = &uart0) improves boot time, while the second solution (setting MCUBOOT_LOG_LEVEL) suppresses important log output, which might be crucial for debugging.

I believe this behavior, in combination with the usb_serial configuration, is not specific to v3.7 LTS, as the issue seems to originate from a more general problem in the Espressif HAL.

martinjaeger commented 1 month ago

Thanks @pasrom for further investigating the issue.

When I build the Hello World! example with zephyr,console = &usb_serial;, I see the same behavior you reported. However, changing the overlay in the sysbuild (sysbuild/mcuboot/boards/bms_c1_0_4_0.overlay) to:

/ {
  chosen {
      zephyr,console = &uart0;
  };
};

Results in significantly better boot time.

What do you mean by significantly better? For me it really took around 5-10 seconds, whereas with previous Zephyr version the BMS was instantly on. If changing to a "real" UART solves the issue, something in the usb_serial driver may be broken. However, I can't spot any relevant change since v3.6 release in the driver history.

In the linked issue regarding MCUBOOT_LOG_LEVEL they talk about 100 ms of increase in boot time, which would be totally acceptable for me. Just don't want to wait several seconds ;)

pasrom commented 1 month ago

What do you mean by significantly better? For me it really took around 5-10 seconds, whereas with previous Zephyr version the BMS was instantly on.

The boot time increased slightly (around 100 ms) after updating to Zephyr v3.7 due to additional log content being printed, which wasn't present in Zephyr v3.6.:

kernel reboot cold
ESP-ROM:esp32c3-api1-20210207
Build:Feb  7 2021
rst:0xc (RTC_SW_CPU_RST),boot:0xd (SPI_FAST_FLASH_BOOT)
Saved PC:0x4038149e
SPIWP:0xee
mode:DIO, clock div:2
load:0x3fcb6a00,len:0x1ff8
load:0x403c0200,len:0x87fc
load:0x403c9a00,len:0x120c
entry 0x403c2646
I (43) boot: MCUboot 2nd stage bootloader
I (43) boot: compile time Sep 26 2024 07:54:53
I (43) boot: Multicore bootloader
I (44) spi_flash: detected chip: generic
I (46) spi_flash: flash io: dio
W (49) spi_flash: Detected size(4096k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (61) boot: chip revision: v0.4
I (64) boot.esp32c3: SPI Speed      : 40MHz
I (68) boot.esp32c3: SPI Mode       : DIO
I (72) boot.esp32c3: SPI Flash Size : 4MB
I (76) boot: Enabling RNG early entropy source...
I (115) spi_flash: flash io: dio
[esp32c3] [INF] Image index: 0, Swap type: none
[esp32c3] [INF] Loading image 0 - slot 0 from flash, area id: 1
[esp32c3] [INF] Application start=40387342h
[esp32c3] [INF] DRAM segment: paddr=00021f10h, vaddr=3fc91ee0h, size=02774h ( 10100) load
[esp32c3] [INF] IRAM segment: paddr=00010040h, vaddr=40380000h, size=11ed0h ( 73424) load
[esp32c3] [INF] DROM segment: paddr=00070000h, vaddr=3c040000h, size=0A580h ( 42368) map
[esp32c3] [INF] IROM segment: paddr=00030000h, vaddr=42000000h, size=32146h (205126) map

I (23518) spi_flash: detected chip: generic
I (23519) spi_flash: flash io: dio
*** Booting Zephyr OS build v3.7.0-107-g86e62dd48209 ***
I: ThingSet Node ID (EUI-64): CCCB1AFC2B14F394
I: BT controller compile version [963cad4]
I: Identity: 34:85:18:E8:FF:54 (public)
I: HCI: version 5.0 (0x09) revision 0x0016, manufacturer 0x02e5
I: LMP: version 5.0 (0x09) subver 0x0016
I: Waiting for Bluetooth connections...
I: Hardware: Libre Solar BMS C1 (v0.4)
I: Firmware: v23.1-74-g9cb82aa

uart:~$ I: Detected 4 cells
I: Activated BMS IC 0x7695
I: OFF -> DIS (error flags: 0x00000000)
I: DIS -> NORMAL (error flags: 0x00000000)

Zephyr v3.6 logs are more minimal:

kernel reboot cold
ESP-ROM:esp32c3-api1-20210207
Build:Feb  7 2021
rst:0x3 (RTC_SW_SYS_RST),boot:0xd (SPI_FAST_FLASH_BOOT)
Saved PC:0x403803b2
SPIWP:0xee
mode:DIO, clock div:2
load:0x3fcd8000,len:0x1030
load:0x403cada8,len:0x3e40
load:0x403d0000,len:0x1a20
entry 0x403cc734

uart:~$ *** Booting Zephyr OS build v3.6.0-114-g2c2540ee4638 ***
I: ThingSet Node ID (EUI-64): CCCB1AFC2B14F394
I: BT controller compile version [d913766]
I: Identity: E5:E1:2F:46:85:05 (random)
I: HCI: version 5.0 (0x09) revision 0x0016, manufacturer 0x0060
I: LMP: version 5.0 (0x09) subver 0x0016
I: Waiting for Bluetooth connections...
I: Hardware: Libre Solar BMS C1 (v0.4)
I: Firmware: v23.1-71-g0b436bc
I: Detected 4 cells
I: Activated BMS IC 0x7695
I: OFF -> DIS (error flags: 0x00000000)
I: DIS -> NORMAL (error flags: 0x00000000)

To match Zephyr v3.6 output with usb_serial, apply the following:

  1. https://github.com/e-battery-systems/bms-firmware/commit/ab35ea8449b6c2db1977c5710a732c56973a4f7e
  2. Cherry-pick https://github.com/e-battery-systems/zephyr/commit/f86314ecb6a6f00b2120420377e66f37dac374cb to Zephyr (log level adjustment).

Alternatively, you can switch to uart0: https://github.com/e-battery-systems/bms-firmware/commit/e7299fa6fa4d579cf79717ce14b6ea77e669651e

Both actions should result in instant boot times, similar to Zephyr v3.6.

The delay in v3.7 appears to be caused by the updated Espressif HAL implementation, as ets_printf seems slower in v3.7.

sylvioalves commented 1 month ago

You can get around 80ms just adding CONFIG_MCUBOOT_LOG_LEVEL_OFF=y in the mcuboot app. Is that enough to meet the project needs? As mentioned somewhere, changing sdkconfig.h in hal_espressif from #define CONFIG_BOOTLOADER_LOG_LEVEL 3 to #define CONFIG_BOOTLOADER_LOG_LEVEL 0 for both application and mcuboot, will decrease even more. We will bring this to Zephyr environment, however I don't think it will land on v3.7.1.

pasrom commented 1 month ago

Hi @sylvioalves,

thanks for your response. The CONFIG_MCUBOOT_LOG_LEVEL_OFF=y is already configured here and works as expected with Zephyr v3.6. Your suggestion about changing CONFIG_BOOTLOADER_LOG_LEVEL is appreciated, but from what I’ve observed, it doesn’t seem to suppress the boot messages, as they seem to depend on MCUBOOT_LOG_LEVEL and LOG_LOCAL_LEVEL (for mcuboot & espressif hal).

So the difference from v3.6 to v3.7 is the following boot output:

I (43) boot: MCUboot 2nd stage bootloader
I (43) boot: compile time Sep 26 2024 07:54:53
I (43) boot: Multicore bootloader
I (44) spi_flash: detected chip: generic
I (46) spi_flash: flash io: dio
W (49) spi_flash: Detected size(4096k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (61) boot: chip revision: v0.4
I (64) boot.esp32c3: SPI Speed      : 40MHz
I (68) boot.esp32c3: SPI Mode       : DIO
I (72) boot.esp32c3: SPI Flash Size : 4MB
I (76) boot: Enabling RNG early entropy source...
I (115) spi_flash: flash io: dio
[esp32c3] [INF] Image index: 0, Swap type: none
[esp32c3] [INF] Loading image 0 - slot 0 from flash, area id: 1
[esp32c3] [INF] Application start=40387342h
[esp32c3] [INF] DRAM segment: paddr=00021f10h, vaddr=3fc91ee0h, size=02774h ( 10100) load
[esp32c3] [INF] IRAM segment: paddr=00010040h, vaddr=40380000h, size=11ed0h ( 73424) load
[esp32c3] [INF] DROM segment: paddr=00070000h, vaddr=3c040000h, size=0A580h ( 42368) map
[esp32c3] [INF] IROM segment: paddr=00030000h, vaddr=42000000h, size=32146h (205126) map

I (23518) spi_flash: detected chip: generic
I (23519) spi_flash: flash io: dio

The print output above uses either ESP_EARLY_LOGI, BOOT_LOG_INF, or a custom define. From what I observe, CONFIG_BOOTLOADER_LOG_LEVEL does not affect the print behavior, as it depends on MCUBOOT_LOG_LEVEL or LOG_LOCAL_LEVEL.

Do you have any insight into why the same printout shows different timing behavior when using uart0 instead of usb_serial (approximately 100 ms vs several seconds, as mentioned)?

martinjaeger commented 1 month ago

@pasrom I went with your suggestion to select the UART for console output in MCUboot. Thanks again for finding this solution/workaround.

Also needed to fix some MCUboot configuration to allow building the BMS C1 firmware with and without sysbuild. The PR should be ready to go now.