LibreSolar / charge-controller-firmware

Firmware for Libre Solar MPPT/PWM charge controllers
https://libre.solar/charge-controller-firmware/
Apache License 2.0

MPPT 2420 LC Improvements Required #125

Open hogthrob opened 2 years ago

hogthrob commented 2 years ago

TL;DR: It is impossible to enable the OLED on the MPPT 2420 LC without workarounds, due to insufficient RAM. The question is whether and how to integrate these workarounds into the official setup.

When updating my MPPT 2420 LC to the new Zephyr-based v21 line, I noticed that I could no longer compile my (previously working) configuration from the mbed days: with CAN, Serial and OLED selected, the firmware does not fit in RAM. The reason is the meager 16 KB of RAM of the STM32F072 in combination with the additional requirements of multi-threaded operation (mostly stack sizes). Even the configuration I actually need, OLED plus serial ThingSet, doesn't fit. After some adjustments, a bit of code analysis and some trial and error, it turns out that it is safe to reduce the ISR stack allocation to 1 KB and to halve the serial TX buffer to 512 bytes. Since I don't need CAN anymore (I switched to ESP32 communication), I wanted to get rid of CAN as well. Unfortunately, just turning off the ThingSet CAN part does not turn off the Zephyr CAN driver, which eats valuable RAM without any benefit.

It is now easy to solve this by adding the appropriate settings to prj.conf:

# zephyr os settings
CONFIG_CAN=n
CONFIG_ISR_STACK_SIZE=1024
# "application" settings
CONFIG_THINGSET_CAN=n
CONFIG_UEXT_OLED_DISPLAY=y
CONFIG_UEXT_OLED_BRIGHTNESS=1
CONFIG_THINGSET_SERIAL_TX_BUF_SIZE=512

The question now is how to make it easier for others to use this knowledge. We can't just put this in prj.conf (although maybe we could with the lines commented out, as is already done for other settings). Most of this applies only to the MPPT 2420 with its low RAM, but on the other hand, turning off the CAN driver when it is not used by ThingSet makes sense for all boards. Reducing the default ISR stack size should be done only if necessary, as it may cause unexpected problems, e.g. if additional drivers require more ISR stack.

My knowledge of the Zephyr ecosystem is not good enough to see how to accomplish this, but I guess it is possible. This would allow users of the "old" MPPT 2420 like me to upgrade more easily.

martinjaeger commented 2 years ago

Good point. Memory optimization is a long-standing issue on my ToDo list :)

In case you haven't seen it yet, Zephyr provides an easy way to see the memory consumption of the different areas of the firmware: just run west build -t ram_report or west build -t rom_report.

Probably the best way to gain large amounts of RAM is to use only two (mutex-protected) buffers for all ThingSet operations. Currently we have 5 different buffers of between 512 and 1024 bytes each: RX and TX for CAN, RX and TX for serial, and one buffer for data storage.

As a short-term solution, I think reducing the ISR stack size as you suggested is definitely fine. For some of the other stacks I'm a bit more cautious, and we'd probably have to run a careful analysis with different firmware options. Zephyr's Thread Analyzer can help with this.
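For such an analysis, a minimal sketch of a Thread Analyzer configuration could look like the fragment below. The symbols are standard Zephyr Kconfig options; the 10-second interval is an arbitrary choice, and adding these lines to prj.conf for a temporary debug build is an assumption, not something the project does today.

```
# Enable Zephyr's Thread Analyzer to report per-thread stack usage
CONFIG_THREAD_ANALYZER=y
# Print the report with printk instead of going through the logging subsystem
CONFIG_THREAD_ANALYZER_USE_PRINTK=y
# Run the analyzer periodically from its own thread
CONFIG_THREAD_ANALYZER_AUTO=y
# Interval between two reports in seconds (arbitrary value for this sketch)
CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=10
# Show thread names instead of addresses in the report
CONFIG_THREAD_NAME=y
```

Note that the automatic analyzer thread needs a stack of its own, so on a 16 KB part it may be necessary to disable another feature temporarily while measuring.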

We could apply some settings only for the MPPT 2420 LC in its _defconfig file (as done for disabling the CAN already, see boards folder). Maybe also a reduced set of ThingSet data objects could be a solution to reduce footprint.
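As a sketch of what such board-specific settings might look like, the fragment below reuses only the symbols and values from the prj.conf snippet in the first comment. The exact file name and location are assumptions, and whether the application-level ThingSet symbol belongs in a board _defconfig at all would need to be checked.

```
# Hypothetical fragment for the MPPT 2420 LC board configuration
# (file name and location are assumptions)

# Only 16 KB of RAM on the STM32F072: shrink the ISR stack.
# The value was found by trial and error in this thread, not by analysis.
CONFIG_ISR_STACK_SIZE=1024

# Halve the serial ThingSet TX buffer.
# Large responses such as ?conf no longer fit with this setting.
CONFIG_THINGSET_SERIAL_TX_BUF_SIZE=512
```

An alternative with the same effect is a boards/<BOARD>.conf file in the application directory, which the Zephyr build system merges on top of prj.conf for that board only.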

hogthrob commented 2 years ago

Indeed, as I am new to the Zephyr ecosystem, I did not know about the RAM and ROM reports. I am used to reading the GCC linker map files (which I did when there was still PlatformIO support). However, there are relatively few places where static memory is allocated in large chunks at the application level (thread stacks and the TX/RX buffers, plus the OLED display memory). The rest of the static RAM usage is AFAIR not worth spending too much time on. BTW, I tried to reduce stack sizes at the application level, but there are no easy gains here AFAICS, as any significant reduction made the firmware crash. I found this particularly interesting for the OLED thread, but I did not see where the memory usage comes from and stopped looking once I found that the other changes did the job for me.

Reducing the memory footprint by combining buffers that are currently used for different purposes in different threads can help, but it is a challenge of its own, because using mutexes to gain exclusive access to shared resources may lead to serious priority inversion problems. Imagine the serial code is using the TX buffer and the CAN thread wants to send something: it has to wait until the serial code has sent all its bits. If the message is 500 bytes at 115200 bits per second, for example, this takes about 40 ms. Reducing the size of the TX buffers, however, is something to consider and quite easy to do. With the serial TX buffer reduced to 512 bytes, some responses no longer fit: requesting the complete configuration in one go with ?conf does not work, but getting the names of all configuration variables with ?conf/ does. For me this is a fair compromise, as long as I can query each variable individually and the response fits. The "PC" side usually has much more memory and can easily split the communication into smaller chunks and combine the results.
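The 40 ms estimate can be checked with a short calculation. The framing is an assumption here: with the common 8N1 UART format, each byte costs $b = 10$ bits on the wire (1 start bit, 8 data bits, 1 stop bit), so for $N = 500$ bytes at a bit rate of $R = 115200$ bit/s:

$$
t = \frac{N \cdot b}{R} = \frac{500 \cdot 10\ \text{bit}}{115200\ \text{bit/s}} \approx 43.4\ \text{ms}
$$

Counting only the 8 data bits per byte gives about 34.7 ms, so "about 40 ms" holds either way. This is the worst-case time another thread would wait for a shared TX buffer while a full-size serial message is being sent.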

Side topic: I struggle to understand how I can easily debug Zephyr code with a debug probe like the ST-LINK. It was easy using PlatformIO: just start the debug build/flash, and I could set breakpoints and inspect variables WITHOUT having to resort to command-line GDB or similar things. What is the recommended approach for Zephyr/west? The documentation here https://docs.zephyrproject.org/latest/guides/flash_debug/host-tools.html#flash-debug-host-tools was not helpful for me.

martinjaeger commented 2 years ago

Yes, true, it might become tricky to use only one common "ThingSet database" with a single set of buffers.

> Side topic: I struggle to understand how I can easily debug Zephyr code with a debug probe like the ST-LINK. It was easy using PlatformIO: just start the debug build/flash, and I could set breakpoints and inspect variables WITHOUT having to resort to command-line GDB or similar things. What is the recommended approach for Zephyr/west? The documentation here https://docs.zephyrproject.org/latest/guides/flash_debug/host-tools.html#flash-debug-host-tools was not helpful for me.

As far as I understand, west debug should start a GDB server which can afterwards be used with the normal VS Code debugger (similar to what PlatformIO does). However, I've never actually tried it. I'm using a Segger J-Link and Ozone for debugging, which just works great. For testing purposes, you can even convert a Nucleo board into a J-Link probe.