emsesp / EMS-ESP32

ESP32 firmware to read and control EMS and Heatronic compatible equipment such as boilers, thermostats, solar modules, and heat pumps
https://emsesp.github.io/docs
GNU Lesser General Public License v3.0

platformio/framework-arduinoespressif32 3.20011.230801 (2.0.11) #1242

Closed MichaelDvP closed 1 year ago

MichaelDvP commented 1 year ago

The new framework was released a day ago, but it causes problems with ems-esp. I tested a small change and it seemed not to work. It took a while to realize that the new software was never started: it compiles and uploads, but boots from the old slot, so updating is not possible. Tested on esp32 and esp32-S3. For now I suggest using platform = espressif32@6.3.1 in pio_local.ini until this is solved.

proddy commented 1 year ago

ok, let's hardcode it in platformio.ini in [espressif32_base] and also update the ci target to use espressif32_base until we've figured out what the issue is.

MichaelDvP commented 1 year ago

Please add the change to platformio.ini in your current PR to ensure working binaries after the merge.

proddy commented 1 year ago

If I look at the differences between platformio 6.3.1 and 6.3.2 they are just Python changes (https://github.com/platformio/platform-espressif32/compare/v6.3.1...v6.3.2). Both use SDK v4.4.4.

I tested on a 4MB ESP32 and the ESP-S3 and they both work, no crashes. But I'm running standalone with no EMS or sensors attached.

MichaelDvP commented 1 year ago

I think the update uses arduino-esp32 2.0.11 on IDF 4.4.5, whereas platform 6.3.1 is based on Arduino 2.0.9 and IDF 4.4.4, see https://github.com/espressif/arduino-esp32/releases I've tested with pio pkg update, with platform = espressif32 and with the development platform, for esp32, esp32s3 and esp32c3. On all of them I can compile without errors and upload to a running ems-esp, but it always starts the old software, not the uploaded one. Flashing via USB ends in a boot loop with an RTC watchdog reset and no error code displayed. Maybe it is something in the IDF 4.4.5 bootloader, or the serial hardware init from Arduino.

proddy commented 1 year ago

Ok, I'll pio pkg update too and see if there is a change.

There is definitely something strange going on. I made some changes to the login screen (SignIn.tsx) which works when running yarn run standalone and even yarn run preview-standalone but for some reason when updating to an ESP32 it's still using the old pages. And it's not cached. Really baffling me....

MichaelDvP commented 1 year ago

when updating to an ESP32 it's still using the old pages

I think it's using the old software (the other partition). Bump the version number to check which software is running.

proddy commented 1 year ago

that's what I thought. If I change the version.h and rebuild it shows the new version, but still uses the old web code.

proddy commented 1 year ago

I'm using PlatformIO extension version 3.3.0 (not the latest 3.3.1) in VSCode. Maybe that's a reason it works for me.

MichaelDvP commented 1 year ago

I've tried it in my repo as dev2, with a GH action using platform = espressif32. This works, but it installs:

PLATFORM: Espressif 32 (6.3.2) > Espressif ESP32 Dev Module
PACKAGES: 
 - framework-arduinoespressif32 @ 3.20009.0 (2.0.9) 

and uses IDF 4.4.4.

With the development platform it installs 3.20011 (2.0.11) and ems-esp crashes.

PLATFORM: Espressif 32 (6.3.2+sha.f1fdbc5) > Espressif ESP32 Dev Module
PACKAGES: 
 - framework-arduinoespressif32 @ 3.20011.230801 (2.0.11) 

MichaelDvP commented 1 year ago

I'm using PlatformIO extension version 3.3.0 (not the latest 3.3.1) in VSCode. Maybe that's a reason it works for me.

I'm using extension 3.3.1, but that isn't the reason. I've uninstalled 6.3.2, and a newly triggered pkg update installs 6.3.2 (2.0.9) by default. That works. The development platform uses 6.3.2 (2.0.11) and still does not work. But we can change platformio.ini to use the standard platform.

proddy commented 1 year ago

You're correct. There was a problem with my pio installation, which was using the wrong versions of Python. Now that I've fixed that, I'll try out 6.3.2 again.

MichaelDvP commented 1 year ago

If I change the version.h and rebuild it shows the new version, but still uses the old web code.

Seems you have disabled the progmem generation in one of the latest commits. https://github.com/emsesp/EMS-ESP32/blob/45fc13f7a053066e1668a782e07b1b3ecd0b9f1c/interface/vite.config.ts#L23

proddy commented 1 year ago

Oops

proddy commented 1 year ago

by the way @MichaelDvP, do you know why, when we build with platform = espressif32@6.3.2 (https://github.com/platformio/platform-espressif32/releases/tag/v6.3.2), the SDK version still shows as v4.4.4 in the web interface? I thought v6 of the espressif32 library was built on the latest IDF v5.

MichaelDvP commented 1 year ago

Afaik the platform is compatible with IDF 5, but the Arduino framework is currently 2.0.9 with IDF 4.4.4, or (not working) 2.0.11 with IDF 4.4.5, see https://github.com/espressif/arduino-esp32/releases For the work on IDF v5 see https://github.com/espressif/arduino-esp32/issues/7852

I think with framework=espidf platformio loads idf v5.

proddy commented 1 year ago

I tried using Tasmota's fork https://github.com/tasmota/platform-espressif32 with platform = https://github.com/tasmota/platform-espressif32/releases/download/2023.07.00/platform-espressif32.zip but it also gets into a boot loop.

I disabled mqtt, web, uart, rtc to try and find the root cause but it's none of those. I think we just need to wait until it's official

MichaelDvP commented 1 year ago

I have also tried with the ems-esp-loader, which is mainly an esp-react without any EMS functions. It also boot-loops. But there are other issues with this platform too: https://github.com/espressif/arduino-esp32/issues/8482 32k less heap is not good for ems-esp.

proddy commented 1 year ago

Tried the latest platformio espressif32 6.4.0 which uses Arduino v2.0.11 and IDF 5.1.1 and it's still doing the boot-loop. Which is a shame since it uses about 15KB less Flash.

I'm going to start disabling pieces of code to try and find what is causing it. It may be LittleFS.

MichaelDvP commented 1 year ago

With core_debug_level=5 I see only a single message:

[     3][V][WiFiServer.h:42] WiFiServer(): WiFiServer::WiFiServer(port=23, ...)

Then it takes a while before the RTC watchdog resets. Port 23 is strange; telnet is started much later.

proddy commented 1 year ago

I think you're right and the telnet code is failing and causing the restarts. Telnet is initialized in the main class at https://github.com/emsesp/EMS-ESP32/blob/9ebcfe38bc3e655a488aac8c7dca83f2bc53be32/src/emsesp.cpp#L1425

before any of the setup() or loop() is run.

proddy commented 1 year ago

It's not Telnet. I removed that code and it still loops. I think it's deeper; I will need to get the JTAG debugger out.

MichaelDvP commented 1 year ago

Yes, it also loops with the loader, which is v3.5 and has no telnet/mqtt/ems. The eth test is not the cause either, because it is skipped for s3/c3 and these chips also loop. I'll use the loader for the analysis and take this change as the occasion to raise it to 3.6.

proddy commented 1 year ago

Not that it helps, but I tried the 3.0.0 code to see if it would work with the latest espressif 2.0.11 and it also boot-loops. Now I'm definitely going to get the ESP-Prog board out and do some low-level debugging. It may be in the wifi, asynctcp or webserver somewhere.

MichaelDvP commented 1 year ago

I've tried a plain copy of https://github.com/rjwats/esp8266-react with the new platform = espressif32, disabled all features and used an original nodemcu-32. Same boot loop.

rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:1184
load:0x40078000,len:13232
load:0x40080400,len:3028
entry 0x400805e4

Maybe @rjwats can help. I'll make some more tests and then open an issue at esp8266-react.

proddy commented 1 year ago

ha! I did the same last night with the same results (took lots of backwards tweaking to get it to compile!).

I've got my ESP-Prog dusted off so will debug and see where it breaks.

proddy commented 1 year ago

It could be something with the partition tables. With 2.0.9 I see

Configuring flash size...
Flash will be erased from 0x00001000 to 0x00005fff...
Flash will be erased from 0x00008000 to 0x00008fff...
Flash will be erased from 0x0000e000 to 0x0000ffff...
Flash will be erased from 0x00010000 to 0x001e4fff...

and with 2.0.11 I see

Configuring flash size...
Flash will be erased from 0x00001000 to 0x00004fff...
Flash will be erased from 0x00008000 to 0x00008fff...
Flash will be erased from 0x0000e000 to 0x0000ffff...
Flash will be erased from 0x00010000 to 0x00175fff...

notice the last line. Then I read about a bug in pio where the partition CSV files need to have an offset. So I changed our esp32_partition_4M.csv to:

# Name,   Type, SubType, Offset,   Size,    Flags
nvs,      data, nvs,     0x9000,   0x5000,
otadata,  data, ota,     0xE000,   0x2000,
app0,     app,  ota_0,   0x10000,  0x1F0000,
app1,     app,  ota_1,   0x200000, 0x1F0000,
spiffs,   data, spiffs,  0x3F0000, 0x10000,

but it still didn't work!! I'll try again tomorrow
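
One way to rule the partition table in or out is to check the arithmetic of the CSV above: every partition should start where the previous one ends, the app slots should sit on 0x10000 (64 KB) boundaries as ESP-IDF requires for app partitions, and the last partition should end exactly at 4 MB (0x400000). A minimal host-side C++ sketch of that check, not firmware code; the Partition struct and the function names are made up for this illustration:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One row of the partition CSV (hypothetical helper type, not part of any SDK).
struct Partition {
    std::string   name;
    std::string   type;
    std::uint32_t offset;
    std::uint32_t size;
};

// The 4 MB layout from esp32_partition_4M.csv as quoted in this thread.
std::vector<Partition> table4M() {
    return {
        {"nvs",     "data", 0x9000,   0x5000},
        {"otadata", "data", 0xE000,   0x2000},
        {"app0",    "app",  0x10000,  0x1F0000},
        {"app1",    "app",  0x200000, 0x1F0000},
        {"spiffs",  "data", 0x3F0000, 0x10000},
    };
}

// True when each partition starts where the previous one ends
// and the last one ends exactly at flashSize.
bool isContiguous(const std::vector<Partition>& parts, std::uint32_t flashSize) {
    if (parts.empty()) return false;
    std::uint32_t next = parts.front().offset;
    for (const Partition& p : parts) {
        if (p.offset != next) return false;
        next = p.offset + p.size;
    }
    return next == flashSize;
}

// True when every app partition starts on a 0x10000 (64 KB) boundary.
bool appsAligned(const std::vector<Partition>& parts) {
    for (const Partition& p : parts) {
        if (p.type == "app" && (p.offset % 0x10000) != 0) return false;
    }
    return true;
}
```

For this table both checks pass (0x3F0000 + 0x10000 = 0x400000), so the layout itself is consistent for a 4 MB flash.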

MichaelDvP commented 1 year ago

It seems to be a timing issue: the construction of the class has to be moved into the setup() function. Tested with the plain esp8266React. This boot-loops:

main.cpp:
#include <ESP8266React.h>
AsyncWebServer server(80);
// Constructed as a global object, i.e. before setup() runs
ESP8266React esp8266React(&server);
void setup() {
  esp8266React.begin();
  server.begin();
}
void loop() {
  esp8266React.loop();
}

This works:

main.cpp:
#include <ESP8266React.h>
AsyncWebServer server(80);
// Only a pointer at global scope; the object is created in setup()
ESP8266React * espReact = nullptr;
void setup() {
  espReact = new ESP8266React(&server);
  espReact->begin();
  server.begin();
}
void loop() {
  espReact->loop();
}
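
The difference between the two variants is when the constructor runs: a global object is constructed during static initialization, before setup() is called, while the pointer variant defers construction until setup(). A minimal host-side sketch of that ordering, with no ESP32 or ESP8266React code involved; Service, events() and the setup() stand-in are hypothetical names used only for illustration:

```cpp
#include <string>
#include <vector>

// Event log; a function-local static so it is valid even when
// used from the constructor of another global object.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a class like ESP8266React.
struct Service {
    explicit Service(const std::string& name) { events().push_back("construct " + name); }
    void begin() { events().push_back("begin"); }
};

Service  earlyService("global");  // constructed before setup() is ever called
Service* lateService = nullptr;   // only a pointer; nothing is constructed yet

// Stand-in for the Arduino setup() function.
void setup() {
    events().push_back("setup entered");
    lateService = new Service("in setup");  // constructed once the runtime is up
    lateService->begin();
}
```

Anything the constructor of the global object touches has to be usable that early, which is why deferring construction can hide a problem that really sits inside the constructor.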

proddy commented 1 year ago

nice! I'll look into the ESP8266React object to see what is initialized too early that could cause this. I checked again and the partition tables look fine, so that was just the wrong rabbit hole last night ;-(

MichaelDvP commented 1 year ago

I've changed the loader to register esp8266React in setup and it works, so the partition table/LittleFS is not the cause. But flash usage is 30k higher than with espressif32@5.2.0.

proddy commented 1 year ago

that's a shame, I thought the heap issues would have been resolved (https://github.com/espressif/arduino-esp32/issues/8482)

MichaelDvP commented 1 year ago

It's flash, not heap. It is approximately the same as with platform 6.3.x; the loader was on 5.2.0, which uses less flash.

MichaelDvP commented 1 year ago

Solution found: in NetworkSettingsService, move all WiFi.xx calls to the begin() function. This works in esp8266React and in the loader; now compiling ems-esp. I'll make a PR.
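
The pattern behind this fix is two-phase initialization: the constructor only stores settings, and everything that touches the WiFi stack waits for begin(), which is called from setup(). A minimal host-side sketch, assuming a fake radio in place of the real WiFi object; FakeRadio, NetworkSettings and their members are hypothetical and do not mirror the actual NetworkSettingsService code:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the WiFi driver: calls made before the
// runtime has started it are counted as "too early" and ignored.
struct FakeRadio {
    bool        started    = false;
    int         earlyCalls = 0;
    std::string hostname;

    void setHostname(const std::string& name) {
        if (!started) { ++earlyCalls; return; }
        hostname = name;
    }
};

// Two-phase initialization: the constructor stores settings only,
// begin() is the first place that talks to the radio.
class NetworkSettings {
  public:
    NetworkSettings(FakeRadio& radio, std::string hostname)
        : radio_(radio), hostname_(std::move(hostname)) {}  // no radio calls here

    void begin() { radio_.setHostname(hostname_); }

  private:
    FakeRadio&  radio_;
    std::string hostname_;
};
```

With this shape it no longer matters whether the object is created as a global or inside setup(), as long as begin() is only called once the runtime is up.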

proddy commented 1 year ago

amazing!! I spent 2 hours last night debugging this and got so deep in the RTOS, looking at why it crashed in https://www.freertos.org/xSemaphoreCreateRecursiveMutex.html (stateful machine), that I stopped

MichaelDvP commented 1 year ago

But with the new platform the free memory is lower. With the bus disconnected and mqtt off I have on ESP32:

ESP32-v4.4.4: heap: 156, max_alloc: 107, flash_size: 1978
ESP32-v4.4.5: heap: 122, max_alloc: 71, flash_size: 1981

3k more flash, and 34/36k more heap usage. Maybe RAM has a missing 32k segment of SRAM1? Is it not counted or not used? Stay on 6.3.2 for the ems32-4M build?

S3 chip with my test build, bus active, mqtt active:

S3-v4.4.4: heap 186, max_alloc: 167, psram_free: 8144, flash: 2028
S3-v4.4.5: heap 189, max_alloc: 171, psram_free: 8151, flash: 2033

Here free ram is better, in heap and psram.
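
The comparison can be checked with a few subtractions. A small sketch, assuming the figures are in KB (as the "3k" and "34/36k" wording suggests) and that heap and max_alloc are free memory; Stats and delta are names made up here:

```cpp
// Free-memory and flash figures as reported in the thread (assumed to be KB).
struct Stats {
    int heap;      // free heap
    int maxAlloc;  // largest allocatable block
    int flash;     // firmware size
};

// Change from one build to another: positive heap/maxAlloc means more
// free RAM, positive flash means a larger firmware image.
Stats delta(const Stats& before, const Stats& after) {
    return {after.heap - before.heap,
            after.maxAlloc - before.maxAlloc,
            after.flash - before.flash};
}

const Stats esp32_v444{156, 107, 1978};
const Stats esp32_v445{122, 71, 1981};
const Stats s3_v444{186, 167, 2028};
const Stats s3_v445{189, 171, 2033};
```

On the ESP32 this gives -34 free heap, -36 max_alloc and +3 flash; on the S3 it gives +3, +4 and +5.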

proddy commented 1 year ago

Yes, unfortunately stay on 6.3.2 for the 4MB build. It's probably worth asking the https://github.com/espressif/arduino-esp32 community what is causing this and also @Jason2866 - he built a custom lib that excludes the mbedtls stuff, and I think we need that for the Wifi SSL

MichaelDvP commented 1 year ago

I think it is solved, but we have to wait until it is released. https://github.com/espressif/arduino-esp32/issues/8482#issuecomment-1664767990 shows that only esp32 is affected, and https://github.com/espressif/arduino-esp32/issues/8482#issuecomment-1689843880 shows that it is solved.

proddy commented 1 year ago

It's fixed in 2.0.12 I see from the roadmap, and it may be a while until PIO upgrades, so we could just point to this GIT in the platformio.ini's platform?

I haven't tested but hoping it will free up more available Flash mem so we can complete the Turkish translations on the 4MB

or not.

We need to get 6.3.1 out soon because of the mqtt bug and also Kees has a whole box of new gateway boards to flash.

MichaelDvP commented 1 year ago

just point to this GIT

Do you have a link? There is also the fixed Tasmota version to test: platform = https://github.com/tasmota/platform-espressif32/releases/download/2023.08.01/platform-espressif32.zip

We need to get 6.3.1 out soon because of the mqtt bug and also Kees has a whole box of new gateway boards to flash.

Use platform 6.3.2 for the 3.6.1 release; we can switch platform in the next dev. Also, the power entities are now in my dev2; add them later.

proddy commented 1 year ago

Actually I don't know how to use 2.0.12 directly from pio. I hope they upgrade soon, as 2.0.12 was officially released yesterday. There are others waiting as well: https://github.com/platformio/platform-espressif32/issues/1184

MichaelDvP commented 1 year ago

Hu, I don't know what's missing in the Tasmota platform: flash usage is drastically lower and heap is good. On my test build:

platform 6.4.0: flash 2000, heap 122, max_alloc 71
tasmota 6.4.0: flash 1560, heap 182, max_alloc 107

Currently it is working standalone; I will test with bus/mqtt connected.

MichaelDvP commented 1 year ago

Actually I don't know how to use 2.0.12 directly from pio.

Me neither. Maybe something like:

platform = espressif32
framework = https://github.com/espressif/arduino-esp32/releases/download/2.0.12/esp32-2.0.12.zip

Results with the S3, bus and mqtt connected:

espressif 6.4.0: heap: 189, max_alloc: 171, psram_free: 8151, flash: 2033
tasmota 6.4.0: heap: 211, max_alloc: 195, psram_free: 8153, flash: 1586

All seems to work.

proddy commented 1 year ago

Hu, don't know what's missing in the tasmota platform, flash usage is drastic lower, heap is good. On my testbuild: platform 6.4.0: flash 2000, heap 122, max_alloc 71 tasmota: 6.4.0: flash 1560, heap 182, max_alloc 107

actual standalone working,will test with bus/mqtt connected.

does MQTT over SSL work? I think the tasmota lib removed the SSL stuff (mbed)

MichaelDvP commented 1 year ago

does MQTT over SSL work? I think the tasmota lib removed the SSL stuff (mbed)

You're right, SSL is not working. But for a 4MB esp32 without psram this is a good option.

Jason2866 commented 1 year ago

Yes. Since we don't use mbedtls, I removed as many ciphers as possible and left just the ones needed to connect to WiFi. For SSL we use BearSSL (modified and optimized for size and RAM usage). It outperforms mbedtls in every aspect. So we have reduced the RAM and flash footprint of the framework, together with other changed sdkconfig settings that save resources. For Tasmota based on Arduino 3.0 (already working quite nicely on the C6), we removed the deprecated SPIFFS and the Arduino NimBLE. In Tasmota we use h2zero NimBLE (esp-nimble-cpp).

proddy commented 1 year ago

Then, Michael, shall we go with Tasmota's optimized build for everything? Maybe after 6.3.1, which is a patch release to fix the mqtt overflow issue. That will give us some time to discover what's different in the library.

MichaelDvP commented 1 year ago

Trying to switch to 2.0.13 I always get:

C:/Users/Michael/.platformio/packages/framework-arduinoespressif32/libraries/Ethernet/src/ETH.h:28:10: fatal error: WiFi.h: No such file or directory

Editing the line to #include "../../WiFi/src/WiFi.h" works, but this is not a solution for the CI build. Any idea how to solve this?

Testing with Arduino 3.0, I am stuck on errors in the OneWire lib (direct-io).

Jason2866 commented 1 year ago

@MichaelDvP The include file not being found is a strange issue. Maybe try another PlatformIO LDF mode. As for Arduino 3.0, the OneWire lib is not Arduino 3.0 ready. We have modified it to get it going; this modified version works for core 2.0.x and 3.0.0: https://github.com/arendst/Tasmota/tree/development/lib/lib_basic/OneWire-Stickbreaker

Jason2866 commented 1 year ago

I took a look at how platformio.ini is done, and it is not good at all. For example, lib_ldf_mode can't be used several times, and don't use extends and then redefine platform. Don't take my next statement as an offence: the whole platformio setup is done somewhat crudely. I have been there with the Tasmota platformio setup too. The really weird thing is that it works for a long time, and then a completely unrelated change breaks the compile! It took me many hours to get a working, predictable Tasmota platformio setup done.

Some experiences:

A library include file not found is a very typical indicator for something "fishy" in Platformio setup

proddy commented 1 year ago

Thanks @Jason2866 for explaining that

MichaelDvP commented 1 year ago

I tried to set up a platformio.ini with only a single env, no pio_local, no extends, etc. I set lib_ldf_mode=deep+ and added library-properties to all lib folders. No luck, still this single error. All other #include <WiFi.h> lines work, only the one in ETH.h does not. 2023.09.00 compiles without errors; 2023.09.01 and 2023.09.02 give this error. This is really strange. I thought it might be my environment, but GitHub Actions gives the same error. And changing this single line to #include "../../WiFi/src/WiFi.h" works, with no warnings or errors.