Blackymas / NSPanel_HA_Blueprint

This allows you to configure your complete NSPanel via Blueprint with UI and without changing anything in the code

panel bricked if BT proxy enabled in config #1686

Closed djsomi closed 4 months ago

djsomi commented 5 months ago

TFT Version

4.2.4

ESPHome Version

4.2.4

Blueprint Version

4.2.4

Panel Model

NSPanel EU Model

What is the bug?

Panel bricked if BT proxy enabled in config

Steps to Reproduce

If the device is flashed with BT Proxy enabled, it gets bricked: it does not boot, does not register on Wi-Fi, and none of the buttons work. It has to be re-flashed over a wired (serial) connection without BT Proxy to get it working again.

Your panel's YAML

substitutions:
  ###### CHANGE ME START ######
  device_name: "nspanelworkroom" 
  wifi_ssid: !secret wifi_ssid
  wifi_password: !secret wifi_password

  nextion_update_url: "http://homeassistant.local:8123/local/nspanel_eu.tft"
  nextion_blank_url: "http://homeassistant.local:8123/local/nspanel_blank.tft"

  ##### addon-configuration #####
  ## addon_climate ##
  # addon_climate_heater_relay: "1" # possible values: 1/2

  ##### CHANGE ME END #####

packages:
  remote_package:
    url: https://github.com/Blackymas/NSPanel_HA_Blueprint
    ref: main
    files:
      - nspanel_esphome.yaml # Core package
      # - advanced/esphome/nspanel_esphome_advanced.yaml # activate advanced (legacy) elements - can be useful for troubleshooting
      # - nspanel_esphome_addon_climate_cool.yaml # activate for local climate (cooling) control
      # - nspanel_esphome_addon_climate_heat.yaml # activate for local climate (heater) control
    refresh: 1s

esp32:
  framework:
    type: esp-idf

##### My customization - Start #####

bluetooth_proxy:
   active: true
wifi:
   power_save_mode: LIGHT

##### My customization - End #####

ESPHome logs

no logs available - no ip connection

Home Assistant logs

no logs available - no ip connection

andythomas commented 5 months ago

I saw the same behavior, but could not reproduce it. I tried all kinds of things this morning, but could not pinpoint the cause yet. Maybe someone will find this helpful anyway:

1) The behavior occurred in 4.1.4 as well! It was not introduced recently: my post is 2 weeks old and I was on 4.1.4 at the time.
2) The panel was fixed, i.e. put back in operation without the proxy, by a serial flash. Since I have no development NSPanel and use my panels for heating, I did not experiment further on a panel with -10°C outside...
3) Today, I used a development ESP32 board with a built-in serial USB chip in an attempt to reproduce the behavior, i.e. no response via Wi-Fi at all. It always worked as desired!

I used the following yaml file

substitutions:
  # Settings - Editable values
  device_name: "proxytest" 
  wifi_ssid: !secret wifi_ssid
  wifi_password: !secret wifi_password
  nextion_update_url: "http://homeassistant.local:8123/local/nspanel_eu.tft"  # Optional for `esp-idf` framework
  # Add-on configuration (if needed)
  heater_relay: "1"  # Possible values: "1" or "2"

# Customization area
##### My customization - Start #####
##### My customization - End #####

# Core and optional configurations
packages:
  remote_package:
    url: https://github.com/Blackymas/NSPanel_HA_Blueprint
    ref: main
    files:
      - nspanel_esphome.yaml # Core package
      # Optional advanced and add-on configurations
      # - advanced/esphome/nspanel_esphome_advanced.yaml
      # - nspanel_esphome_addon_climate_cool.yaml
      - nspanel_esphome_addon_climate_heat.yaml
      # - nspanel_esphome_addon_climate_dual.yaml
    refresh: 300s

esp32:
  framework:
    type: esp-idf

# Enable Bluetooth proxy
bluetooth_proxy:
  active: true
# Set Wi-Fi power save mode to "LIGHT" as required for Bluetooth on ESP32
wifi:
  power_save_mode: LIGHT
  fast_connect: true

esp32_ble_tracker:

#bluetooth_proxy:
#  active: true

I tried with and without embedded climate (heat), proxy active: true or not, fast_connect: true or not. The obvious difference is the lack of a Nextion touchscreen. The flash usage is close to 90%.

RAM:   [==        ]  17.6% (used 57676 bytes from 327680 bytes)
Flash: [========= ]  89.6% (used 1643949 bytes from 1835008 bytes)

I am 98% sure, that I was on esp-idf from the beginning, but since I made myself familiar with the project only 2.5 weeks ago, I can not say that for sure. I used to compile in ESPHome in HA which runs on a 4GB VM, but now use the standalone ESPHome on an M1 Mac. Today, I flashed via serial the first time and used OTA in every flash afterwards. However, it is still connected to the desktop for power supply.

edwardtfn commented 5 months ago

I could reproduce this when using BT and the climate add-on simultaneously, and I agree this is most likely related to memory usage. That was already an issue with arduino even without BT, when using too much memory.

| Base version | Framework | Add-ons | Customizations | RAM | Flash | Comments |
|---|---|---|---|---|---|---|
| v4.2.5dev | esp-idf | upload_tft removed | - | 9.5% | 52.9% | Working fine |
| v4.2.5dev | esp-idf | - | - | 10.2% | 61.8% | Working fine |
| v4.2.5dev | esp-idf | - | web_server | 10.2% | 63.6% | Working fine |
| v4.2.5dev | arduino | - | - | 14.1% | 70.0% | Working fine |
| v4.2.5 | arduino | - | web_server | 14.2% | 72.8% | Working fine |
| v4.2.5dev | esp-idf | upload_tft removed | bluetooth_proxy | 16.9% | 79.0% | Working fine |
| v4.2.5dev | esp-idf | climate_dual, upload_tft removed | bluetooth_proxy | 16.9% | 80.7% | Working fine |
| v4.2.5dev | esp-idf | climate_dual, upload_tft removed | bluetooth_proxy, web_server | 16.9% | 83.6% | Working fine |
| v4.2.5dev | esp-idf | - | bluetooth_proxy | 17.5% | 87.6% | Bricked |
| v4.2.2 | esp-idf | - | bluetooth_proxy | 17.6% | 87.1% | Bricked |
| v4.2.5dev | esp-idf | climate_dual | bluetooth_proxy | 17.6% | 89.3% | Bricked |
| v4.2.5dev | esp-idf | climate_dual | bluetooth_proxy, web_server | 17.6% | 91.4% | Bricked |
| v4.2.5 | arduino | - | bluetooth_proxy | 17.9% | 110.0% | Cannot build - Flash memory exceeded |
| v4.2.5 | arduino | - | bluetooth_proxy, web_server | 17.9% | 110.9% | Cannot build - Flash memory exceeded |

I have to flash via serial all the devices that got bricked in the tests above, then I will run more tests, but I believe the option where upload_tft was removed could be a workaround. The downside is that you have to remove bluetooth_proxy and restore upload_tft every time you need to transfer a TFT, then revert afterwards, but as you shouldn't be transferring TFT files every day, that could be a way to go.
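To make the boundary in the table easier to see, here is a small Python sketch (not part of the project) that takes the esp-idf rows and finds the largest build that worked and the smallest that bricked. The 1835008-byte total is the figure ESPHome printed in the compile output quoted earlier in this thread; the byte values are back-calculated from rounded percentages, so treat them as approximate.

```python
# Flash usage (%) and outcome for the esp-idf rows of the table above.
ESP_IDF_ROWS = [
    (52.9, "ok"), (61.8, "ok"), (63.6, "ok"),
    (79.0, "ok"), (80.7, "ok"), (83.6, "ok"),
    (87.6, "bricked"), (87.1, "bricked"),
    (89.3, "bricked"), (91.4, "bricked"),
]

# Size ESPHome uses as 100% in its compile output (taken from the logs above).
FLASH_TOTAL = 1835008


def pct_to_bytes(pct: float, total: int = FLASH_TOTAL) -> int:
    """Approximate firmware size in bytes for a reported percentage."""
    return round(total * pct / 100)


largest_ok = max(p for p, status in ESP_IDF_ROWS if status == "ok")
smallest_bricked = min(p for p, status in ESP_IDF_ROWS if status == "bricked")

print(f"largest working build : {largest_ok}% (~{pct_to_bytes(largest_ok)} bytes)")
print(f"smallest bricked build: {smallest_bricked}% (~{pct_to_bytes(smallest_bricked)} bytes)")
```

With these rows the boundary sits somewhere between the two printed values; the table alone cannot say where exactly.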

andythomas commented 5 months ago

While I also liked the idea of one device less, I am perfectly fine keeping another ESP32 as a dedicated BT proxy. In particular, people (including me ;) will ask for more and more features in this repository, so the 25% memory hog bluetooth_proxy will not fit at some point anyway. I would not mind if the proxy were moved to 'unsupported' and a warning paragraph were put in the docs.

That being said, my curiosity is triggered, why the ESP32 development board would not even fail esp-idf, climate_dual, bluetooth_proxy and web_server at the same time. Is there some Nextion 'overhead'?

edwardtfn commented 5 months ago

That being said, my curiosity is triggered, why the ESP32 development board would not even fail esp-idf, climate_dual, bluetooth_proxy and web_server at the same time. Is there some Nextion 'overhead'?

I have no idea. 😞

djsomi commented 5 months ago

Sure, it's absolutely true that moving the BT proxy to a dedicated device is better, but I had no issues up to 4.2.2.

edwardtfn commented 5 months ago

I've compiled with v4.2.2 and got this:

RAM:   [==        ]  17.6% (used 57604 bytes from 327680 bytes)
Flash: [========= ]  87.5% (used 1604725 bytes from 1835008 bytes)

I haven't flashed it yet, but it is right at the limit between the builds that work and the ones that fail in the table above.

edwardtfn commented 5 months ago

On my test with v4.2.2 it got bricked.

I believe this is at the limit anyway. Maybe you have been using a couple of bytes below the limit, so it worked.

Anyway, we are too close to that limit. I can probably find some way to save a few bytes here and there, but as soon as we add something new, it will break again. I don't want to impose this limit on our development, and instead I would keep BT as a customization that isn't fully supported, as it has been from the beginning. We have a workaround anyway, and we can try to put more of the new features in separate packages, so one can always remove what they don't use.

andythomas commented 5 months ago

I might have found a starting point to explain the behavior. OTA updates allow only half the size of the flash, i.e. 4MB/2 = 2MB in our case (The NSPanel has 4MB flash). This seems to be a safety net to never end up with an unbootable device, even in case of a power outage. Also, the arduino and esp-idf framework have different partition tables, which is why we should use the serial flash after switching between the two. Also, there is some overhead and less than 2MB are available.

Now, the interesting part: It is possible to successfully compile with esphome, but the size does not match the respective partition table and the compile and flash will seem to succeed, but the device will not boot. Although the info is 2y old, it would explain what @edwardtfn found, because 1638400/1835008 from the link equals 89%. If someone would know how to look up the exact partition table for esp-idf and arduino flashes in esphome, maybe it would be a perfect match. I quote from the last link

it will only cause issues in rare cases where you're just between the arduino size limit and the esp idf limit. As described on discord, the fw will work just fine even with a different partition table.

Unfortunately, it is not possible to manually intervene and to give

esp32:
  framework:
    type: esp-idf
  flash_size: "3.6MB"

to indicate the limit. It might also explain slight differences for the dev board and the nspanel and/or the different behavior when using different flashing tools?!
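The arithmetic in this comment can be checked with a few lines of Python. This is only a sketch of the numbers quoted above (4MB flash, the older 1638400-byte limit, and the 1835008 bytes ESPHome currently reports); the helper name `flash_percent` is made up for illustration.

```python
FLASH_CHIP = 4 * 1024 * 1024   # NSPanel flash size: 4MB
OLD_LIMIT = 1638400            # limit mentioned in the linked, older discussion
REPORTED_LIMIT = 1835008       # size ESPHome uses for its percentage today


def flash_percent(used: int, limit: int) -> float:
    """Flash usage as a percentage of a given limit, rounded to one decimal."""
    return round(used / limit * 100, 1)


# Upper bound for one OTA slot before any overhead is subtracted.
ota_upper_bound = FLASH_CHIP // 2

print(ota_upper_bound)                           # bytes in half of the flash
print(flash_percent(OLD_LIMIT, REPORTED_LIMIT))  # the "89%" from the comment
# The v4.2.2 build quoted earlier (1604725 bytes) measured against both limits:
print(flash_percent(1604725, REPORTED_LIMIT), flash_percent(1604725, OLD_LIMIT))
```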

illuzn commented 5 months ago

Just here to drop in my 2 cents. Updating from 4.2.2 to 4.2.4.

I'm running a pretty standard install. My custom options are:

switch:
  - id: !extend relay_1
    restore_mode: ALWAYS_ON

  - id: !extend relay_2
    restore_mode: ALWAYS_OFF

bluetooth_proxy:
  active: true
  cache_services: true

esp32_ble_tracker:

wifi:
  power_save_mode: LIGHT

When trying to update TFT I received the following error:

[01:50:07][D][esp-idf:000]: E (653326) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7F00
[01:50:07][D][esp-idf:000]: E (653329) esp-tls: create_ssl_handle failed
[01:50:07][D][esp-idf:000]: E (653331) esp-tls: Failed to open new connection
[01:50:07][D][esp-idf:000]: E (653333) TRANSPORT_BASE: Failed to open a new connection
[01:50:07][D][esp-idf:000]: E (653339) HTTP_CLIENT: Connection failed, sock < 0
[01:50:07][E][nextion.upload.idf:174]: HTTP request failed: ESP_ERR_HTTP_CONNECT

Presumably ESP-IDF is running out of memory and failing to create the HTTPS connection (-0x7F00 is mbedTLS's MBEDTLS_ERR_SSL_ALLOC_FAILED, i.e. a failed memory allocation).

I then tried flashing locally via HTTP (insecure) from my local instance. It died at 94.8% every single time.

I thought it might be a TFT issue so I flashed NSPanel_Blank which crashed immediately.

During all this ESPHome side was still responsive and functional.

I removed the following config:

bluetooth_proxy:
  active: true
  cache_services: true

esp32_ble_tracker:

Now flashing US TFT via HTTPS from github works fine (started from 0% rather than 80% or whatever).

Have restored those removed settings and everything boots up normally.

So that confirms the most likely hypothesis that this is a memory issue.

djsomi commented 5 months ago

Yes, that was my experience as well: I was NOT able to use TFT upload together with BT proxy, but at least BT proxy worked.

illuzn commented 5 months ago

Now, the interesting part: It is possible to successfully compile with esphome, but the size does not match the respective partition table and the compile and flash will seem to succeed, but the device will not boot. Although the info is 2y old, it would explain what @edwardtfn found, because 1638400/1835008 from the link equals 89%. If someone would know how to look up the exact partition table for esp-idf and arduino flashes in esphome, maybe it would be a perfect match. I quote from the last link

Here's my stats of my build with all the bells and whistles enabled:

Linking .pioenvs/mbr-nspanel/firmware.elf
RAM:   [==        ]  17.6% (used 57588 bytes from 327680 bytes)
Flash: [========= ]  91.6% (used 1681641 bytes from 1835008 bytes)
Building .pioenvs/mbr-nspanel/firmware.bin

Can confirm it boots and functions properly without any issues.

Edit: by all the bells and whistles I mean with my customisations above and using ESP-IDF

edwardtfn commented 5 months ago

So that confirms the most likely hypothesis that this is a memory issue.

I wouldn't be 100% sure of that based on your settings. The ESP32 shares the same radio between BT and Wi-Fi, so that could be an issue when trying to transfer a TFT while BT is enabled. About memory: we fetch a 4kb chunk of data from the HTTP server, disconnect, transfer that to the Nextion, clean the memory, then repeat the process for the following 4kb. It is a bit different with Arduino, which uses bigger chunks and persistent connections, but that was causing some issues and we decided to work with short-lived connections on esp-idf because of that. We might have a memory leak in the TFT transfer. Maybe improving the logs will give us better info... But BT bricking the device while not transferring is something else. I'm not saying it's not related to memory, it probably is, but an issue with the TFT transfer does not necessarily prove that memory is the cause.

In the end, it's the amount of code that makes the big difference. And now we know we are limited to something not far from 80% of the available memory reported by the ESPHome compiler. 😩
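The chunked transfer described in this comment can be pictured with the following Python sketch. It is not the project's code (the real upload engine is C++ inside ESPHome); the function names are hypothetical, the HTTP fetch is replaced by slicing an in-memory buffer, and whether the firmware requests each chunk by byte range is an assumption made for illustration.

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 4096  # the "4kb" chunk mentioned above


def chunk_ranges(total_size: int, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    for start in range(0, total_size, chunk_size):
        yield start, min(start + chunk_size, total_size) - 1


def transfer(tft: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Copy a TFT image chunk by chunk, holding only one chunk at a time."""
    display = bytearray()  # stands in for the Nextion display
    for start, end in chunk_ranges(len(tft), chunk_size):
        chunk = tft[start:end + 1]  # stands in for: connect, fetch range, disconnect
        display += chunk            # stands in for: send the chunk to the display
        del chunk                   # release the buffer before the next round
    return bytes(display)


print(list(chunk_ranges(10000)))
```

The point of the design is that peak memory use is bounded by one chunk plus connection overhead, at the cost of reconnecting for every chunk.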

andythomas commented 5 months ago

@illuzn I cannot follow which is which: is the 91.6% flash figure with or without the proxy?

It seems that there are at least 3 (common) options for ota partitions: 0x1c0000 (which esphome uses to calculate the %), 0x1b0000 and 0x190000, which I suspect to be the limit. Maybe it is only wishful thinking, but changing the partition table to host 0x1c0000 bytes for each ota partition would be great...

esphome changed the assumed size in this commit, which is why old screenshots of esphome compiles show 1638400 (0x190000) bytes to calculate the %. I found (against my first claim) that this can be adjusted in esphome

partitions (Optional, filename): The name of (optionally including the path to) the file containing the partitioning scheme to be used. When not specified, partitions are automatically generated based on flash_size.

So, the correct file corresponding to the NSpanel partition table would show an error after the compile and (hopefully) not allow the flash that would soft brick the device.

andythomas commented 5 months ago

Disclaimer: Do not upload the compiled file unless you are trusting me more than I do!

I was indeed able to edit a partition table, feed it to esphome that subsequently bases its calculation on it. I used the attached yaml and partitions.csv files and got (after compilation, manual download)

Compiling .pioenvs/andydevpanel/src/main.o
Linking .pioenvs/andydevpanel/firmware.elf
RAM:   [==        ]  17.6% (used 57740 bytes from 327680 bytes)
Error: The program size (1712897 bytes) is greater than maximum allowed (1638400 bytes)
Flash: [==========]  104.5% (used 1712897 bytes from 1638400 bytes)
*** [checkprogsize] Explicit exit, status 1
========================= [FAILED] Took 26.75 seconds =========================

A compile error is already much better than an unsuccessful flash and soft-brick. However, maybe getting a hand on the remaining 0x30000 bytes for the project would be even better. I will look into it and try to flash a development esp32 before I dare to flash a panel. I do not know if hard-bricking is possible.
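As a sanity check on the numbers in this comment, the following Python sketch imitates the kind of comparison behind the size error shown in the log. It is not PlatformIO's actual code; `check_program_size` is a hypothetical helper, and the two limits are the 0x190000 and 0x1c0000 values discussed in this thread.

```python
ESPHOME_DEFAULT_LIMIT = 0x1C0000  # 1835008 bytes, used by esphome by default
EDITED_TABLE_LIMIT = 0x190000     # 1638400 bytes, from the edited partitions.csv


def check_program_size(used: int, maximum: int) -> tuple:
    """Return (usage in percent, True if the program fits)."""
    return round(used / maximum * 100, 1), used <= maximum


used = 1712897  # program size from the compile log above

print(check_program_size(used, EDITED_TABLE_LIMIT))     # over the limit: build fails
print(check_program_size(used, ESPHOME_DEFAULT_LIMIT))  # same binary passes by default
print(hex(ESPHOME_DEFAULT_LIMIT - EDITED_TABLE_LIMIT))  # the "remaining" bytes
```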

edit: I do not think what I said is correct. Serial flash also writes the new partition table (my understanding). Messing with it might be harmful, so I deleted the reference to what I did.

edwardtfn commented 5 months ago

Nice!

What about the flash size you've shown earlier?

esp32:
  framework:
    type: esp-idf
  flash_size: "3.6MB"

On the docs it says:

flash_size (Optional, string): The amount of flash memory available on the ESP32 board/module. One of 2MB, 4MB, 8MB, 16MB or 32MB. Defaults to 4MB. Warning: specifying a size larger than that available on your board will cause the ESP32 to fail to boot.

Did it work for you with 3.6MB?

andythomas commented 5 months ago

You have to use the csv file whose contents I linked and I utilized

esp32:
  framework:
    type: esp-idf
  partitions: "partitions.csv"

with an edited partitions.csv file. Please do not flash it, yet, unless you are sure I know what I am doing...

Edit: Directly the file for clarity [partitions.csv]

Do not listen to me! Hard bricking is possible. Now I can still flash, but afterwards the device is dead...

edit: deleted the reference to the csv file.

andythomas commented 4 months ago

Today, I got a replacement panel and read my previous post before adding more info. However, I realized that it sounds alarming. To clarify: I am now 90% sure that any weird partition table is simply removed via a subsequent serial flash. However, I am 100% sure that connecting a 5V power supply at the 8 pin panel connector while forgetting to remove the 3.3V power supply used for flashing 'hard-bricks' the device. My stupidity fried the nspanel and not any uploading I did! The ESP32 even still works...That being said, I marked (strikethrough) some lines in my previous posts that turned out to be incorrect.

I looked at the partition table schemes for the arduino and esp-idf frameworks and attached them. Please note that they both have the same amount of space for the OTA flashing: 1792K, which translates to 1792*1024=1835008=0x1c0000. This is the number esphome uses right now and I would say this is correct in either case.

1) It would be nice to have the compile show an error message instead of ending up with a soft-bricked device that requires a serial flash. I did not succeed in finding a solution.

(a) My solution promoted above with a modified partition table (supposedly) works as long as only OTA flashing is utilized:

Flash: [==========] 104.5% (used 1712897 bytes from 1638400 bytes)

But during a serial flash, the partition table would be transferred onto the nspanel. That seems like a rather inconsistent hack.

(b) I found another way to limit the binary size

esphome:
  name: testcompile
  friendly_name: testcompile
  platformio_options:
    board_upload.maximum_size: 1

but for some reason this is ignored(?!) by esphome: Flash: [==== ] 43.6% (used 799509 bytes from 1835008 bytes)

2) It would be even nicer to find the underlying reason for the soft-bricking. Then, increasing the size from 0x1c0000 to even larger binary sizes should be feasible. The original firmware utilizes 0x1f0000 while still maintaining two OTA partitions, but smaller nvs storage (64K vs 436K). Related: I could never soft-brick my ESP32 dev board even with the largest binaries (<0x1c0000) from the table. Therefore, I tried to write the compiled yaml to the dev board utilizing the original (i.e. Sonoff) partition table, but so far to no avail :(

arduino

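| Name | Type | SubType | Offset | Size | Flags |
|---|---|---|---|---|---|
| nvs | data | nvs | 0x9000 | 20K | |
| otadata | data | ota | 0xe000 | 8K | |
| app0 | app | ota_0 | 0x10000 | 1792K | |
| app1 | app | ota_1 | 0x1d0000 | 1792K | |
| eeprom | data | 153 | 0x390000 | 4K | |
| spiffs | data | spiffs | 0x391000 | 60K | |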

esp-idf

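| Name | Type | SubType | Offset | Size | Flags |
|---|---|---|---|---|---|
| otadata | data | ota | 0x9000 | 8K | |
| phy_init | data | phy | 0xb000 | 4K | |
| app0 | app | ota_0 | 0x10000 | 1792K | |
| app1 | app | ota_1 | 0x1d0000 | 1792K | |
| nvs | data | nvs | 0x390000 | 436K | |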
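
The two partition tables above can be checked for internal consistency with a short Python sketch. It only uses the offsets and sizes listed in the tables, plus the 4MB flash size mentioned earlier in the thread; `check_layout` is a hypothetical helper, not an ESP-IDF or ESPHome tool.

```python
KIB = 1024
FLASH_SIZE = 4 * 1024 * KIB  # 4MB flash of the NSPanel

# (name, offset, size in KiB), copied from the tables above
ARDUINO = [
    ("nvs", 0x9000, 20), ("otadata", 0xE000, 8),
    ("app0", 0x10000, 1792), ("app1", 0x1D0000, 1792),
    ("eeprom", 0x390000, 4), ("spiffs", 0x391000, 60),
]
ESP_IDF = [
    ("otadata", 0x9000, 8), ("phy_init", 0xB000, 4),
    ("app0", 0x10000, 1792), ("app1", 0x1D0000, 1792),
    ("nvs", 0x390000, 436),
]


def check_layout(table, flash_size=FLASH_SIZE):
    """Return a list of problems (overlaps, overflow); empty if consistent."""
    problems = []
    prev_name, prev_end = None, 0
    for name, offset, size_kib in sorted(table, key=lambda row: row[1]):
        end = offset + size_kib * KIB
        if offset < prev_end:
            problems.append(f"{name} overlaps {prev_name}")
        if end > flash_size:
            problems.append(f"{name} runs past the end of flash")
        prev_name, prev_end = name, end
    return problems


print(check_layout(ARDUINO), check_layout(ESP_IDF))
print(hex(1792 * KIB))  # size of each OTA app partition
```

Both tables come out free of overlaps and fit inside 4MB, and each app partition is 1792 * 1024 = 1835008 bytes, the same number esphome uses for its percentage.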
illuzn commented 4 months ago

Okay, this might sound crazy but... this has now happened to me, even though I could previously flash a non-BT ROM fine and then flash a BT ROM afterwards.

So only 2 things have changed in the interim:

  1. Does it matter whether we are using app0 or app1 as the current ROM? According to the partition table above, no, but this is one of the 2 things that have changed since I last tried this.
  2. Does it matter whether we are running esphome via the HA container or via our own docker container? This is the other thing that has changed for me. I'm running esphome on bare metal rather than in a HA container.

andythomas commented 4 months ago

I do not think it is crazy at all. I disregarded some of my hypotheses, because nobody did seem to have your (1), yet. Imagine something writing over their supposed chunk of memory into a partition of something else. It could corrupt the app0 partition, but not the app1 partition, because it always writes at the same, but incorrect position in flash memory. Since the OTA flashes alternate between app0 and app1,

The OTA operation functions write a new app firmware image to whichever OTA app slot that is currently not selected for booting. Once the image is verified, the OTA Data partition is updated to specify that this image should be used for the next boot.

it would be really hard to track.

andythomas commented 4 months ago

@illuzn If you are really motivated, you could try the following:

Test a)

  1. Serial flash (no BT proxy)
  2. OTA flash (with BT)
  3. OTA flash (no BT)
  4. ...

Test b)

  1. Serial flash (no BT proxy)
  2. OTA flash (no BT proxy)
  3. OTA flash (BT proxy)
  4. OTA flash (no BT proxy)
  5. ...

Either (a) or (b) might always work, even when you keep alternating, and the other will always fail at the same step number, i.e. a2 or b3.
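To see why the two test sequences differ, here is a minimal Python simulation of which app partition each flash ends up in. It rests on two assumptions that are not confirmed in this thread: a serial flash places the firmware in app0 and makes it the boot slot, and every OTA flash completes and switches the boot slot. The function name `simulate` is made up for illustration.

```python
def simulate(steps):
    """steps: list of (method, label) with method 'serial' or 'ota'.

    Returns a list of (step number, method, label, slot written).
    """
    booted = None
    result = []
    for number, (method, label) in enumerate(steps, start=1):
        if method == "serial":
            slot = "app0"  # assumption: serial flash writes app0
        else:
            # OTA writes to whichever slot is not currently booted
            slot = "app1" if booted == "app0" else "app0"
        booted = slot      # assumption: the new image boots afterwards
        result.append((number, method, label, slot))
    return result


TEST_A = [("serial", "no BT"), ("ota", "BT"), ("ota", "no BT")]
TEST_B = [("serial", "no BT"), ("ota", "no BT"), ("ota", "BT"), ("ota", "no BT")]

for name, steps in (("a", TEST_A), ("b", TEST_B)):
    for number, method, label, slot in simulate(steps):
        print(f"{name}{number}: {method:6} {label:5} -> {slot}")
```

Under these assumptions the BT build lands in app1 in step a2 and in app0 in step b3, which is the difference the two tests are meant to expose.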

illuzn commented 4 months ago

Sorry if the following is a random stream of my thoughts... I kind of wrote it as I was experimenting.

Using your procedure. Both fail.

Test A: Step 2. Completes successfully but no boot. Aborted at this step.

Test B: Step 2. Completes successfully but no boot. Handily, I still had it plugged into my FTDI flasher and picked up the boot logs. The specific error that is thrown is:

ets Jul 29 2019 12:21:46

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:6608
load:0x40078000,len:15060
ho 0 tail 12 room 4
load:0x40080400,len:3816
entry 0x40080698
I (29) boot: ESP-IDF 4.4.5 2nd stage bootloader
I (29) boot: compile time 01:22:54
I (29) boot: chip revision: v3.0
I (32) boot.esp32: SPI Speed      : 40MHz
I (37) boot.esp32: SPI Mode       : DIO
I (41) boot.esp32: SPI Flash Size : 4MB
I (46) boot: Enabling RNG early entropy source...
I (51) boot: Partition Table:
I (55) boot: ## Label            Usage          Type ST Offset   Length
I (62) boot:  0 otadata          OTA data         01 00 00009000 00002000
I (69) boot:  1 phy_init         RF data          01 01 0000b000 00001000
I (77) boot:  2 app0             OTA app          00 10 00010000 001c0000
I (84) boot:  3 app1             OTA app          00 11 001d0000 001c0000
I (92) boot:  4 nvs              WiFi data        01 02 00390000 0006d000
I (99) boot: End of partition table
I (104) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=4489ch (280732) map
I (214) esp_image: segment 1: paddr=000548c4 vaddr=3ffb0000 size=04c64h ( 19556) load
I (222) esp_image: segment 2: paddr=00059530 vaddr=40080000 size=06ae8h ( 27368) load
I (233) esp_image: segment 3: paddr=00060020 vaddr=400d0020 size=13547ch (1266812) map
I (692) esp_image: segment 4: paddr=001954a4 vaddr=40086ae8 size=16a94h ( 92820) load
I (744) boot: Loaded app from partition at offset 0x10000
I (744) boot: Disabling RNG early entropy source...
I (756) cpu_start: Pro cpu up.
I (756) cpu_start: Starting app cpu, entry point is 0x4008249c
I (742) cpu_start: App cpu up.
I (773) cpu_start: Pro cpu start user code
I (773) cpu_start: cpu freq: 160000000
I (773) cpu_start: Application information:
I (777) cpu_start: Project name:     mbr-nspanel
I (782) cpu_start: App version:      2023.12.7
I (788) cpu_start: Compile time:     Jan 22 2024 10:12:24
I (794) cpu_start: ELF file SHA256:  1e4c0978122d46d1...
I (800) cpu_start: ESP-IDF:          4.4.5
I (804) cpu_start: Min chip rev:     v0.0
I (809) cpu_start: Max chip rev:     v3.99 
I (814) cpu_start: Chip rev:         v3.0

assert failed: s_prepare_reserved_regions memory_layout_utils.c:100 (reserved[i + 1].start > reserved[i].start)

Backtrace: 0x40082d26:0x3ffe3390 0x40091815:0x3ffe33b0 0x400979e5:0x3ffe33d0 0x4016d4f2:0x3ffe34f0 0x4016d112:0x3ffe3850 0x4016afcb:0x3ffe3c00 0x40082759:0x3ffe3c40 0x4007959c:0x3ffe3c80 |<-CORRUPTED

I assume that trace is not useful without my firmware file because the offsets will be different for everyone.

When it is successful (i.e. without BT) the RAM is allocated in this way:

I (645) heap_init: Initializing. RAM available for dynamic allocation:
I (652) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (658) heap_init: At 3FFB8270 len 00027D90 (159 KiB): DRAM
I (664) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (671) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (677) heap_init: At 40094458 len 0000BBA8 (46 KiB): IRAM

Now this is where it gets wild. I reinstalled esphome addon in HA just for kicks because that's the only other thing I've changed. Flashing OTA with BT boots!

Here is the startup log - notice the huge difference with the RAM allocation (this was the same every single time I was doing it from my docker container install of esphome).

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:6608
load:0x40078000,len:15060
ho 0 tail 12 room 4
load:0x40080400,len:3816
entry 0x40080698
I (29) boot: ESP-IDF 4.4.5 2nd stage bootloader
I (29) boot: compile time 01:22:54
I (29) boot: chip revision: v3.0
I (32) boot.esp32: SPI Speed      : 40MHz
I (37) boot.esp32: SPI Mode       : DIO
I (41) boot.esp32: SPI Flash Size : 4MB
I (46) boot: Enabling RNG early entropy source...
I (51) boot: Partition Table:
I (55) boot: ## Label            Usage          Type ST Offset   Length
I (62) boot:  0 otadata          OTA data         01 00 00009000 00002000
I (70) boot:  1 phy_init         RF data          01 01 0000b000 00001000
I (77) boot:  2 app0             OTA app          00 10 00010000 001c0000
I (85) boot:  3 app1             OTA app          00 11 001d0000 001c0000
I (92) boot:  4 nvs              WiFi data        01 02 00390000 0006d000
I (100) boot: End of partition table
I (104) esp_image: segment 0: paddr=001d0020 vaddr=3f400020 size=4489ch (280732) map
I (214) esp_image: segment 1: paddr=002148c4 vaddr=3ffbdb60 size=04c64h ( 19556) load
I (222) esp_image: segment 2: paddr=00219530 vaddr=40080000 size=06ae8h ( 27368) load
I (233) esp_image: segment 3: paddr=00220020 vaddr=400d0020 size=13547ch (1266812) map
I (692) esp_image: segment 4: paddr=003554a4 vaddr=40086ae8 size=16a94h ( 92820) load
I (745) boot: Loaded app from partition at offset 0x1d0000
I (745) boot: Disabling RNG early entropy source...
I (756) cpu_start: Pro cpu up.
I (757) cpu_start: Starting app cpu, entry point is 0x4008249c
I (0) cpu_start: App cpu up.
I (773) cpu_start: Pro cpu start user code
I (773) cpu_start: cpu freq: 160000000
I (773) cpu_start: Application information:
I (777) cpu_start: Project name:     mbr-nspanel
I (783) cpu_start: App version:      2023.12.8
I (788) cpu_start: Compile time:     Jan 22 2024 10:49:37
I (794) cpu_start: ELF file SHA256:  5d415d0236b85c1c...
I (800) cpu_start: ESP-IDF:          4.4.5
I (805) cpu_start: Min chip rev:     v0.0
I (809) cpu_start: Max chip rev:     v3.99 
I (814) cpu_start: Chip rev:         v3.0
I (819) heap_init: Initializing. RAM available for dynamic allocation:
I (826) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (832) heap_init: At 3FFB6388 len 00001C78 (7 KiB): DRAM
I (838) heap_init: At 3FFB9A20 len 00004108 (16 KiB): DRAM
I (844) heap_init: At 3FFCBC58 len 000143A8 (80 KiB): DRAM
I (850) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (857) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (863) heap_init: At 4009D57C len 00002A84 (10 KiB): IRAM
I (871) spi_flash: detected chip: generic
I (874) spi_flash: flash io: dio
I (880) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.

I know nothing about c so I can't even begin to diagnose this issue or try to figure out what is going on.

Edit 1: This has been eating away at me and I think I've cracked part of the puzzle. With no BT enabled, there is 338KiB total RAM (added using the hex lens to be precise). With BT enabled, there is 241KiB total RAM. So this explains why I was having issues updating TFT over HTTPS from github. The device has ~100KiB less total RAM to work with.
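
The totals in Edit 1 can be reproduced from the two heap_init logs above with a short Python sketch. The log lines are copied from this comment; the regular expression and the helper name `total_heap` are illustrative only.

```python
import re

HEAP_LINE = re.compile(r"heap_init: At ([0-9A-Fa-f]{8}) len ([0-9A-Fa-f]{8})")

WITHOUT_BT = """
I (652) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (658) heap_init: At 3FFB8270 len 00027D90 (159 KiB): DRAM
I (664) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (671) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (677) heap_init: At 40094458 len 0000BBA8 (46 KiB): IRAM
"""

WITH_BT = """
I (826) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (832) heap_init: At 3FFB6388 len 00001C78 (7 KiB): DRAM
I (838) heap_init: At 3FFB9A20 len 00004108 (16 KiB): DRAM
I (844) heap_init: At 3FFCBC58 len 000143A8 (80 KiB): DRAM
I (850) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (857) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (863) heap_init: At 4009D57C len 00002A84 (10 KiB): IRAM
"""


def total_heap(log: str) -> int:
    """Sum the hexadecimal 'len' fields of all heap_init region lines."""
    return sum(int(m.group(2), 16) for m in HEAP_LINE.finditer(log))


no_bt, bt = total_heap(WITHOUT_BT), total_heap(WITH_BT)
print(no_bt // 1024, bt // 1024, round((no_bt - bt) / 1024, 1))
```

Summing the hex lengths rather than the rounded KiB values avoids accumulating rounding errors across regions.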

Edit 2: New discovery: I was running ESPHome Beta 2023.12.8 in my HA instance (I switched to the beta 2 years ago when ESP-IDF was barely supported) and ESPHome 2023.12.8 in my docker container. Something in the beta's changes fixes the bricking problem. I noticed this because my docker container kept trying to update my NSPanel.

Edit 3: Not sure if I'm reading this correctly or not but:

I (857) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM

2023.12.8 Beta has RAM starting at 0x3FFE4350 and ending at 0x40000000

I (863) heap_init: At 4009D57C

2023.12.8 Beta then jumps to 0x4009D57C i.e. it skips the range 0x40000001 through 0x4009D57B.

0x40082d26:0x3ffe3390 0x40091815:0x3ffe33b0 0x400979e5:0x3ffe33d0 0x4016d4f2:0x3ffe34f0 0x4016d112:0x3ffe3850 0x4016afcb:0x3ffe3c00 0x40082759:0x3ffe3c40 0x4007959c:0x3ffe3c80 |<-CORRUPTED

These addresses all fall within that skipped range (presumably in use by something else). Hence the corrupted RAM and failure to boot. Looks like an upstream ESPHome issue and nothing to do with this device.

I've had enough of flashing my NSPanel for this month and I don't know c code. But if I had to guess there's an issue with the way its allocating RAM.

edwardtfn commented 4 months ago

As a temporary workaround, you can flash this (OTA is fine) to run without TFT transfer but with BT. When you need to transfer a TFT, revert to your previous config, transfer, then apply this again:

substitutions:
  ###### CHANGE ME START ######
  device_name: "nspanelworkroom" 
  wifi_ssid: !secret wifi_ssid
  wifi_password: !secret wifi_password

  nextion_update_url: "http://homeassistant.local:8123/local/nspanel_eu.tft"
  nextion_blank_url: "http://homeassistant.local:8123/local/nspanel_blank.tft"

  ##### addon-configuration #####
  ## addon_climate ##
  # addon_climate_heater_relay: "1" # possible values: 1/2

  ##### CHANGE ME END #####

packages:
  remote_package:
    url: https://github.com/Blackymas/NSPanel_HA_Blueprint
    ref: main
    files:
      # - nspanel_esphome.yaml  # Base package
      - advanced/esphome/nspanel_esphome_core.yaml  # Core without TFT upload engine
      # - advanced/esphome/nspanel_esphome_advanced.yaml # activate advanced (legacy) elements - can be useful for troubleshooting
      # - nspanel_esphome_addon_climate_cool.yaml # activate for local climate (cooling) control
      # - nspanel_esphome_addon_climate_heat.yaml # activate for local climate (heater) control
    refresh: 1s

esp32:
  framework:
    type: esp-idf

##### My customization - Start #####

bluetooth_proxy:
   active: true
wifi:
   power_save_mode: LIGHT

##### My customization - End #####

illuzn commented 4 months ago

Yes, that's basically what I've been doing.

The rabbit hole in this thread is trying to figure out why devices are getting soft-bricked (needing serial flash) - answer something upstream in ESPHome between 2023.12.8 (not working) and 2023.12.8 beta (working).

Out of curiosity, does your config work for bluetooth proxy? According to the docs:

The Bluetooth proxy depends on ESP32 Bluetooth Low Energy Tracker Hub so make sure to add that to your configuration.

So it shouldn't work?

Edit 1: Whoa... just saw this on ESPHome:

The first time this component is enabled for an ESP32, the code partition needs to be resized. Please flash the ESP32 via USB when adding this to your configuration. After that, you can use OTA updates again.

Is this the issue we've been running into all along (and nothing to do with RAM allocation, as I said above)?

Edit 2: Nope... flashing via serial or ESPTool doesn't make a single difference. The docs are just wrong. The only thing that fixes this issue is using 2023.12.8 Beta, which for some reason correctly allocates the RAM (albeit much less of it, so HTTPS will not work).

I'm done! My NSPanel is back on the wall and I'm not touching it again.

andythomas commented 4 months ago

@illuzn figured it out! I can confirm his last post. So many red herrings on the way...

TL;DR: esphome-2024.1.0.dev0 fixes the issue. Wait for next esphome release...

I installed (in sequence)

  1. esphome 2023.12.5 and it worked
  2. esphome 2023.12.8 does not
  3. esphome 2023.12.5 worked
  4. esphome 2023.12.6 worked
  5. esphome 2023.12.7 does not
  6. esphome 2024.1.0.dev0 worked

Now comes the kicker. Afterwards

  1. esphome 2023.12.7 worked

and not only that, everything I tried worked as well.


A possible explanation: I could install the 'normal' esphome releases via my package manager (brew), but had to install dev via a local GitHub esphome clone on my computer. That install also updated packages:

Successfully installed chardet-5.2.0 esphome-2024.1.0.dev0 esptool-4.7.0 icmplib-3.0.4 platformio-6.1.13 pyelftools-0.30 zeroconf-0.131.0

So, one of the other packages (supposedly) had a fix and subsequent esphome downgrades used these packages. It explains everything I observed, including the weird observation of my development board always working, since it was attached to a second computer with a slightly different esphome version. Here, the rabbit hole stops for me as well, I will suppress the urge to look at the last commit to the other packages.

illuzn commented 4 months ago

@edwardtfn Suggest closed. Issue resolved - upstream problem not related to us.

X-Ryl669 commented 4 months ago

I could never soft-brick my ESP32 dev board even with the largest binaries (<0x1c0000) from the table. Therefore, I tried to write the compiled yaml to the dev board utilizing the original (i.e. Sonoff) partition table, but so far to no avail :(

The OTA mechanisms of the Arduino framework and esp-idf are not the same. Neither of them updates the bootloader (technically they could, but neither does). Yet the partition table is stored in and required by the bootloader (and it's also required by your application's OTA code). As I understand it, when you run Arduino's OTA, it updates some partition data to tell the bootloader where to start on the next reboot. Yet this data isn't at the same place once the application starts and uses another partition table. The OTA ends up writing the otadata somewhere in the middle of the phy partition, and that prevents the app from working: when it tries to start BT or Wi-Fi using the corrupted phy partition data, your panel is bricked.

So when you flash something via the serial link, you're using esptool, which erases the flash and updates the bootloader, so it more or less works.
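To make the overlap argument above concrete, here is a minimal Python sketch. The two layouts are hypothetical values picked for illustration only (they are not the real Sonoff or ESPHome partition tables), and `overlapping` is a made-up helper name. It only shows how an otadata region placed according to one table can land inside the phy partition of another.

```python
# Hypothetical partition layouts, for illustration only.
# Each entry is (name, offset, size), all in bytes.
OLD_TABLE = [
    ("nvs", 0x9000, 0x5000),
    ("otadata", 0xE000, 0x2000),
]
NEW_TABLE = [
    ("nvs", 0x9000, 0x4000),
    ("phy_init", 0xD000, 0x2000),
    ("otadata", 0xF000, 0x2000),
]


def overlapping(region, table):
    """Return the names of partitions in `table` that intersect `region`."""
    _, start, size = region
    end = start + size
    return [
        name
        for name, offset, length in table
        if offset < end and start < offset + length
    ]


# Where would a write to the OLD otadata location land in the NEW layout?
old_otadata = next(entry for entry in OLD_TABLE if entry[0] == "otadata")
hit = overlapping(old_otadata, NEW_TABLE)
print(hit)
```

With these made-up numbers, the old otadata range touches both `phy_init` and the new `otadata`, which is the kind of collision described above. Whether the real tables collide this way would have to be checked against the actual partition CSV files.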

Also, I don't know if it's still the case, but there is/was an issue with Espressif's gcc linker that rounded some sections up by 1 byte, leading to unaligned bss/data sections. It was completely random; usually, cleaning and rebuilding worked correctly. It gave the same issue you're observing, that is, the heap at a different address and an unexplained panic before even entering the main function.

quenthal commented 4 months ago

Temporarily you can use this to flash (OTA is fine) without TFT transfer but with BT. When you have to transfer TFT, you revert it, transfer then use this again:

substitutions:
  ###### CHANGE ME START ######
  device_name: "nspanelworkroom" 
  wifi_ssid: !secret wifi_ssid
  wifi_password: !secret wifi_password

  nextion_update_url: "http://homeassistant.local:8123/local/nspanel_eu.tft"
  nextion_blank_url: "http://homeassistant.local:8123/local/nspanel_blank.tft"

  ##### addon-configuration #####
  ## addon_climate ##
  # addon_climate_heater_relay: "1" # possible values: 1/2

  ##### CHANGE ME END #####

packages:
  remote_package:
    url: https://github.com/Blackymas/NSPanel_HA_Blueprint
    ref: main
    files:
      # - nspanel_esphome.yaml  # Base package
      - advanced/esphome/nspanel_esphome_core.yaml  # Core without TFT upload engine
      # - advanced/esphome/nspanel_esphome_advanced.yaml # activate advanced (legacy) elements - can be useful for troubleshooting
      # - nspanel_esphome_addon_climate_cool.yaml # activate for local climate (cooling) control
      # - nspanel_esphome_addon_climate_heat.yaml # activate for local climate (heater) control
    refresh: 1s

esp32:
  framework:
    type: esp-idf

##### My customization - Start #####

bluetooth_proxy:
   active: true
wifi:
   power_save_mode: LIGHT

##### My customization - End #####

Replacing package nspanel_esphome.yaml with advanced/esphome/nspanel_esphome_core.yaml gives a few errors for me, mainly because of the character '_':

Failed config

sensor.nextion: [source <unicode string>:1450]
  id: display_mode
  name: Display mode
  platform: nextion

  Must only consist of upper/lowercase characters, numbers and the period '.'. The character '_' cannot be used.
  variable_name: display_mode
  precision: 0
  accuracy_decimals: 0
  internal: False
  icon: mdi:phone-rotate-portrait
  entity_category: diagnostic
text_sensor.nextion: [source <unicode string>:1782]
  id: version_tft
  name: Version TFT
  platform: nextion

  Must only consist of upper/lowercase characters, numbers and the period '.'. The character '_' cannot be used.
  component_name: tft_version
  entity_category: diagnostic
  icon: mdi:tag-text-outline
  internal: False
  update_interval: never
  on_value: 
    - lambda: |-
        static const char *const TAG = "text_sensor.version_tft";
        ESP_LOGD(TAG, "TFT version: %s", x.c_str());
        if (current_page->state == "boot") {
          disp1->send_command_printf("tm_esphome.en=0");
          page_boot->execute();
          timer_reset_all->execute("boot");
        }
        check_versions->execute();

edwardtfn commented 4 months ago

So, this issue persists with ESPHome 2024.2.0b1, and as this could be an issue in the future anyway when using customizations, I've improved the documentation. I have no plans to reduce functionality to accommodate customizations, but I believe giving more details in the docs will make it possible for those using bluetooth_proxy.

edwardtfn commented 4 months ago

Replacing package nspanel_esphome.yaml with advanced/esphome/nspanel_esphome_core.yaml gives a few errors for me, mainly because of the character '_':

Could you please report this as another bug?

quenthal commented 3 months ago

I could duplicate this when using BT and add-on climate simultaneously, and I agree this is most likely related to the memory usage, as that was an issue already with arduino even without BT, but when using too much memory.

| Base version | Framework | Add-ons | Customizations | RAM | Flash | Comments |
|---|---|---|---|---|---|---|
| v4.2.5dev | esp-idf | _upload_tft removed_ | - | 9.5% | 52.9% | Working fine |
| v4.2.5dev | esp-idf | - | - | 10.2% | 61.8% | Working fine |
| v4.2.5dev | esp-idf | - | web_server | 10.2% | 63.6% | Working fine |
| v4.2.5dev | arduino | - | - | 14.1% | 70.0% | Working fine |
| v4.2.5 | arduino | - | web_server | 14.2% | 72.8% | Working fine |
| v4.2.5dev | esp-idf | _upload_tft removed_ | bluetooth_proxy | 16.9% | 79.0% | Working fine |
| v4.2.5dev | esp-idf | climate_dual, _upload_tft removed_ | bluetooth_proxy | 16.9% | 80.7% | Working fine |
| v4.2.5dev | esp-idf | climate_dual, _upload_tft removed_ | bluetooth_proxy, web_server | 16.9% | 83.6% | Working fine |
| v4.2.5dev | esp-idf | - | bluetooth_proxy | 17.5% | 87.6% | Bricked |
| v4.2.2 | esp-idf | - | bluetooth_proxy | 17.6% | 87.1% | Bricked |
| v4.2.5dev | esp-idf | climate_dual | bluetooth_proxy | 17.6% | 89.3% | Bricked |
| v4.2.5dev | esp-idf | climate_dual | bluetooth_proxy, web_server | 17.6% | 91.4% | Bricked |
| v4.2.5 | arduino | - | bluetooth_proxy | 17.9% | 110.0% | Cannot build - Flash memory exceeded |
| v4.2.5 | arduino | - | bluetooth_proxy, web_server | 17.9% | 110.9% | Cannot build - Flash memory exceeded |

I have to flash via serial all the devices that got bricked in the tests above, then I will run more tests, but I believe the option where upload_tft was removed could be a workaround. The downside is that you will have to remove bluetooth_proxy and restore upload_tft every time you need to transfer a TFT, then revert it back, but as you shouldn't be transferring TFT files every day, that could be a way to go.
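For what it's worth, the numbers quoted above separate cleanly by usage. The short Python sketch below only re-reads the RAM and flash percentages from that table (the two "cannot build" rows are left out) and reports the gap between the highest working value and the lowest bricked one. The variable names are mine, and twelve data points do not prove where the real limit is or what causes it.

```python
# (RAM %, flash %, outcome) copied from the quoted test table.
# Rows that failed to build are excluded, since nothing was flashed.
RESULTS = [
    (9.5, 52.9, "ok"), (10.2, 61.8, "ok"), (10.2, 63.6, "ok"),
    (14.1, 70.0, "ok"), (14.2, 72.8, "ok"), (16.9, 79.0, "ok"),
    (16.9, 80.7, "ok"), (16.9, 83.6, "ok"),
    (17.5, 87.6, "bricked"), (17.6, 87.1, "bricked"),
    (17.6, 89.3, "bricked"), (17.6, 91.4, "bricked"),
]


def gap(column):
    """Return (highest working value, lowest bricked value) for a column index."""
    working = [row[column] for row in RESULTS if row[2] == "ok"]
    bricked = [row[column] for row in RESULTS if row[2] == "bricked"]
    return max(working), min(bricked)


ram_gap = gap(0)
flash_gap = gap(1)
print("RAM:   works up to %.1f%%, bricks from %.1f%%" % ram_gap)
print("Flash: works up to %.1f%%, bricks from %.1f%%" % flash_gap)
```

In this data set every working build sits at or below 16.9% RAM and 83.6% flash, and every bricked one at or above 17.5% RAM and 87.1% flash, so the table alone cannot tell which of the two is the real trigger.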

Is it possible to use the latest version combining climate_heat and bluetooth_proxy? I'm forced to flash with a cable every time I try this combination, as it "bricks" even though I drop upload_tft... Maybe I need to get rid of web_server as well, if it is installed by default? Any other suggestions?

edwardtfn commented 3 months ago

This is becoming a cat-and-mouse game. We are trying to remove things to make space for new features, but in the end ESPHome itself is also growing in RAM consumption, which makes it quite hard to develop at the limit of the available memory.

v4.2.6 with the basic package isn't including web_server and captive_portal anymore, so there's nothing that comes right to my mind that could be removed without bigger consequences...

I will take a look for some opportunities to save in the code. Maybe remove some global variables and logging could help a bit, but I'm sure you will again reach the limit pretty soon.

kleju00 commented 3 months ago

Hello, I tried to activate the Bluetooth proxy and was left with a bricked panel (black screen, no response). ESPHome yaml:

substitutions:

###### CHANGE ME START ######

  device_name: "nspanel" 
  wifi_ssid: !secret wifi_ssid
  wifi_password: !secret wifi_password

  nextion_update_url: "http://192.168.0.100:8123/local/nspanel_us.tft" # URL to local tft File
#  nextion_update_url: "https://raw.githubusercontent.com/Blackymas/NSPanel_HA_Blueprint/main/nspanel_us.tft" # URL to Github

# Enable Bluetooth proxy
bluetooth_proxy:
# Set Wi-Fi power save mode to "LIGHT" as required for Bluetooth on ESP32
wifi:
  power_save_mode: LIGHT

##### CHANGE ME END #####

##### DO NOT CHANGE ANYTHING! #####

packages:
  ##### download esphome code from Github
  remote_package:
    url: https://github.com/Blackymas/NSPanel_HA_Blueprint
    ref: main
    files: [nspanel_esphome.yaml]
    refresh: 300s

##### DO NOT CHANGE ANYTHING! #####

esp32:
  framework:
    type: esp-idf

The version of everything is the latest. Through the serial link https://web.esphome.io/ flashing does not work. Through esphome-flasher:

Using 'COM4' as serial port.
Connecting....
Detecting chip type... Unsupported detection protocol, switching and trying again...
Connecting...
Detecting chip type... ESP32
Connecting...

Chip Info:
 - Chip Family: ESP32
 - Chip Model: ESP32-D0WD-V3 (revision 3)
 - Number of Cores: 2
 - Max CPU Frequency: 240MHz
 - Has Bluetooth: YES
 - Has Embedded Flash: NO
 - Has Factory-Calibrated ADC: YES
 - MAC Address: C0:49:EF:D1:E7:44
Uploading stub...
Running stub...
Stub running...
Changing baud rate to 460800
Changed.
 - Flash Size: 4MB
Unexpected error: The firmware binary is invalid (magic byte=FF, should be E9)

I tried to flash the panel with a file without Bluetooth proxy and also with a file that I had previously flashed successfully (when I switched from Arduino to IDF)

Any idea how to fix the panel?

andythomas commented 3 months ago

The last sentence

Unexpected error: The firmware binary is invalid (magic byte=FF, should be E9)

points towards a wrong/broken firmware file. Any chance you used the wrong format (e.g. legacy vs new format) or the wrong file (e.g. firmware.elf vs firmware.bin vs firmware-factory.bin)?

That might also be why

Through the serial link https://web.esphome.io/ flashing does not work.
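If you want to rule out the file itself before flashing, the check behind that error message is easy to repeat by hand. This is a minimal Python sketch, not what esphome-flasher actually runs: it only tests that the first byte is 0xE9, the value the error says it expects. The function names and the file name in the usage note are placeholders of mine.

```python
ESP_IMAGE_MAGIC = 0xE9  # first byte the flasher expects, per the error message


def starts_with_image_magic(data: bytes) -> bool:
    """Return True if the buffer begins with the expected magic byte."""
    return len(data) > 0 and data[0] == ESP_IMAGE_MAGIC


def file_starts_with_image_magic(path: str) -> bool:
    """Read only the first byte of a file on disk and check it."""
    with open(path, "rb") as handle:
        return starts_with_image_magic(handle.read(1))


# Examples on in-memory data: an ELF file starts with 0x7F 'E' 'L' 'F',
# and a leading 0xFF matches the "magic byte=FF" case from the log.
examples = {
    "plausible image": starts_with_image_magic(b"\xe9\x06\x02\x20"),
    "elf file": starts_with_image_magic(b"\x7fELF"),
    "starts with 0xFF": starts_with_image_magic(b"\xff\xff\xff\xff"),
}
print(examples)
```

To check a real file you would call something like `file_starts_with_image_magic("firmware.bin")` with your own path. Passing this check does not mean the image is valid; it only rules out the specific failure shown in the log.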

kleju00 commented 3 months ago

Everything is fine with the bin file. ...But I found the solution!

I used the tool https://espressif.github.io/esptool-js/. I connected to the panel at a speed of 460800 and uploaded the bin file starting at address 0x0000. I don't know why the OTA update damaged the bootloader earlier. I will no longer try to enable bluetooth proxy.