Blackymas / NSPanel_HA_Blueprint

This allows you to configure your complete NSPanel via Blueprint with UI and without changing anything in the code

`Bug` TFT Upload unreliable with ble_tracker enabled #1946

Closed MichaelHeimann closed 2 months ago

MichaelHeimann commented 3 months ago

TFT Version

4.3.1

ESPHome Version

4.3.1

Blueprint Version

4.3.1

Panel Model

EU

What is the bug?

I have ble_tracker configured on 9 panels, and updating from 4.3.0 to 4.3.1 with the BLE config enabled mostly works. The memory leak fix in 4.3 didn't resolve this, but it made the behavior consistent, as the starting conditions are now always about 22k free heap.

When it doesn't work, it's always running out of heap memory while uploading with 115200 baud.

From the log you can see it tried with 115200 and failed at 10:02:11 and 10:04:51 and finally succeeded with 921600 baud at 11:23:15.

This was reliable on all panels, although about 50% updated correctly on the first try (with 921600).

The free heap is always going down during tft flash, but because the upload is just faster with 921600 it reaches 100% before out of memory.

Since we only need the last 15% of the TFT in small updates, this can finish. If we have bigger changes in the future, or during initial flashing, this issue hits harder.

Steps to Reproduce

1) Add ble_tracker to the NSPanel YAML
2) Hit the update TFT display button in HA

Your Panel's YAML

esp32_ble_tracker:
  id: ble_tracker
  scan_parameters:
    active: false
binary_sensor:
  - platform: ble_presence
    ibeacon_uuid: '12624013-e6fa-4a02-aca7-e329827e7865'
    name: "Ivanas Handy im Bad"
  - platform: ble_presence
    # from the log: ibeacon_uuid: '91253BE1-9327-C8AA-1747-1038673A6F20'
    ibeacon_uuid: '206f3a67-3810-4717-aac8-2793e13b2591'
    name: "Michis Handy im Bad"

sensor:
  - platform: ble_rssi
    ibeacon_uuid: '206f3a67-3810-4717-aac8-2793e13b2591'
    name: "Michis Handy Bad RSSI"
  - platform: ble_rssi
    ibeacon_uuid: '12624013-e6fa-4a02-aca7-e329827e7865'
    name: "Ivanas Handy Bad RSSI"

script:
  - id: !extend upload_tft
    then:
      - lambda: |-
          static const char *const TAG = "CUSTOM.script.upload_tft";
          ble_tracker->dump_config();
          ESP_LOGI(TAG, "Stopping BLE Tracker scan...");
          ble_tracker->stop_scan();
          ESP_LOGI(TAG, "Disabling BLE Tracker scan...");
          ble_tracker->set_scan_active(false);


ESPHome Logs

[logs_nsschlafzimmer_logs.txt](https://github.com/Blackymas/NSPanel_HA_Blueprint/files/14665270/logs_nsschlafzimmer_logs.txt)

Home Assistant Logs

No response

edwardtfn commented 3 months ago

The BLE engine takes quite a lot of memory from the ESP, leaving too little for the upload (which is also memory intensive). We don't really have a solution for this yet, other than a recommendation to remove the BLE code when uploading TFT. 😩

MichaelHeimann commented 3 months ago

hmm but doesn't the working upload with 921600 baud show that there is enough memory? Why does the upload process need more ram the longer it runs? Isn't that a leak?

Don't want to offend you since you clearly have more knowledge here than I do. I am grateful for all the work you do for this project.

edwardtfn commented 3 months ago

Why does the upload process need more ram the longer it runs? Isn't that a leak?

It could be, but that requires reviewing not only the Nextion component in ESPHome, but also some of its components, like the web client, etc. It takes time. :(

edwardtfn commented 3 months ago

I'm looking closely at all those possible memory leaks. We are much better now than in v4.2, but there is still a lot to do.

finch6 commented 3 months ago

I'm looking a lot on all those possible memory leaks. We are much better now than in v4.2, but still a lot to do.

I agree ― with each version, it keeps getting better!!!

Keep up the great work @edwardtfn

edwardtfn commented 3 months ago

I ran some tests uploading TFT on an almost clean system (only the basic package, with TFT upload, but no BLE) and transferred files multiple times (alternating between US and US Land to force it to start from 0), always at 115200 baud. I see the free memory going down a bit in the first 30%, but then it gets quite stable. It fluctuates, which is expected, but nothing points to a leak.

| Free heap (bytes) | Run 1 | Run 2 | Run 3 | Run 4 |
| -- | -- | -- | -- | -- |
| Before: | 200728 | 200692 | 200576 | 199784 |
| Min: | 98528 | 99624 | 100368 | 98256 |
| Max: | 110444 | 111068 | 111732 | 110404 |
| Diff: | 11916 | 11444 | 11364 | 12148 |
| Avg: | 104674 | 104329 | 105123 | 104520 |
| Median: | 105412 | 104980 | 105744 | 105128 |
| Sdev: | 2005 | 2151 | 2122 | 2066 |

Still, the difference between the maximum and the minimum free heap is around 12 KB, and I think the memory available when BLE is active doesn't leave room for this fluctuation. I can see some points for improvement in the ESPHome Nextion component, which could give us a few more bytes, so I will try to find some time to work on that.

But it's important to understand that BLE is leaving too little memory for other things, so this will probably postpone the issue rather than eliminate it. A proper solution would be to free up the BLE memory when a TFT upload starts, and it looks like what we suggested in our customizations isn't achieving that. So, for now, I keep my recommendation not to have both packages (BLE and Upload TFT) installed at the same time, which I mentioned in https://github.com/Blackymas/NSPanel_HA_Blueprint/issues/1815#issuecomment-2004597441.

edwardtfn commented 3 months ago

I'm looking for someone brave enough to run some tests with PSRAM. This won't solve the flash limitation related to using BLE, but it may help with the heap issue. We are having a discussion around this in #1983. I started testing it, but as I don't really use BLE, I don't have a good basis for comparison, so I'm looking for someone else facing memory issues to try this. Please be prepared to get a black screen and to have to flash your panel via serial if something goes wrong (it hasn't happened to me).

If you wanna try it, just add this as a customization to your panel's yaml:

esp32:
  framework:
    sdkconfig_options:
      CONFIG_D0WD_PSRAM_CLK_IO: "5"
      CONFIG_D0WD_PSRAM_CS_IO: "18"

psram:
quenthal commented 3 months ago

It may be too soon to ask this, but could this open the door to having thermostat and/or other expansions along WITH BL proxy running at the same time, not just making the upload more stable with Bluetooth?

EDIT: just noticed, basically relevant discussion over here: https://github.com/Blackymas/NSPanel_HA_Blueprint/issues/1815

edwardtfn commented 3 months ago

Why does the upload process need more ram the longer it runs? Isn't that a leak?

I was looking at the Nextion component code and it has a queue of commands to be sent to Nextion. As nothing will be sent during the TFT upload (other than the TFT packages), this queue can grow indefinitely.

I will take a look at our code to ensure nothing is sent to the panel (added to the queue) while uploading.
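
To illustrate the idea, here is a minimal, self-contained C++ sketch of a command queue that refuses new entries while an upload is running, so it cannot grow during the transfer. This is not the ESPHome Nextion component's code: `CommandQueue`, `set_upload_in_progress` and `enqueue` are hypothetical names made up for this example, and the real component may handle this quite differently.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a display command queue (not the real Nextion API).
class CommandQueue {
 public:
  // Mark the start or the end of a TFT upload.
  void set_upload_in_progress(bool uploading) { uploading_ = uploading; }

  // Try to queue a command. While an upload is running the command is
  // counted as dropped and nothing is stored, so the queue uses no extra heap.
  bool enqueue(std::string command) {
    if (uploading_) {
      ++dropped_;
      return false;
    }
    pending_.push_back(std::move(command));
    return true;
  }

  std::size_t pending() const { return pending_.size(); }
  std::size_t dropped() const { return dropped_; }

 private:
  std::deque<std::string> pending_;
  std::size_t dropped_ = 0;
  bool uploading_ = false;
};
```

Dropping commands is only one possible policy; capping the queue length or replacing stale entries would be alternatives, depending on what the display needs after the upload finishes.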

edwardtfn commented 3 months ago

But could this open the door to have thermostat and/or other expansions along WITH BL proxy running at the same time, not just making upload more stable with bluetooth?

You can have BLE and addon climate at the same time with the current version, but then you have to disable Upload TFT. It's not the best thing, but I think that is probably not a big issue, as you don't transfer TFT on a regular basis.

You can see more about this on my comment here: https://github.com/Blackymas/NSPanel_HA_Blueprint/issues/1815#issuecomment-2004597441

Please let me know if you need further help with that.

quenthal commented 3 months ago

But could this open the door to have thermostat and/or other expansions along WITH BL proxy running at the same time, not just making upload more stable with bluetooth?

You can have BLE and addon climate at the same time with the current version, but then you have to disable Upload TFT. It's not the best thing, but I think that is probably not a big issue, as you don't transfer TFT in a regular basis.

You can see more about this on my comment here: #1815 (comment)

Please let me know if you need further help with that.

Yes, that is a good suggestion, thanks! (In fact, that was one reason I asked in another thread about changes and version management of the TFT, i.e. so I'd know and could delay ("in good faith") updating the TFT if no major/breaking changes or incompatibilities with otherwise newer firmware/blueprint are made.)

edwardtfn commented 2 months ago

Has anyone here tried v4.3.3 (DEV) with Bluetooth recently? I'm thinking of starting to wrap up for a patch release and would love to hear from someone who tried that, as PSRAM is now enabled by default.

danir-de commented 2 months ago

Sadly, it still doesn't fully work for me. It now at least starts the TFT upload, but crashes at about 15% to 25% and reboots.

At least now it boots again if you remove Bluetooth for updating the TFT and add it back after the successful TFT update, so the workaround of disabling Bluetooth for the TFT update works now!

edwardtfn commented 2 months ago

Could you please share the log of a TFT transfer while BT is installed?

danir-de commented 2 months ago

I have attached multiple tries (just after rebuilding) to this comment: tft_upload_after_rebuild.log

edwardtfn commented 2 months ago

OK, I see the memory still going down during the TFT transfer. The log shows the full memory, but the internal memory is probably getting full, and all the buffering for this transfer is done in internal memory. I will work on that. Moving this buffer (or at least part of it) to PSRAM may not be that complex. I'll get back to you.

danir-de commented 2 months ago

Thank you very much @edwardtfn !

edwardtfn commented 2 months ago

If you wanna give it a try with the transfer buffer in PSRAM (it is just a small chunk, but it's probably better than nothing), add this to your panel's yaml:

external_components:
  - source:
      type: git
      url: https://github.com/edwardtfn/esphome
      ref: nextion-23
    components:
      - nextion
      - psram
    refresh: 1s

Please let me know your results.

MichaelHeimann commented 2 months ago

I tried it and it still didn't manage to upload the TFT for me at 115200 baud. It starts with 68k DRAM free at 85% and crashes with 27k free at 94%.

logs_nsbuero_logs.txt

MichaelHeimann commented 2 months ago

With 921600 baud and the same code it starts at 75k dram free and finishes at 60k.

So, with 115200, after 10% of flashing about 40k of RAM is gone. With 921600 it's about 10k of RAM per 10%...

logs_nsbuero_logs (1).txt
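
As a rough back-of-the-envelope check of these figures, the sketch below extrapolates how far an upload can get before free DRAM falls to a given level. It assumes free DRAM drops linearly with upload progress, which is only an approximation based on the few data points above, and `reachable_percent` is a hypothetical helper name, not something from ESPHome.

```cpp
#include <algorithm>

// Rough linear model (an assumption, not a measured law): free DRAM drops by
// a fixed number of bytes for every percent of the TFT file transferred.
// Returns the upload percentage at which free DRAM would fall to
// `reserve_bytes`, capped at 100.
double reachable_percent(double start_percent, double free_dram_bytes,
                         double bytes_lost_per_percent,
                         double reserve_bytes = 0.0) {
  if (bytes_lost_per_percent <= 0.0) return 100.0;  // no drain: upload can finish
  const double budget = std::max(0.0, free_dram_bytes - reserve_bytes);
  return std::min(100.0, start_percent + budget / bytes_lost_per_percent);
}
```

Taking the reported crash level of about 27k as the reserve, `reachable_percent(85, 68000, 4000, 27000)` gives about 95% for the 115200 run, close to the observed crash at 94%, while `reachable_percent(85, 75000, 1000, 27000)` reaches 100% for the 921600 run.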

edwardtfn commented 2 months ago

Ok. Thanks! I will have to look a bit more for what is taking DRAM... Anyhow, it shouldn't crash when you still have 27KB of DRAM. 😩

MichaelHeimann commented 2 months ago

I agree and would love to help, but I don't know how. Memory management and troubleshooting on the ESP32 are a new topic for me, in an unknown land behind a steep learning curve ;)

danir-de commented 2 months ago

Sadly, it still crashes at about 20% with baud rate 921600 (at 4% with 115200), with these being the first and last log messages:

[01:12:38][D][nextion.upload.idf:104]: Uploaded 0.05%, remaining 7493508 bytes, free heap: 66612 (DRAM) + 1992403 (PSRAM) bytes
...
[D][nextion.upload.idf:104]: Uploaded 20.49%, remaining 5961604 bytes, free heap: 28596 (DRAM) + 1967607 (PSRAM) bytes 
RAM:   [=         ]  10.3% (used 33804 bytes from 327680 bytes)
Flash: [========= ]  92.7% (used 1701341 bytes from 1835008 bytes)
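
For anyone who wants to compare runs, here is a small C++ sketch that pulls the numbers out of progress lines like the two above and computes the average DRAM loss per percent uploaded. It assumes the log format stays exactly as shown; `UploadProgress`, `parse_progress` and `dram_loss_per_percent` are hypothetical names for this example only.

```cpp
#include <cstdio>
#include <cstring>

// Values carried by one "nextion.upload.idf" progress line.
struct UploadProgress {
  float percent;
  unsigned long remaining_bytes;
  unsigned long free_dram;
  unsigned long free_psram;
};

// Parse a progress line in the format shown above. Returns true on success.
bool parse_progress(const char *line, UploadProgress &out) {
  // Skip the timestamp and tag prefix by searching for the message text.
  const char *start = std::strstr(line, "Uploaded ");
  if (start == nullptr) return false;
  const int matched = std::sscanf(
      start,
      "Uploaded %f%%, remaining %lu bytes, free heap: %lu (DRAM) + %lu (PSRAM)",
      &out.percent, &out.remaining_bytes, &out.free_dram, &out.free_psram);
  return matched == 4;
}

// Average DRAM bytes lost per percent of upload between two progress lines.
double dram_loss_per_percent(const UploadProgress &first,
                             const UploadProgress &last) {
  const double span = static_cast<double>(last.percent) - first.percent;
  if (span <= 0.0) return 0.0;
  return (static_cast<double>(first.free_dram) -
          static_cast<double>(last.free_dram)) / span;
}
```

For the two lines above, DRAM goes from 66612 to 28596 bytes between 0.05% and 20.49%, which works out to roughly 1,860 bytes of DRAM lost per percent uploaded.
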
Bascht74 commented 2 months ago

With 921600 baud and the same code it starts at 75k dram free and finishes at 60k.

So, with 115200 after 10% flashing, about 40k of ram are gone. With 921600 its about 10k ram per 10%...

logs_nsbuero_logs (1).txt

How are you stopping the BT? For me, stopping it via the method suggested here https://github.com/Blackymas/NSPanel_HA_Blueprint/blob/main/docs/customization.md#ble-tracker doesn't work...

See: https://github.com/Blackymas/NSPanel_HA_Blueprint/issues/1983#issuecomment-2059871778

You could try to disable any bluetooth device that is around or cover the nspanel with aluminum foil (to block some signals). Then you should see some more %...

I have a lot of BT devices and it goes down much faster...

danir-de commented 2 months ago

I have the recommended script from /docs/customization.md#ble-tracker:


# Modify upload tft engine to stop BLE scan while uploading
script:
  - id: !extend upload_tft
    then:
      - lambda: |-
          static const char *const TAG = "CUSTOM.script.upload_tft";
          ble_tracker->dump_config();
          ESP_LOGD(TAG, "Stopping BLE Tracker scan...");
          ble_tracker->stop_scan();
          ESP_LOGD(TAG, "Disabling BLE Tracker scan...");
          ble_tracker->set_scan_active(false);
          ESP_LOGD(TAG, "State: %s", id(ble_proxy)->has_active() ? "Active" : "Passive");
          while (ble_proxy->get_bluetooth_connections_limit() != ble_proxy->get_bluetooth_connections_free()) {
            ESP_LOGD(TAG, "Connections: %i of %i", int(ble_proxy->get_bluetooth_connections_limit() - ble_proxy->get_bluetooth_connections_free()), int(ble_proxy->get_bluetooth_connections_limit()));
            if (id(ble_proxy)->has_active()) {
              ESP_LOGD(TAG, "Setting passive mode...");
              ble_proxy->set_active(false);
            }
            vTaskDelay(pdMS_TO_TICKS(1000));
            App.feed_wdt();
          }

Maybe you have something there https://github.com/Blackymas/NSPanel_HA_Blueprint/issues/1983#issuecomment-2059904346, I don't have time to test myself atm though.

edwardtfn commented 2 months ago

I've added that and a bit more to the main code (still in dev), so it should work without this !remove and without extending upload TFT. Just have to use the right IDs for the components.

edwardtfn commented 2 months ago

With the new add-ons, I will close this as fixed (still in DEV, but will be included in the coming release as experimental). Please feel free to reopen it if the issue persists, and feel free to keep the nice conversation going and share your findings here. 😉

Thanks a lot for your support on this!!!