esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
290 stars 34 forks source link

"ERROR Error binary size: Unknown error from ESP" on Shelly-Plus-1PM (ESP32-IDF) #3822

Open pfichtner opened 1 year ago

pfichtner commented 1 year ago

The problem

After flashing a Shelly-Plus-1PM via Tasmota OTA (initially installed Tasmota on that device) using a stripped configuration (see "Example YAML snippet") linked there, the ESP boots successfully the built firmware. But when trying to push an newly build firmware using ESPHome's OTA mechanism the log output is:

[...]
INFO Successfully compiled program.
INFO Resolving IP address of thename.local
INFO  -> 192.xxx.xxx.xxx
INFO Uploading /config/.esphome/build/thename/.pioenvs/thename/firmware.bin (829168 bytes)
ERROR Error binary size: Unknown error from ESP

Even worse: The workaround described in https://github.com/esphome/issues/issues/2514#issuecomment-935920531 does not work because including web_server as well as captive_portalresult in This feature is only available with frameworks ['arduino']. error messages.

Which version of ESPHome has the issue?

2022.11.1

What type of installation are you using?

Docker

Which version of Home Assistant has the issue?

no HA and not relevant

What platform are you using?

ESP32-IDF

Board

Shelly-Plus-1PM

Component causing the issue

core

Example YAML snippet

substitutions:
  device_name: "Shelly Plus 1PM"

esphome:
  name: shelly-plus-1pm
  platformio_options:
    board_build.f_cpu: 160000000L

esp32:
  board: esp32doit-devkit-v1
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_FREERTOS_UNICORE: y
      CONFIG_ESP32_DEFAULT_CPU_FREQ_160: y
      CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ: "160"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

api:
  password: !secret api_password
  encryption:
    key: !secret api_encryption_key

ota:
  password: !secret ota_password

output:
  - platform: gpio
    id: "relay_output"
    pin: GPIO26

switch:
  - platform: output
    id: "relay"
    name: "${device_name} Relay"
    output: "relay_output"

binary_sensor:
  - platform: gpio
    name: "${device_name} Switch"
    pin: GPIO4
    on_press:
      then:
        - switch.turn_on: "relay"
    on_release:
      then:
        - switch.turn_off: "relay"
    filters:
      - delayed_on_off: 50ms

Anything in the logs that might be useful for us?

No response

Additional information

No response

pfichtner commented 1 year ago

No ideas/help for this issue available?

Here some additional information, I added

logger: 
  level: VERY_VERBOSE

so this is the output during the ota-update:

[D][ota:147]: Starting OTA Update from ::FFFF:192.XXX.XXX.XXX...
[V][ota:178]: OTA features is 0x01
[V][ota:198]: Auth: Nonce is efXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[V][ota:218]: Auth: CNonce is 0011YYYYYYYYYYYYYYYYYYYYYYYYYYYY
[V][ota:225]: Auth: Result is 08150815081508150815081508150815
[V][ota:233]: Auth: Response is 08150815081508150815081508150815
[V][ota:261]: OTA size is 864704 bytes
[VV][scheduler:026]: set_timeout(name='onerror', timeout=5000)
[V][component:200]: Component ota took a long time for an operation (0.06 s).
[V][component:201]: Components should block for at most 20-30ms.
[VV][scheduler:196]: Running timeout 'onerror' with interval=5000 last_execution=376391 (now=381392)
[VV][scheduler:196]: Running interval 'update' with interval=60000 last_execution=337571 (now=397571)
[VV][scheduler:196]: Running interval 'update' with interval=60000 last_execution=339147 (now=399148)

I fear unfortunately little purposeful, exciting it would be, what happens inside ota_backend_esp_idf.cpp (I guess at least), there are however unfortunately no log calls.

ssieb commented 1 year ago

Add https://esphome.io/components/ota.html#on-error to see what the error is.

pfichtner commented 1 year ago

Added

ota:
  on_error:
    then:
      - logger.log:
          format: "OTA update error %d"
          args: ["x"]
  on_state_change:
    then:
      - logger.log:
          format: "OTA state change %d"
          args: ["state"]

result is

[D][ota:147]: Starting OTA Update from ::FFFF:192.xxx.xxx.xxx...
[D][main:183]: OTA state change 1
[V][ota:178]: OTA features is 0x01
[V][ota:198]: Auth: Nonce is a6XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[V][ota:218]: Auth: CNonce is e23aYYYYYYYYYYYYYYYYYYYYYYYYYYYY
[V][ota:225]: Auth: Result is 08150815081508150815081508150815
[V][ota:233]: Auth: Response is 08150815081508150815081508150815
[V][ota:261]: OTA size is 866016 bytes
[VV][scheduler:026]: set_timeout(name='onerror', timeout=5000)
[D][main:183]: OTA state change 3
[D][main:189]: OTA update error 255
[V][component:200]: Component ota took a long time for an operation (0.11 s).
[V][component:201]: Components should block for at most 20-30ms.
[VV][scheduler:196]: Running timeout 'onerror' with interval=5000 last_execution=269076 (now=274076)
[VV][scheduler:196]: Running interval 'update' with interval=60000 last_execution=219889 (now=279889)
hunhejj commented 1 year ago

Did you find a solution @pfichtner?

pfichtner commented 1 year ago

No, I didn't. I added the things suggested by @ssieb and added the logs to the issue. The problem still occurs: After flashing ESPHome to the device it's no longer updateable via OTA (which is really bad especially for Shelly-Devices that are usually mounted inside walls)

hunhejj commented 1 year ago

Same here with a Shelly Plus i4. I flashed it with the basic example configuration and now I get the same error whenever I try to flash it with a new version OTA. I really don't want to dismount the wall switch whenever I would like to update the firmware 😕 .

mikegerwitz commented 1 year ago

Edit: This issue was opened for Shelly-Plus-1PM, which I misread; I have a Shelly 1PM (no plus), and so my issue is likely different. Sorry for the noise.

I just ran into the same issue with a new Shelly 1PM. I'm using the esphome/esphome Docker image.

But it seems that I was able to work around it. I'm not entirely certain of the exact sequence of events that led to it, but considering how recent the comments above are, I want to write a reply before any further investigation in the hope that it may be of some help.

  1. Like @pfichtner, I flashed Tasmota first so that I could do this all OTA.
  2. I created a fairly minimal base ESPHome configuration, since Tasmota didn't leave much space available (~370KiB). See below for that configuration.
  3. OTA flashing via Tasmota worked, and subsequent OTA flashing using esphome CLI also worked.
  4. I attempted to modify the configuration to introduce api and logging, to use with HomeAssistant. I received the error ERROR Error binary size: Unknown error from ESP.

The tl;dr of this post is: I was able to get a web interface flashed via the CLI, and then used that web interface to upload a full firmware image. See below for the actual error from the OTA from the logs.

Collapsed the rest of this post since I mistakenly posted about the Shelly-1PM, not the Shelly-Plus-1PM Here is the configuration I used initially, which was small enough to flash from Tasmota, which is derived from https://www.esphome-devices.com/devices/shelly-1pm/ (commented out blocks are what I removed to reduce the size of the resulting firmware): (I'm not endorsing this configuration; it just happens to be what I use for testing, since I'm just trying to verify that I can get a Shelly 1PM working with ESPHome.) ```yaml substitutions: devicename: "shelly-1pm-test" max_temp: "70.0" esphome: name: ${devicename} platform: ESP8266 board: esp01_1m comment: "Shelly 1PM" wifi: ssid: ... password: ... # needed for hidden network; skip scan fast_connect: true # LOW/HIGH messed up; esphome/issues/issues/1532 power_save_mode: HIGH # TODO: ap sensor: - platform: wifi_signal name: "WiFi Signal Strength" update_interval: 60s - platform: uptime name: Uptime Sensor - platform: hlw8012 cf_pin: GPIO05 cf1_pin: GPIO13 # needed to compile, not available on 1PM sel_pin: GPIO14 # needed to compile, not available on 1PM power: name: "${devicename} power" unit_of_measurement: W id: "shelly1pm_power" device_class: power state_class: measurement accuracy_decimals: 0 filters: # Map from sensor -> measured value - calibrate_linear: - 0.0 -> 1.0 - 110.33186 -> 20.62 - 131.01909 -> 24.32 - 341.33920 -> 62.08 - 5561.41553 -> 1000.0 - 2975.51221 -> 535.7 - 9612.66309 -> 1720.0 - 14891.35352 -> 2679.0 # Make everything below 2W appear as just 0W. - lambda: if (x < 2) return 0; else return x; update_interval: 10s # - platform: total_daily_energy # name: "${devicename} daily energy" # power_id: "shelly1pm_power" # device_class: energy # state_class: measurement # filters: # - multiply: 0.001 # unit_of_measurement: kWh - platform: ntc sensor: temp_resistance_reading name: "${devicename} temperature" unit_of_measurement: "°C" accuracy_decimals: 1 device_class: temperature calibration: b_constant: 3350 reference_resistance: 10kOhm reference_temperature: 298.15K # on_value_range: # - above: ${max_temp} # then: # - homeassistant.event: # event: esphome.overheat # data: # title: "${devicename} has overheated." - platform: resistance id: temp_resistance_reading sensor: temp_analog_reading configuration: DOWNSTREAM resistor: 32kOhm - platform: adc id: temp_analog_reading pin: A0 #logger: #api: ota: # https://esphome.io/components/time/homeassistant.html #time: # - platform: homeassistant # id: homeassistant_time # Device Specific Config output: - platform: esp8266_pwm id: state_led pin: number: GPIO00 inverted: true # TODO: light? switch: - platform: gpio name: ${devicename} pin: GPIO15 id: shelly_relay restore_mode: RESTORE_DEFAULT_OFF binary_sensor: - platform: gpio pin: number: GPIO4 filters: - delayed_on_off: 50ms # small delay for debouncing name: "Switch ${devicename}" on_press: then: - switch.toggle: shelly_relay internal: true id: switchid ``` I was (at the time) using a 1-2 month old `esphome/esphome` container, but I was not using a development build AFAIK, so I don't think #2514 was my problem. I tried enabling the web interface and flashing, which ended up working: ```yaml captive_portal: web_server: port: 80 auth: username: ... password: ... ``` (`captive_portal` I assume is unnecessary, but I added it when I was playing around, and forgot to remove it, and so I want to include it here just in case it does somehow make a difference.) That gave me a web interface. I did _not_ receive the error that @pfichtner stated above, and I did not have any trouble adding the above configuration; everything compiled just fine. I wrote my full ESPHome configuration (uncommenting the above lines, and adding an `ap` to `wifi`), and attempted to update via OTA on the command line again, and still received the same error. But taking the resulting firmware and uploading via the web interface worked. But updates via the command line continued to fail with the same error. But, I now had logs via the web interface, and saw that this was the error: ``` 03:32:02 [D] [ota:147] Starting OTA Update from 192.168.X.X... 03:32:03 [W] [ota:379] Timed out reading 4 bytes of data 03:32:03 [W] [ota:253] Reading size failed! ``` At this point, before writing this, I wanted to see if anything changed on the newest `esphome` Docker image, so I updated it. Unfortunately, I'm not sure what the previous image was (but it was from late November or early December). I found the string "Reading size failed" in the generated `ota_component.cpp`, and was going to try to debug it (e.g. starting with increasing the timeout), until I noticed that: After I got an OTA failure on the command line, re-running it a second time succeeded. So that's where I left off. I do have a second stock Shelly 1PM, and I can see how the process goes with that, but I'm not yet sure if I'll do that tonight or not. I'll post back when I do.
mikegerwitz commented 1 year ago

Edit: This issue was opened for Shelly-Plus-1PM, which I misread; I have a Shelly 1PM (no plus), and so my issue is likely different. Sorry for the noise.

I decided to try flashing my second Shelly 1PM, using the same process I did before:

  1. Flash with Tasmota;
  2. Flash with minimal image (I re-used the previous binary I had already built);
  3. Flash a full configuration OTA via the CLI using the latest esphome/esphome Docker image.

This time, OTA via the CLI seems to be working every time.

Just to remove any doubt in constructing my configuration from my above message, here it is, without secrets:

Shelly 1PM YAML ```yaml substitutions: devicename: "shelly-1pm-test" max_temp: "70.0" esphome: name: ${devicename} platform: ESP8266 board: esp01_1m comment: "Shelly 1PM" wifi: ssid: ... password: ... # needed for hidden network; skip scan fast_connect: true # LOW/HIGH messed up; esphome/issues/issues/1532 power_save_mode: HIGH # TODO: fallback ap: ssid: "${devicename} fallback" password: ... sensor: - platform: wifi_signal name: "WiFi Signal Strength" update_interval: 60s - platform: uptime name: Uptime Sensor - platform: hlw8012 cf_pin: GPIO05 cf1_pin: GPIO13 # needed to compile, not available on 1PM sel_pin: GPIO14 # needed to compile, not available on 1PM power: name: "${devicename} power" unit_of_measurement: W id: "shelly1pm_power" device_class: power state_class: measurement accuracy_decimals: 0 filters: # Map from sensor -> measured value - calibrate_linear: - 0.0 -> 1.0 - 110.33186 -> 20.62 - 131.01909 -> 24.32 - 341.33920 -> 62.08 - 5561.41553 -> 1000.0 - 2975.51221 -> 535.7 - 9612.66309 -> 1720.0 - 14891.35352 -> 2679.0 # Make everything below 2W appear as just 0W. - lambda: if (x < 2) return 0; else return x; update_interval: 10s - platform: total_daily_energy name: "${devicename} daily energy" power_id: "shelly1pm_power" device_class: energy state_class: measurement filters: - multiply: 0.001 unit_of_measurement: kWh - platform: ntc sensor: temp_resistance_reading name: "${devicename} temperature" unit_of_measurement: "°C" accuracy_decimals: 1 device_class: temperature calibration: b_constant: 3350 reference_resistance: 10kOhm reference_temperature: 298.15K on_value_range: - above: ${max_temp} then: - homeassistant.event: event: esphome.overheat data: title: "${devicename} has overheated." - platform: resistance id: temp_resistance_reading sensor: temp_analog_reading configuration: DOWNSTREAM resistor: 32kOhm - platform: adc id: temp_analog_reading pin: A0 logger: api: ota: captive_portal: web_server: port: 80 auth: username: ... password: ... # https://esphome.io/components/time/homeassistant.html time: - platform: homeassistant id: homeassistant_time # Device Specific Config output: - platform: esp8266_pwm id: state_led pin: number: GPIO00 inverted: true # TODO: light? switch: - platform: gpio name: ${devicename} pin: GPIO15 id: shelly_relay restore_mode: RESTORE_DEFAULT_OFF binary_sensor: - platform: gpio pin: number: GPIO4 filters: - delayed_on_off: 50ms # small delay for debouncing name: "Switch ${devicename}" on_press: then: - switch.toggle: shelly_relay internal: true id: switchid ``` I was honestly hoping that flashing would fail so that I could maybe provide more insight. I'll be playing around with the configuration more in the coming days, but that's as far as I'm going for tonight. But this feels like a network latency or load issue, given the timeout at play.
ssieb commented 1 year ago

Why are you using esp01_1m as the board? That only lets you use 1MB of flash which is going to cause you major OTA flashing problems. Also, this issue was opened for the Shelly-Plus-1PM which is an ESP32, so the Shelly 1PM is not applicable here.

mikegerwitz commented 1 year ago

this issue was opened for the Shelly-Plus-1PM which is an ESP32

Ah shoot, I'm sorry for the noise, and it even says ESP32 in the title. I edited my above posts to note that.

Why are you using esp01_1m as the board? That only lets you use 1MB of flash which is going to cause you major OTA flashing problems.

I was wondering why that board was used, but I only used it because of the example config that I linked; I wanted to have a working flash before I messed with the configuration. The Shelly 1 (non-PM) configuration example on that site also uses esp01_1m, so I wondered if there might be something specific to the Shellys that caused people to use it; I'm new to Shelly devices. I was going to try changing the board later.

But surely 1M is enough flash for OTA? The nondeterminism with these errors with the same firmware images suggests something else. But I don't want to muddy this issue further since my issue is not relevant. Sorry again for the noise.

ssieb commented 1 year ago

Maybe that device does only have 1MB. Remember that to OTA, you have to have enough room for the current running firmware and space to write the new firmware as well. When you compile, there's a summary of flash space used. OTA supports compression now, so that helps some, but you're still really limited with only 1MB.

flexmatics commented 1 year ago

Hi,

It seems that I have the same issue here with Shelly plus 2PM. Used code from here: https://www.esphome-devices.com/devices/Shelly-Plus-2PM. My version of Shelly is PCB v0.1.9 with ESP32-U4WDH (Single core, 160MHz, 4MB embedded flash) - i.e. needed to add this:

`# For Single Core ESP32 esphome: name: shelly-plus-2pm platformio_options: board_build.f_cpu: 160000000L

esp32: board: esp32doit-devkit-v1 framework: type: esp-idf sdkconfig_options: CONFIG_FREERTOS_UNICORE: y CONFIG_ESP32_DEFAULT_CPU_FREQ_160: y CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ: "160"`

Any advice what I can try to make OTA work? Flashing via serial is possible but painful..

flexmatics commented 1 year ago

As an update to above:

  1. Pulled out relay from the wall, flashed with esphome via serial (CH340) using: docker compose run esphome run --device /dev/ttyUSB0 /path_to_node/config.yaml
  2. OTA works normally now

The difference is that in the beggining I uploaded *.bin from Tasmota web UI. Apparently this method wasn't ok for further OTA updates with esphome via --device IP.

pfichtner commented 1 year ago

As an update to above: [OTA works when flash initially via serial ttyUSB] The difference is that in the beggining I uploaded *.bin from Tasmota web UI. Apparently this method wasn't ok for further OTA updates with esphome via --device IP.

Thanks for this hint! I also flashed the EspHome-bin via tasmota WebUI Have to check if an initial flash via serial works for me as well.

Maybe one of the maintainers already has an idea what this difference does.

pfichtner commented 1 year ago
1. Pulled out relay from the wall, flashed with esphome via serial (CH340) using: `docker compose run esphome run --device /dev/ttyUSB0 /path_to_node/config.yaml`

2. OTA works normally now

The difference is that in the beggining I uploaded *.bin from Tasmota web UI. Apparently this method wasn't ok for further OTA updates with esphome via --device IP.

I can confirm that the solution described by @flexmatics works for me as well. If you flash EspHome using serial OTA will work, if you flashed EspHome on an ESP32 where tasmota was flashed via tasmota's OTA it won't. So is this a workaround or is this "works as expected"?

pfichtner commented 1 year ago

It works a described but after each OTA flash I have to repower the shelly (which is of course problematic for this kind of devices put in walls). I am not sure but I think this worked before I updated to v2022.12.8 (docker version). @flexmatics do you use another version? Can you confirm the problem?

Update: Sometimes it does, sometimes it doesn't in non-deterministic way

ScottESanDiego commented 1 year ago

I see that the tasmota OTA is updated after this issue cropped up. Any idea if it still means we can't go tasmota -> esphome -> esphome via OTA?

I'm reticent to take my Shelly out of the wall, but kinda want to use it as another BLE tracker.

TheZoker commented 11 months ago

I can confirm that I have the same issue with my Shelly 1PM Mini.

I followed this manual to get tasmota flashed on to the device: https://github.com/tasmota/mgos32-to-tasmota32

Then I flashed ESPhome which worked without any issue. But now the update seems to be broken

alainsch commented 10 months ago

I can confirm that I have the same issue with my Shelly 1PM Mini.

Same here with a Shelly 1Plus mini. Followed the tasmota32 steps and flashed afterwards with a miniman ESPHome config allowing OTA.

Device booted, but whenever I try to send a new firmware, I got the message "Error binary size"

And with the mini series, serial flashing isn't a real option...

vumaddibly commented 10 months ago

Same issue here with a Shelly Plus 1PM. I really wish that I'd found this bug report before going down this rabbit hole. I have a Shelly Plus 1PM connected to the charger for a cordless vacuum. I wanted to run ESPHome on the Shelly so that I could have automations on the device itself but with hindsight, I wish I'd left it as a native Shelly device.

UPDATE: having removed the Shelly from the pattress box (I wish I hadn't used hot glue on the back 🤣) and downloaded the firmware via serial then we are once again 'cooking with gas' and OTA updates are working just fine 😀 One of my friends likes the saying "Every day should be a learning day" - today certainly has been!

datenzar commented 8 months ago

I'm basically in the same situation as the others. My problem, however, is that I'm using a Shelly Plug Plus S which doesn't expose a serial interface.

I assume as well that the safeboot partition consumes too much space to allow for ESPHome OTA updates. Currently I'm stuck with this semi-bricked device. Does anyone know if and how I can launch Safeboot there?

jj-uk commented 5 months ago

Is the only fix for this to pull it and flash via serial?