esphome / issues

Issue Tracker for ESPHome
https://esphome.io/

Intermittent connection issues with ESP Home devices with bk72xx #5217

Open tyzen9 opened 8 months ago

tyzen9 commented 8 months ago

The problem

I am running ESPHome 2023.11.6 and have 17 Tuya/Beken devices. They ALL give me the errors @TheAznShumai mentioned here. Six are TreatLife dimmers, two are TreatLife fan controls, and the rest are downlights. The downlights are the worst because, as mentioned above, they stop working when they are in this state, so you are left with smart lights that will not turn off. Needless to say, the Wife Acceptance Factor is dropping as the days move forward.

The config for one of my dimmers is below. I disabled web_server to make sure this was not the culprit.

I also included log entries from Home Assistant.

I am trying to get a fresh log entry from the device itself (using VERBOSE logging), but interestingly, when I am connected to the logs (wirelessly) through ESPHome, the devices do not seem to throw an error. I wonder if this hints at the problem: is there some sort of ESPHome "keep-alive" YAML setting?
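
On the keep-alive question: as far as the documented options go, there does not appear to be a dedicated keep-alive key under api:. The nearest documented settings are the reboot_timeout options under api: and wifi:, which control how long the device waits without an API client or a Wi-Fi connection before it reboots itself. A minimal sketch, with purely illustrative values:

```yaml
# Sketch only: these are not keep-alive settings, just the documented
# timeouts that decide when the device reboots itself.
api:
  # Reboot if no API client (e.g. Home Assistant) connects within this time.
  # The documented default is 15min; 0s disables the reboot.
  reboot_timeout: 15min

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # Reboot if no Wi-Fi connection can be established within this time.
  reboot_timeout: 15min
```

Changing these values would not be expected to prevent the "EOF received" disconnects by itself; it only changes how the device reacts after a connection has been lost.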

Which version of ESPHome has the issue?

ESPHome 2023.11.6

What type of installation are you using?

Home Assistant Add-on

Which version of Home Assistant has the issue?

2023.12.1

What platform are you using?

BK72XX

Board

Treatlife DS01C Dimmer - Beken 1.1.17

Component causing the issue

aioesphomeapi.connection

Example YAML snippet

substitutions:
  device_description: Treatlife DS01C Dimmer - Beken 1.1.17
  device_name: dimmer-wd04
  device_friendly_name: Dimmer WD04

esphome:
  name: $device_name
  friendly_name: $device_friendly_name
  comment: $device_description

bk72xx:
  board: generic-bk7231t-qfn32-tuya

logger:
  level: ERROR
  baud_rate: 0

# web_server:

mdns:
api:
ota:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    ssid: $device_name
    password: !secret wifi_ap_password

captive_portal:

uart:
  rx_pin: RX1
  tx_pin: TX1
  baud_rate: 9600

tuya:

light:
  - platform: "tuya"
    id: light_1
    name: "Dimmer"
    switch_datapoint: 1
    dimmer_datapoint: 2
    min_value_datapoint: 3
    min_value: 150
    max_value: 1000

Anything in the logs that might be useful for us?

2023-12-11 16:23:29.325 WARNING (MainThread) [aioesphomeapi.connection] dimmer-wd03 @ 192.168.9.36: Connection error occurred: dimmer-wd03 @ 192.168.9.36: EOF received
2023-12-11 16:24:05.423 WARNING (MainThread) [aioesphomeapi.connection] downlight-wdl05 @ 192.168.9.62: Connection error occurred: downlight-wdl05 @ 192.168.9.62: EOF received
2023-12-11 16:24:23.343 WARNING (MainThread) [aioesphomeapi.connection] downlight-wdl07 @ 192.168.9.80: Connection error occurred: downlight-wdl07 @ 192.168.9.80: EOF received
2023-12-11 16:24:45.617 WARNING (MainThread) [aioesphomeapi.connection] downlight-wdl09 @ 192.168.9.149: Connection error occurred: downlight-wdl09 @ 192.168.9.149: EOF received
2023-12-11 16:24:47.402 WARNING (MainThread) [aioesphomeapi.connection] light-wdl6 @ 192.168.9.216: Connection error occurred: light-wdl6 @ 192.168.9.216: EOF received
2023-12-11 16:24:47.657 WARNING (MainThread) [aioesphomeapi.connection] downlight-wdl08 @ 192.168.9.160: Connection error occurred: downlight-wdl08 @ 192.168.9.160: EOF received
2023-12-11 16:25:26.322 WARNING (MainThread) [aioesphomeapi.connection] dimmer-wd07 @ 192.168.9.166: Connection error occurred: dimmer-wd07 @ 192.168.9.166: EOF received
2023-12-11 16:26:28.488 WARNING (MainThread) [aioesphomeapi.connection] dimmer-wd02 @ 192.168.9.34: Connection error occurred: dimmer-wd02 @ 192.168.9.34: EOF received
2023-12-11 16:27:35.344 WARNING (MainThread) [aioesphomeapi.connection] dimmer-wd05 @ 192.168.9.91: Connection error occurred: dimmer-wd05 @ 192.168.9.91: EOF received
2023-12-11 16:29:59.155 WARNING (MainThread) [aioesphomeapi.connection] light-wdl6 @ 192.168.9.216: Connection error occurred: light-wdl6 @ 192.168.9.216: EOF received
2023-12-11 16:30:42.232 WARNING (MainThread) [aioesphomeapi.connection] dimmer-wd01 @ 192.168.9.202: Connection error occurred: dimmer-wd01 @ 192.168.9.202: EOF received
2023-12-11 16:32:22.318 WARNING (MainThread) [aioesphomeapi.connection] dimmer-wd02 @ 192.168.9.34: Connection error occurred: dimmer-wd02 @ 192.168.9.34: EOF received
2023-12-11 16:34:12.149 WARNING (MainThread) [aioesphomeapi.connection] dimmer-wd05 @ 192.168.9.91: Connection error occurred: dimmer-wd05 @ 192.168.9.91: EOF received
2023-12-11 16:34:23.933 WARNING (MainThread) [aioesphomeapi.connection] downlight-wdl07 @ 192.168.9.80: Connection error occurred: downlight-wdl07 @ 192.168.9.80: EOF received
2023-12-11 16:38:22.765 WARNING (MainThread) [aioesphomeapi.connection] dimmer-wd02 @ 192.168.9.34: Connection error occurred: dimmer-wd02 @ 192.168.9.34: EOF received
2023-12-11 16:40:05.932 WARNING (MainThread) [aioesphomeapi.connection] downlight-wdl05 @ 192.168.9.62: Connection error occurred: downlight-wdl05 @ 192.168.9.62: EOF received
2023-12-11 16:40:08.240 WARNING (MainThread) [aioesphomeapi.connection] downlight-wdl08 @ 192.168.9.160: Connection error occurred: downlight-wdl08 @ 192.168.9.160: EOF received
2023-12-11 16:40:24.374 WARNING (MainThread) [aioesphomeapi.connection] downlight-wdl07 @ 192.168.9.80: Connection error occurred: downlight-wdl07 @ 192.168.9.80: EOF received
2023-12-11 16:42:03.453 WARNING (MainThread) [aioesphomeapi.connection] downlight-wdl09 @ 192.168.9.149: Connection error occurred: downlight-wdl09 @ 192.168.9.149: EOF received
2023-12-11 16:43:08.203 WARNING (MainThread) [aioesphomeapi.connection] dimmer-wd04 @ 192.168.9.217: Connection error occurred: dimmer-wd04 @ 192.168.9.217: EOF received

Additional information

Links to the devices used: Dimmer Link Fan Switch Link

These were flashed with Kickstart using TuyaCutter

tyzen9 commented 8 months ago

Am I the only user experiencing this?

tyzen9 commented 8 months ago

nudge

TheAznShumai commented 8 months ago

I know you referenced my findings, but I'm not sure which link you're using. I still get those intermittent connections. Have you tried flashing with the latest version of ESPHome and running the latest version of Home Assistant? I know there were updates to aioesphomeapi and it was bumped recently. Let me know if it works for you, as I still haven't found time to research or resolve this.

I currently use MQTT on the firmware to work around the issue, which is better, but not perfect.
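
For anyone who wants to try the same workaround, here is a minimal sketch of an MQTT setup in ESPHome. The broker address and the secret names are placeholders, not values from this thread:

```yaml
# Sketch of an MQTT-based setup; broker and credentials are hypothetical.
mqtt:
  broker: 192.168.9.2               # placeholder broker address
  username: !secret mqtt_username   # placeholder secret names
  password: !secret mqtt_password
```

If api: stays in the config alongside mqtt:, its reboot_timeout still applies, so a device that no API client connects to will reboot after that timeout unless it is set to 0s or api: is removed.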

LukasJerabek commented 8 months ago

You can also try to add

sensor:
  - platform: wifi_signal
    name: Wifi Signal Strength
    update_interval: 60s

Then measure the signal strength directly on the device. If it's really bad (-80 dBm or lower), it might be the cause.

TheAznShumai commented 8 months ago

In my case, all my lights read about -59 dBm at the worst. There was no logged time where the signal was weaker than that. Does anyone have other ideas for sensors or log output we can try?
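
Two more diagnostics that may be worth logging, as a hedged sketch: an uptime sensor, which shows whether a disconnect coincides with a reboot, and the wifi_info text sensor, which shows which access point (BSSID) the device is connected to. The entity names are illustrative:

```yaml
sensor:
  # Resets to zero after a reboot, so it separates crashes from plain
  # connection drops.
  - platform: uptime
    name: Uptime

text_sensor:
  # A changing BSSID would suggest the device is roaming between APs.
  - platform: wifi_info
    bssid:
      name: Connected BSSID
    ip_address:
      name: IP Address
```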

pmannk commented 7 months ago

What wifi tech are you using? I had very similar problems in my early days of using ESPHome - specific "IoT" chipsets refused to play nicely with my wifi deployment (at the time OpenWRT). Switching to OpenWRT's non-CT drivers eliminated the issues in my case (I've subsequently moved to Ubiquiti as hardware needed refreshing).

My point here is your issue could be a mix of both ESPHome and your wifi deployment.

TheAznShumai commented 7 months ago

Currently I'm using a Deco AX5300 in AP mode, and the routing is done via an OPNsense router. I've thought about moving to Ubiquiti as a refresh for my 5300. Does anyone have any thoughts on whether this could be the issue? Maybe there's a routing setting I'm missing, or perhaps Decos have bad mesh connections?

pmannk commented 7 months ago

That's a hard one to answer. If you're currently running a mesh deployment, a relatively easy test is to shut down all the mesh nodes except one (which has a wired backhaul to your switch/router) and see if the behaviour changes. Ubiquiti has been great in my experience, although others have reported issues at various times. I'm also not operating in a mesh environment: all APs have a wired backhaul.

tyzen9 commented 4 months ago

Sorry all, and thank you for taking the time to respond to my original post. Life got in the way for me over here, but now I am back after trying to resolve this issue. I recently posted a ton of detail, including logs, here: https://www.reddit.com/r/Esphome/comments/1c8oms7/esphome_wifi_issues_eof_receivedconnection_reset/

I am using a single ASUS RT-AX86U running Asus-Merlin. Absolutely nothing fancy, no VLANS between the device and HA, just classic local LAN and DHCP.

Last night, in an act of minor frustration, I set the wifi on all 13 of my TreatLife switches to power_save_mode: none, turned off all debug configs, removed web_server, and set logging to WARN in hopes this would magically solve the problem, but no dice.
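
For reference, a sketch of what those changes look like when applied to the dimmer config from the original post. Only the wifi: and logger: blocks are shown, and the rest of the config is assumed to be unchanged:

```yaml
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # Disable Wi-Fi power saving on the device.
  power_save_mode: none

logger:
  level: WARN
  baud_rate: 0   # UART logging stays off, as in the original config
```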

The OTA logs do not tell me much, as I believe the OTA logs stop coming through when the device has issues. There are several examples of this in the post above (which I am happy to move here if it helps at all). I do not think obtaining the serial logs directly is simple, as these are Cloudcut devices. However, I'm not afraid of a soldering iron if that would help us get the logs we need to solve this problem for Treatlife devices. I would just need some instruction on where to connect what :)
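
On the serial log side, note that the config in the original post has baud_rate: 0, which turns UART logging off, and that RX1/TX1 are already used for the Tuya MCU. A rough sketch of a logger block for capturing serial output follows. It assumes the logger's hardware_uart option accepts UART2 on this platform and that the module's TX2 pad is physically reachable, neither of which has been verified on this board:

```yaml
logger:
  level: VERBOSE
  # A non-zero baud rate re-enables logging over UART.
  baud_rate: 115200
  # Assumption: send logs to the second UART so they do not collide with
  # the Tuya MCU traffic on RX1/TX1.
  hardware_uart: UART2
```

A USB-to-serial adapter at 3.3 V logic level, with its RX connected to the chosen TX pad and a shared ground, would then be needed to read the output.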

What else can I provide to help move the needle on this?

szupi-ipuzs commented 4 months ago

This seems to be specific to Beken chips, so try asking in the LibreTiny repo. I know there are some (not yet merged) pull requests related to wifi. You can try them out.

tyzen9 commented 4 months ago

Thank you, I will do that!

joukio commented 1 week ago

I had the same issues with an SHP102. I changed the LibreTiny version to 1.6.0, and at least one device has now been running for a day without issues. I will flash the other devices and test them as well.
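
A sketch of how the LibreTiny version can be pinned in the ESPHome config, assuming the framework: version: option of the bk72xx platform. The board name is taken from the dimmer config earlier in this thread and will differ for an SHP102:

```yaml
bk72xx:
  board: generic-bk7231t-qfn32-tuya   # board from the dimmer config above
  framework:
    # Pin the LibreTiny release instead of using the default one.
    version: 1.6.0
```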