esphome / issues

Issue Tracker for ESPHome
https://esphome.io/

Connectivity loss for just some ESPHome devices on WiFi network every GTK rekeying interval #3415

Open climategadgets opened 2 years ago

climategadgets commented 2 years ago

The problem

Expected Behavior

All ESPHome devices on the network stay connected.

Actual Behavior

Charts

Two different time periods on the same set of devices, with the GTK rekeying interval set to 3600 (the default for the networking gear) and to 28800. [image] [image]

Which version of ESPHome has the issue?

Last known affected version is 2021.10.3

What type of installation are you using?

pip

Which version of Home Assistant has the issue?

N/A

What platform are you using?

ESP8266

Board

Wemos D1

Component causing the issue

wifi

Example YAML snippet

# Standard WiFi configuration with no gimmicks
---
esphome:
  name: <name>
  platform: ESP8266
  board: d1_mini

wifi:
  ssid: <ssid>
  password: <password>

Anything in the logs that might be useful for us?

No response

Additional information

No response

climategadgets commented 2 years ago

Possibly related: #1793

nagyrobi commented 2 years ago

What AP do you use?

climategadgets commented 2 years ago

UniFi nanoHD, with Linux controller.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

climategadgets commented 2 years ago

Bump. The issue is still present even with recent ESPHome versions.

climategadgets commented 1 year ago

Another bump.

Both ESPHome and the network hardware are at their latest releases, and the behavior is still the same. There is still no correlation to anything other than the GTK rekey interval. Some devices still exhibit this behavior and some don't (obvious on the screenshots), with devices periodically moving from one group to the other upon reboot.

The length of the blackout period is quite painful for applications where a 30-second overshoot can result in a 2x control signal.

[Screenshot at 2023-03-27 21-31-00] [Screenshot at 2023-03-27 21-34-14]

climategadgets commented 1 year ago

Actually, after a long and careful look, there is a very indirect correlation.

Blackouts at the GTK rekey interval happen only on chips with 1-Wire devices present. However, not all chips with 1-Wire devices present suffer from blackouts; a few of them have spotless connectivity.

One more relevant bit: blackouts happen only if 1-Wire devices are configured. If they are merely present, blackouts are not observed. (This whole discovery happened because I was reducing the "experimental" setup to "production" and removed two 1-Wire devices from the configuration, but not from the breadboard, and that node hasn't had a single blackout since.)

Update: It seems that only nodes having more than one 1-Wire device configured (not just physically connected) are having this problem. A node with one BME680, one configured 1-Wire device and one unused doesn't exhibit this pattern (though configuring even one 1-Wire device along with an I2C device did cause some blackouts to occur, but the pattern is different from the one reported here).
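For anyone trying to reproduce this, here is a minimal sketch of what "more than one 1-Wire device configured" could look like. It assumes the sensors are DS18B20-style probes handled by the `dallas` component as it existed around the affected release; the pin, the sensor names and the address placeholders are assumptions for illustration, not taken from the affected nodes.

```yaml
# Hypothetical reproduction config: two 1-Wire sensors configured on one bus.
# Pin and addresses are placeholders, not values from the reported setup.
esphome:
  name: <name>
  platform: ESP8266
  board: d1_mini

wifi:
  ssid: <ssid>
  password: <password>

# 1-Wire bus hub (assumed to be on D4)
dallas:
  - pin: D4

sensor:
  # First configured 1-Wire device
  - platform: dallas
    address: <address-1>
    name: "Probe 1"
  # Second configured 1-Wire device; removing this entry while leaving the
  # probe physically connected corresponds to the "present but not
  # configured" case described above
  - platform: dallas
    address: <address-2>
    name: "Probe 2"
```

Comparing a node flashed with both sensor entries against the same node with one entry removed (hardware unchanged) should show whether the blackout pattern follows the configuration rather than the wiring.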

Setting the log level on those chips to VERY_VERBOSE; let's see if MQTT captures anything.
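A rough sketch of the logging setup meant here, assuming the node already publishes over MQTT; the broker placeholder is an assumption, and the log topic is left at the component's default (which, as far as I know, is `<topic_prefix>/debug`).

```yaml
# Raise the log level so WiFi and 1-Wire activity around the rekey
# interval is captured in as much detail as possible.
logger:
  level: VERY_VERBOSE

# Log lines are forwarded over MQTT; the broker address is a placeholder.
mqtt:
  broker: <broker>
```

Note that logs sent over MQTT will themselves be lost while the node is disconnected, so the interesting lines are likely the ones just before a blackout starts and just after it ends.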