esphome / issues

Issue Tracker for ESPHome
https://esphome.io/

Wifi very flaky on 2022.12+ #3969

Open fhriley opened 1 year ago

fhriley commented 1 year ago

The problem

I have a SparkFun Thing Plus that I use as a garage door controller. Up through 2022.11.5, the wifi has been rock solid. On 2022.12.0 and 2022.12.3, the wifi is very flaky. Here are the 12 hours before and after the update in the HA history; it's pretty obvious when I did the update. A simple ping test also shows it is rock solid on 2022.11.5 but quite bad on 2022.12+. It's bad enough that an OTA update is not possible once the device is flashed with 2022.12. Note that I did a serial flash when updating to 2022.12. Flashing back to 2022.11.5 fixes the issue.
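For anyone who wants to put a number on "quite bad", here is a minimal sketch of a reachability monitor. It is not the ping test used above: it swaps ICMP ping for a TCP connect to port 80 (the `web_server` port in the config below), since raw ICMP usually needs elevated privileges. The function names `tcp_probe`, `loss_percent` and `monitor` are hypothetical.

```python
import socket
import time


def tcp_probe(host, port=80, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False


def loss_percent(results):
    """Percentage of failed probes in a sequence of booleans."""
    results = list(results)
    if not results:
        return 0.0
    return 100.0 * results.count(False) / len(results)


def monitor(host, count=60, delay=1.0, probe=tcp_probe):
    """Probe host `count` times, `delay` seconds apart, and return loss in percent."""
    results = []
    for _ in range(count):
        results.append(probe(host))
        time.sleep(delay)
    return loss_percent(results)
```

Calling `monitor("<device address>")` once per firmware version gives a loss figure that can be compared between 2022.11.5 and 2022.12.x. The `probe` argument is injectable so the loop can be exercised without a device on the network.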

Which version of ESPHome has the issue?

2022.12 and up

What type of installation are you using?

pip

Which version of Home Assistant has the issue?

2022.12.8

What platform are you using?

ESP32

Board

SparkFun Thing Plus

Component causing the issue

wifi

Example YAML snippet

##### common.yaml

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  domain: !secret wifi_domain
  fast_connect: off
  power_save_mode: none

  ap:
    ssid: "$human_name"
    password: !secret ap_password

web_server:
  port: 80
  ota: false

captive_portal:

api:

ota:

prometheus:

sensor:
  - platform: wifi_signal
    name: "$human_name WiFi Signal"
    update_interval: 60s

binary_sensor:
  - platform: status
    name: "$human_name Connection Status"

button:
  - platform: restart
    name: '$human_name Reboot'
  - platform: safe_mode
    name: "$human_name Reboot (Safe Mode)"

##### bme680.yaml

substitutions:
  human_name: "default"
  temperature_throttle: 60s
  pressure_throttle: 60s
  humidity_throttle: 60s
  gas_resistance_throttle: 60s
  iaq_throttle: 60s
  iaq_accuracy_throttle: 60s
  co2_equivalent_throttle: 60s
  breath_voc_equivalent_throttle: 60s

i2c:

bme680_bsec:
  address: 0x77
  temperature_offset: 0
  iaq_mode: static
  sample_rate: lp
  state_save_interval: 6h

sensor:
  - platform: bme680_bsec
    temperature:
      # Temperature is reported in °C and converted to °F by the lambda filter below
      name: "$human_name BME680 Temperature"
      filters:
        - median
        - lambda: return x * (9.0/5.0) + 32.0;
        - throttle: "$temperature_throttle"
      unit_of_measurement: "°F"
    pressure:
      # Pressure in hPa
      name: "$human_name BME680 Pressure"
      filters:
        - median
        - throttle: "$pressure_throttle"
    humidity:
      # Relative humidity %
      name: "$human_name BME680 Humidity"
      filters:
        - median
        - throttle: "$humidity_throttle"
    gas_resistance:
      # Gas resistance in Ω
      name: "$human_name BME680 Gas Resistance"
      filters:
        - median
        - throttle: "$gas_resistance_throttle"
    iaq:
      # Indoor air quality value
      name: "$human_name BME680 IAQ"
      filters:
        - median
        - throttle: "$iaq_throttle"
    iaq_accuracy:
      # IAQ accuracy as a numeric value of 0, 1, 2, 3
      name: "$human_name BME680 Numeric IAQ Accuracy"
      filters:
        - throttle: "$iaq_accuracy_throttle"
    co2_equivalent:
      # CO2 equivalent estimate in ppm
      name: "$human_name BME680 CO2 Equivalent"
      filters:
        - median
        - throttle: "$co2_equivalent_throttle"
    breath_voc_equivalent:
      # Volatile organic compounds equivalent estimate in ppm
      name: "$human_name BME680 Breath VOC Equivalent"
      filters:
        - median
        - throttle: "$breath_voc_equivalent_throttle"

text_sensor:
  - platform: bme680_bsec
    iaq_accuracy:
      # IAQ accuracy as a text value of Stabilizing, Uncertain, Calibrating, Calibrated
      name: "$human_name BME680 IAQ Accuracy"

##### garage-door.yaml

substitutions:
  human_name: 'Garage Door'
  device_name: garage door

esphome:
  name: $device_name

esp32:
  board: esp32thing_plus

packages:
  common: !include common/common.yaml
  bme680: !include common/bme680.yaml

i2c:
  sda: 23
  scl: 22
  scan: True
  id: bus_a

binary_sensor:
- platform: gpio
  id: garage_door_sensor
  pin:
    number: 27
    mode: INPUT_PULLUP
  name: "Garage Door State"
  device_class: garage_door
  filters:
    - delayed_on_off: 50ms

switch:
  - platform: gpio
    id: garage_door_relay
    pin: 13
    restore_mode: ALWAYS_OFF
    internal: True

cover:
  - platform: template
    name: "Garage Door"
    device_class: garage
    lambda: |-
      if (id(garage_door_sensor).state) {
        return COVER_OPEN;
      } else {
        return COVER_CLOSED;
      }
    open_action:
      - switch.turn_on: garage_door_relay
      - delay: 0.1s
      - switch.turn_off: garage_door_relay
    close_action:
      - switch.turn_on: garage_door_relay
      - delay: 0.1s
      - switch.turn_off: garage_door_relay
    stop_action:
      - switch.turn_on: garage_door_relay
      - delay: 0.1s
      - switch.turn_off: garage_door_relay

logger:
  level: INFO
  logs:
    json: INFO

Anything in the logs that might be useful for us?

No response

Additional information

No response

pmannk commented 1 year ago

I'm experiencing exactly the same issue on an ESP32_CAM board. 2022.11.5 is rock solid. 2022.12.x renders wifi unusable. My configuration for this device is quite minimal, and a minimal wifi configuration doesn't improve the situation for me:

wifi:
  ssid: xxxx
  password: xxxx
  reboot_timeout: 0s

On my access point I can see the device continually disconnecting and reconnecting when running 2022.12.x:

Dec 29 15:23:30 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:30 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:38 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:39 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:47 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:47 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:56 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:56 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:05 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:05 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:13 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:13 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:22 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:22 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
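
The reconnect loop in the hostapd lines above can be quantified with a short script. This is a minimal sketch: the function names are hypothetical, and the year is an assumption (syslog timestamps omit it), used only so the timestamps parse cleanly.

```python
from datetime import datetime

# Lines copied from the hostapd output above.
LOG = """\
Dec 29 15:23:30 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:30 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:38 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:39 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:47 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:47 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:56 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:23:56 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:05 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:05 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:13 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:13 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:22 <ap> hostapd: wlan1: AP-STA-DISCONNECTED f4:cf:a2:xx:xx:xx
Dec 29 15:24:22 <ap> hostapd: wlan1: AP-STA-CONNECTED f4:cf:a2:xx:xx:xx
"""


def disconnect_times(log_text, year=2022):
    """Return a datetime for every AP-STA-DISCONNECTED line in log_text.

    The year is an assumption; "%b" expects English month abbreviations.
    """
    times = []
    for line in log_text.splitlines():
        if "AP-STA-DISCONNECTED" not in line:
            continue
        stamp = " ".join(line.split()[:3])  # e.g. "Dec 29 15:23:30"
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Seconds between consecutive events."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(gaps_seconds(disconnect_times(LOG)))  # [8, 9, 9, 9, 8, 9]
```

For this excerpt the device drops roughly every 8 to 9 seconds and reassociates within about a second each time.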

disconn3ct commented 1 year ago

I've been having this issue on esp32-cams. (Weirdly only the "good" ones that otherwise work perfectly.)

With 2022.12.x it dropped off every 5 minutes or so, and the watchdog did not detect it. The serial logs show the HA connection closing, and it just happily shows 'Got image' over and over even with no traffic. Going into the AP console (Unifi) I can boot it off with reconnect and it will come right back online successfully, with logs showing the disconnect.

I downgraded to 2022.11.5 and it has been stable for about 30 minutes now. No other changes, just a clean and run. (As an unrelated side bug, I'm using the arduino framework because esp-idf doesn't work with any of these cameras.)

Edit: Looking at the entity history, it still seems to lose connection every ~30 minutes but immediately reconnects.

disconn3ct commented 1 year ago

I did a bunch of testing this morning. (FYI the esp32-camera module doesn't work with the IDF framework in any release.)

I haven't broken down what changed in 12.0b1 yet (huge changelog) but it definitely broke wifi+bluetooth.

Test env is a hiletgo ftdi module, plugged into a powered usb hub. (Yes, USB power, but the results are reproducible. No HW changes between runs except to short 0 for programming.)

I started testing at 2022.11.3 but it worked fine in all modes, as did 11.4.

Test process:

pipenv install esphome==2022.XXX
pipenv shell
esphome clean esp-experiment.yaml
# edit to enable/disable bluetooth proxy, BLE, BLE 100% scanning + an rssi measure
esphome run esp-experiment.yaml --device /dev/ttyUSB0 --no-logs
esphome logs esp-experiment.yaml --device /dev/ttyUSB0 | tee experiment-XXX-CONF.log

Push reset, to boot with log capture.

In parallel, ping the (static DHCP) address and click the HA UI when available, to verify camera, status light and flash work as expected.

| Version | Framework | BT Proxy | BLE | Wifi Status | Cam Status | LED Status | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 11.4 | Arduino | Yes | Full Scan, RSSI | OK | OK | OK | |
| 11.5 | Arduino | Yes | Full Scan, RSSI | OK | OK | OK | Occasional crashes in API (on first connection) |
| 12.0b1 | Arduino | No | No | OK | OK | OK | Works completely fine. Sometimes slow wifi startup (association refused temp) or wifi restart but it recovers |
| 12.0b1 | Arduino | Yes | No | Connect, gets an IP, then no packets | OK (per logs) | | No ping response, but logs act normal (wifi strength, got image, etc) |
| 12.0b1 | Arduino | No | Full Scan, RSSI | Connect, gets an IP, then no packets | OK (per logs) | | |
| 12.0b1 | Arduino | Yes | Full Scan, RSSI | Connect, gets an IP, then no packets | OK (per logs) | | |
| 12.4 | Arduino | No | No | OK | OK | OK | Works fine. Sometimes slow wifi startup. |

Interesting config:

wifi.reboot_timeout: 1min

esp32_camera: 
  config: as "AI-Thinker" example
  resolution: 1024x768
  brightness: 1
  saturation: 1
  max_framerate: 5 fps
  idle_framerate: 0.66 fps

esp32_camera_web_server: stream, snapshot

bluetooth_proxy: # or not
  active: true

esp32_ble_tracker: # or not
  scan_parameters:
    interval: 1100ms
    window: 1100ms
    active: true

sensor:
  - platform: ble_rssi # pawscout, if ble enabled
  - platform: wifi_signal
  - platform: copy # for percent, from examples

text_sensor:
  - platform: version
  - platform: wifi_info

binary_sensor: connection status

button: safe_mode, restart
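
The "BLE 100% scanning" case in the test process corresponds to the `esp32_ble_tracker` scan window being equal to the scan interval, as in the settings above. Here is a minimal sketch of that arithmetic, assuming duty cycle is simply window divided by interval. The helper names are hypothetical, and the second pair of values is purely illustrative, not a recommended setting.

```python
def parse_ms(value):
    """Convert a time string such as '1100ms' or '1.1s' to milliseconds."""
    value = value.strip().lower()
    if value.endswith("ms"):
        return float(value[:-2])
    if value.endswith("s"):
        return float(value[:-1]) * 1000.0
    raise ValueError(f"unsupported time value: {value!r}")


def scan_duty_cycle(interval, window):
    """Fraction of each scan interval that is spent actively scanning."""
    interval_ms = parse_ms(interval)
    window_ms = parse_ms(window)
    if interval_ms <= 0 or window_ms > interval_ms:
        raise ValueError("window must be no longer than a positive interval")
    return window_ms / interval_ms


# Values from the test config above: the scanner never pauses.
print(f"{scan_duty_cycle('1100ms', '1100ms'):.0%}")  # 100%

# Illustrative values only: a shorter window leaves gaps between scans.
print(f"{scan_duty_cycle('1000ms', '250ms'):.0%}")  # 25%
```

The ESP32 shares a single 2.4 GHz radio between Wi-Fi and Bluetooth, so a 100% scan duty cycle leaves the least possible airtime for Wi-Fi. That may be relevant to the failures in the table, although nothing in the test results isolates it as the cause.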

fhriley commented 1 year ago

Thanks for testing! I upgraded to 12.4 and confirm your results. The wifi is solid again.

disconn3ct commented 1 year ago

12.4 is NOT working. Nothing later than 11.5 works with wifi and bluetooth both enabled. 12.4 only works if you disable bluetooth.

fhriley commented 1 year ago

Got it, I misunderstood. I re-opened this.

disconn3ct commented 1 year ago

Git bisect blames c2e198311ca73112fa7c9c213101277200085a6f (I haven't verified that yet, that is just from a quick first pass. Seems plausible though.)

pmannk commented 1 year ago

Interesting test results. I just tried on mine with 12.4 and it's still unreliable. I don't have bluetooth enabled either. A major difference with 2022.12 was the arduino and platform version upgrades. You can see this started back with https://github.com/esphome/esphome/pull/3564/commits/2ad10d1c968f22ef8851697e721535dc24b41e4e

I thought I'd try my luck using the old framework/platform with

esp32:
  board: esp32cam
  framework:
    type: arduino
    version: 1.0.6
    platform_version: 3.5.0

However it was no surprise when this failed to compile. pin_sscb changed to pin_sccb for a start.

rocket59 commented 1 year ago

Same issue here. I was running 11.3 up until last week, when I updated to 12.5. Since then my ESP32s have all started dropping off the network (Unifi AP AC Pro) at random, although they do reconnect within about 20s. All of them have pretty decent signal strength too.

The only thing these ESPs are configured for is BLE tracking.

Not only that, the update also seemed to destabilise my network. Other IOT devices also started falling off. Moved back to 11.3 yesterday and not a single disconnect in the last 24hr.

disconn3ct commented 1 year ago

Weirdly I had the same experience as @rocket59 when I upgraded from 2022.12.8 to 2023.2.2. I upgraded this morning, and while it never rebooted, it lost wifi every few seconds for a few seconds. (I downgraded about 5 minutes before the snap, which is a sign of just how quickly it was falling off.) I also experienced destabilization of other devices (specifically other ESPs, but they are the least forgiving; other devices might have been affected without getting so much attention)

rocket59 commented 1 year ago

Do you have a unifi AP? If so, set the data rate control to a fixed 11Mbps (if it's on the default auto settings) to see if that makes a difference.

mac-city commented 1 year ago

> Do you have a unifi AP? If so, set the data rate control to a fixed 11Mbps (if it's on the default auto settings) to see if that makes a difference.

where do you set that?

rocket59 commented 1 year ago

In the unifi controller (either software or hardware).

tomlut commented 1 year ago

> Do you have a unifi AP? If so, set the data rate control to a fixed 11Mbps (if it's on the default auto settings) to see if that makes a difference.

Since 6am this morning a lot of my ESP devices have random disconnects. Changing the minimum data rate control made no difference.

I can see the devices connected to the AP with 100% user experience. The HA API is just randomly disconnecting.

UPDATE: noticed that most of the intermittent devices were on one AP. And lots of retries. Narrowed it down to a failing smart plug spamming the wifi with junk. Once removed all other devices returned to stability.

Interesting that Unifi did not report any reduction in user experience or increase in interference, just retries.

pmannk commented 1 year ago

The comment above reminds me I was meaning to update this thread. I have what appears to be a stable combination of ESPHome and OpenWRT at the moment, for the problematic ESP32_CAM board. I've since updated all my boards.

TL;DR: OpenWRT ath10k non-ct drivers with ESPHome 2023.6 results in stable wifi.

ESPHome 2022.11: wifi was stable, regardless of my OpenWRT driver selection.
ESPHome 2022.12 - 2023.2: wifi was unpredictable, regardless of my OpenWRT driver selection.

I stopped testing both ct and non-ct combinations after 2023.2 as it wasn't making a difference. Fast forward to 2023.6 and I decided to test again. With non-ct drivers it's stable. With ct drivers it's better than I saw with 2022.12 - 2023.2, but there is still packet loss which I don't see when using the non-ct drivers.