esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
294 stars 38 forks source link

Opening and closing TCP connection quickly on OTA port leads to OTA capability breakage #5242

Open charignon opened 11 months ago

charignon commented 11 months ago

The problem

First of all, thank you so much for all of your work, esphome is an amazing project and I am having a blast with it. I found what I suspect to be a security and reliability issue.

I am using esphome on Sonoff S31s. Example configs:

esphome:
  name: ${name}
  friendly_name: ${friendly_name}

esp8266:
  board: esp01_1m
  early_pin_init: false

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

logger:
  baud_rate: 0

api:

ota:
  password: !secret ota_password

uart:
  rx_pin: RX
  baud_rate: 4800

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO0
      mode: INPUT_PULLUP
      inverted: True
    name: "Button"
    on_press:
      - switch.toggle: relay
  - platform: status
    name: "Status"

sensor:
  - platform: wifi_signal
    name: "WiFi Signal"
    update_interval: 60s
  - platform: cse7766
    current:
      name: "Current"
      accuracy_decimals: 1
    voltage:
      name: "Voltage"
      accuracy_decimals: 1
    power:
      name: "Power"
      accuracy_decimals: 1
      id: my_power
  - platform: total_daily_energy
    name: "Daily Energy"
    power_id: my_power

switch:
  - platform: gpio
    name: "Relay"
    pin: GPIO12
    id: relay
    restore_mode: RESTORE_DEFAULT_OFF
  - platform: restart
    name: "${name} Restart"

time:
  - platform: sntp
    id: my_time

status_led:
  pin:
    number: GPIO13
    inverted: True

As I am flashing many such devices. So to speed things up on my end, I wrote up a script to gather for me the list of ESPHOME devices on my network. It works by scanning the ports 3232 and 8266, by trying to establish a TCP connection and closing it right away. The act of scanning appears to throw the devices in a state where OTA isn't possible anymore unless I restart them manually.

The logs are full of:

[22:58:00][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:00][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:00][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:00][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:00][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:00][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:00][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:00][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:00][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104
[22:58:01][W][ota:147]: Socket could not enable tcp nodelay, errno: 104

I just looked at the source code. It is my first time looking at it so take my analysis with a grain of salt :)

This seems to be happening because the scanner closes the connection immediately (between the accept and the set sockopt on the esphome side). In this code path I noticed that we forget to close the socket. So the loop always calls accept and always get the same socket on which you cannot send setsockopt and always fails, maybe we could try something like:

modified   esphome/components/ota/ota_component.cpp
@@ -144,6 +144,8 @@ void OTAComponent::handle_() {
   int enable = 1;
   int err = client_->setsockopt(IPPROTO_TCP, TCP_NODELAY, &enable, sizeof(int));
   if (err != 0) {
+    this->client_->close();
+    this->client_ = nullptr;
     ESP_LOGW(TAG, "Socket could not enable tcp nodelay, errno: %d", errno);
     return;
   }

Additionally this is probably also a security issue as you can disable OTA on a device using what I described above.

Which version of ESPHome has the issue?

2023.11.6

What type of installation are you using?

pip

Which version of Home Assistant has the issue?

N/A

What platform are you using?

ESP8266

Board

sonoff s31

Component causing the issue

ota

Example YAML snippet

Provided an example above

Anything in the logs that might be useful for us?

Provided example above

Additional information

Provided suggestion for a fix

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.