beveradb / sonoff-lan-mode-homeassistant

Home Assistant platform to control Sonoff switches running the latest Itead firmware, locally (LAN mode).
MIT License
74 stars 37 forks source link

HomeAssistant unresponsive after connection refused error #56

Closed mattsaxon closed 4 years ago

mattsaxon commented 5 years ago

HA sometime becomes unresponsive, the UI is no longer available, returning ERR_CONNECTION_REFUSED to the browser.

I have had this multiple times over the last few days since starting to use the component (I am on latest code as of today and it also occurred with previous versions)

Below is an abstract of the error which occurred on startup of HA, the switch was working faultlessly prior to restart.

To recover, I have restarted HA from within an SSH shell.

I believe this failure also occurs during standard use with the same effect, I will report back with more detail when this next occurs.

2019-04-16 12:48:19 ERROR (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Unable to connect: connection refused
2019-04-16 12:48:19 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] shutdown_event_loop called, setting keep_running to False
2019-04-16 12:48:19 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] finally: closing websocket from setup_connection
2019-04-16 12:48:19 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Closing websocket from client close_connection
2019-04-16 12:48:19 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] setup_connection resumed, exiting
2019-04-16 12:48:19 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] send_updated_params_loop finally block reached: closing websocket
mattsaxon commented 5 years ago

Another trace for your consideration. This one is a startup trace with all the sonoff_land_mode entries.

This trace only has a single switch configured, so may be easier to follow. Again HA is hung, but for this problem I had to resort to a host reboot, not just a restart of HA.

2019-04-16 19:39:30 INFO (MainThread) [homeassistant.components.switch] Setting up switch.sonoff_lan_mode
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Initializing SonoffLANModeClient class in SonoffDevice
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] HassSonoffSwitch __init__ finished creating SonoffSwitch
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] send_updated_params_loop is active on the event loop
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Starting loop waiting for device params to change
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] send_updated_params_loop now awaiting event
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] setup_connection is active on the event loop
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] setup_connection yielding to connect()
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Connecting to websocket address: ws://192.168.0.156:8081/
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain config took 0.5 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain cloud took 1.3 seconds.
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] HassSonoffSwitch async_update called
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Sonoff device basic info still none, waiting for init message
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] HassSonoffSwitch returning _name: Master_Blanket_Door
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] HassSonoffSwitch returning _available: False
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.loader] Loaded binary_sensor from homeassistant.components.binary_sensor
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setting up binary_sensor
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain binary_sensor took 0.0 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain sensor took 1.5 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain tts took 0.5 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.loader] Loaded cloud.binary_sensor from homeassistant.components.cloud.binary_sensor
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain switch took 0.6 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.components.binary_sensor] Setting up binary_sensor.cloud
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain automation took 0.5 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.loader] Loaded panel_custom from homeassistant.components.panel_custom
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain hassio took 0.6 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.bootstrap] Home Assistant initialized in 6.23s
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.core] Starting Home Assistant
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.core] Timer:starting
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] setup_connection yielding to send_online_message()
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Sending user online message over websocket
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Received user online response:
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] {'error': 0, 'apikey': '[removed]', 'sequence': '15554435707193477', 'deviceid': '[removed]'}
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Received basic device info, storing in instance
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Websocket connected and accepted online user OK
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] setup_connection yielding to receive_message_loop()
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Waiting for messages on websocket
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] receive_message_loop finally block reached: closing websocket
2019-04-16 19:39:30 ERROR (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Connection closed unexpectedly
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] shutdown_event_loop called, setting keep_running to False
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] finally: closing websocket from setup_connection
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Closing websocket from client close_connection
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] send_updated_params_loop finally block reached: closing websocket
beveradb commented 5 years ago

Ooft, thanks for the trace and attempts to narrow down the issue, and sorry to hear the component actually crashed your Home Assistant host - that's definitely not good!

I don't have capacity to look into this properly right now (e.g. checkout code and try to reproduce) I'm afraid, my evenings and weekends are consumed by lesson prep for work just now, but I hope to have a little more free time come May.

I'm always happy to answer questions and do code review / merge of any decent pull requests though!

mattsaxon commented 5 years ago

I've made some progress on fixing this (in pysonofflan.py).

I'm testing a fix at the moment locally and if successful, will submit a PR is due course

Issue was a result of;

  1. websocket.recv() was getting ConnectionClosed exception.
  2. This was not caught, but the receive loop is exited, resulting in the component instance stopping functioning.

I'm very new to HA and this is my first foray into Python, so it might take me a while to tidy this up, but I though I'd post progress here.

For now I've fixed with a retry loop in the setup_connection() function.

One thing that I am still confused about is that in testing, if I unplug the device (Sonoff S26) there are no failures, it's only when I plug back in I get the failure. I'm surprised the websocket.send() method doesn't return an error, but I note that there is a change to the PONG code. @beveradb, can you shed any light on how ping/pong is implemented by Itead and why you found it necessary to override this from the standard websockets implementation?

mattsaxon commented 5 years ago

Further update.

It looks to me as if the websockets version in HA is v6 and automatic ping/pong only came in with v7 of websockets (see https://websockets.readthedocs.io/en/stable/changelog.html).

@beveradb, I don't see any PONG debug messages when running inside HA, can I check if you've seen any and it is something to do with my specific setup (for example I'm running Hassio on a Pi)

I'll look to upgrade this tomorrow unless you think I'm barking up the wrong tree

mattsaxon commented 5 years ago

My draft fix is within PySonOffLan

Here is the issue in that Repo which also linked to the PR https://github.com/beveradb/pysonofflan/issues/45

I have been testing this with websockets 6.0 at the moment, but will look to upgrade to 7 in due course.

beveradb commented 5 years ago

Thanks for your efforts on this, it looks like you're making progress!

So to respond to a couple of your points:

websockets version

Unfortunately, when I implemented pysonofflan, I didn't think about the possibility of a dependency clash with other Home Assistant dependencies. As such, my implementation was based on the current version (7), was tested with that version, and is reliant on features such as the built-in ping/pong to be stable.

When installed and used on the command line, with Python 3.7, as a standalone tool for controlling the devices - it works fine, as the latest version of websockets is used. However, when used in Home Assistant as part of sonoff-lan-mode-homeassistant, the version of websockets is limited to version 6 as unfortunately there is another HA dependency which uses websockets and explicitly specifies version 6.

I think this is a core issue, and something which it's worth doing further investigation around - e.g. identifying the other HA dependency which requires version 6, and trying to get that to update to use the newer version.

An alternative (if that proves impossible/challenging) could be to investigate rewriting pysonofflan to use a different websocket library. For example, https://github.com/dpallot/asyncws could be a potential option - it seems a bit dated/inactive but appears to be used/recommended by Home Assistant here: https://developers.home-assistant.io/docs/en/external_api_websocket.html ?

"why you found it necessary to override the ping/pong from the standard websockets implementation"

Sure! In short, I believe the ping/pong working correctly is essential for the connection to the device to be kept open indefinitely, and it didn't seem to work correctly out of the box with websockets version 7. I did a bit more investigation and found that the pong response payload which the Sonoff device was responding with after receiving a ping didn't contain the same payload as the ping it was sent.

As per various resources on the protocol, the pong payload should be identical to the ping, so the websockets library code follows the protocol correctly but the Sonoff device doesn't.

If you look through the websockets library original read_data_frame implementation, you see it is adhering to the websocket protocol correctly by comparing ping/pong payloads before clearing pings.

In my modified version of read_data_frame, I've stripped out all the comparison logic so it's no longer technically adhering to the websocket protocol correctly, but it works with the Sonoff as it no longer cares what the payload of the pong is.

Hope this helps somewhat, happy to answer any other questions as best as I can.

very new to HA and this is my first foray into Python

Me too! Blind leading the blind here then I guess 😄

mattsaxon commented 5 years ago

Thanks Andrew, that all makes sense, I appreciate the detailed explanation.

Pity on the HA dependency on WS6. I'd stuck with 6 anyway for now as am making progress. The code won't be as tidy as it could be with ping/pong implemented though.

I've currently got a working version that caters for all failures I've come across. It needs some tidying up before I commit anything though. Hopefully will get through this in the coming week.

mattsaxon commented 5 years ago

@beveradb, one further question on asyncio. I'm on the same learning curve and I wondered if we might simplify this by supporting less versions of Python. I note you are supporting 3.5, 3.6, 3.7 at the moment and that asyncio has made quite significant changes between these. I'm using 3.7 on the basis it is the latest and also the one the latest HA uses. Can we simplify our learning by just targeting one, or are there other users that need the earlier versions (e.g. has it been say integrated into OpenHAB or something else?)

beveradb commented 5 years ago

simplify this by supporting less versions of Python

I'm all for simplification, and had the same frustration with asyncio changing substantially between 3.5-3.7, but I'm keen to get this component into the upstream Home Assistant project at some point so we have to give that consideration.

As such, it has to support the same Python versions supported by HA itself - currently 3.5.3 and above. This is why I targeted those versions.

I'm using 3.7 on the basis it is the latest and also the one the latest HA uses

Not quite sure where you get that from - see above links, HA still supports 3.5.3 and above at present.

There are a few interesting/relevant points on this blog post from when HA deprecated support for Python 3.4 back in 2017 - mostly on the basis of allowing them to make it fully async.

It does seem like there's starting to be momentum around supporting only the latest 2 versions of Python (which would mean dropping support for 3.5) - see this recent discussion led by the Home Assistant founder/leader Paulus.

In that thread, he suggests:

that we deprecate Python 3.5 starting the first release after June 1, 2019

with mostly positive responses.

Based on that time frame, I'd personally be comfortable with both pysonofflan and sonoff-lan-mode-homeassistant both dropping support for 3.5.x right now, as it's unlikely to be at a decent enough quality for merge consideration before that date anyway.

So; long message, but - if you feel like rewriting things to make use of improvements in Python 3.6, and therefore dropping support for Python 3.5, go ahead! 😄

mattsaxon commented 5 years ago

Thanks for all the info. My (incorrect) assertion on HA using 3.7 was based on my limited use, which has all been on Hassio to date.

I'll consider all you say whilst I'm refactoring my retry workings into something that is fit for review.

Do you have a preference for how I submit, for example would you like issues raised for each piece I find or shall I just add comments to PRs and we can review the code in the same way you looked at my initial PR?

If the PR is too large on first look, I'd be happy to go back and submit smaller PRs so we can incrementally improve (I would have liked to do this first time around, but I've been needing to learn Python and asyncio so my commits would have been very poor quality, hopefully I've learnt enough now that they will be reasonable!)

beveradb commented 5 years ago

Sounds good, thanks!

No particular preference here, whatever is easiest for you. I'd probably lean towards smaller PRs, but only as small as is meaningful & functional.

mattsaxon commented 5 years ago

Latest PR submitted here https://github.com/beveradb/pysonofflan/pull/48