Closed mattsaxon closed 4 years ago
Another trace for your consideration. This one is a startup trace with all the sonoff_land_mode entries.
This trace only has a single switch configured, so may be easier to follow. Again HA is hung, but for this problem I had to resort to a host reboot, not just a restart of HA.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.components.switch] Setting up switch.sonoff_lan_mode
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Initializing SonoffLANModeClient class in SonoffDevice
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] HassSonoffSwitch __init__ finished creating SonoffSwitch
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] send_updated_params_loop is active on the event loop
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Starting loop waiting for device params to change
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] send_updated_params_loop now awaiting event
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] setup_connection is active on the event loop
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] setup_connection yielding to connect()
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Connecting to websocket address: ws://192.168.0.156:8081/
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain config took 0.5 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain cloud took 1.3 seconds.
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] HassSonoffSwitch async_update called
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Sonoff device basic info still none, waiting for init message
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] HassSonoffSwitch returning _name: Master_Blanket_Door
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] HassSonoffSwitch returning _available: False
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.loader] Loaded binary_sensor from homeassistant.components.binary_sensor
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setting up binary_sensor
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain binary_sensor took 0.0 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain sensor took 1.5 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain tts took 0.5 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.loader] Loaded cloud.binary_sensor from homeassistant.components.cloud.binary_sensor
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain switch took 0.6 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.components.binary_sensor] Setting up binary_sensor.cloud
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain automation took 0.5 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.loader] Loaded panel_custom from homeassistant.components.panel_custom
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.setup] Setup of domain hassio took 0.6 seconds.
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.bootstrap] Home Assistant initialized in 6.23s
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.core] Starting Home Assistant
2019-04-16 19:39:30 INFO (MainThread) [homeassistant.core] Timer:starting
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] setup_connection yielding to send_online_message()
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Sending user online message over websocket
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Received user online response:
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] {'error': 0, 'apikey': '[removed]', 'sequence': '15554435707193477', 'deviceid': '[removed]'}
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Received basic device info, storing in instance
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Websocket connected and accepted online user OK
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] setup_connection yielding to receive_message_loop()
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Waiting for messages on websocket
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] receive_message_loop finally block reached: closing websocket
2019-04-16 19:39:30 ERROR (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Connection closed unexpectedly
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] shutdown_event_loop called, setting keep_running to False
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] finally: closing websocket from setup_connection
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] Closing websocket from client close_connection
2019-04-16 19:39:30 DEBUG (MainThread) [homeassistant.components.switch.sonoff_lan_mode] send_updated_params_loop finally block reached: closing websocket
Ooft, thanks for the trace and attempts to narrow down the issue, and sorry to hear the component actually crashed your Home Assistant host - that's definitely not good!
I don't have capacity to look into this properly right now (e.g. checkout code and try to reproduce) I'm afraid, my evenings and weekends are consumed by lesson prep for work just now, but I hope to have a little more free time come May.
I'm always happy to answer questions and do code review / merge of any decent pull requests though!
I've made some progress on fixing this (in pysonofflan.py).
I'm testing a fix at the moment locally and if successful, will submit a PR is due course
Issue was a result of;
I'm very new to HA and this is my first foray into Python, so it might take me a while to tidy this up, but I though I'd post progress here.
For now I've fixed with a retry loop in the setup_connection() function.
One thing that I am still confused about is that in testing, if I unplug the device (Sonoff S26) there are no failures, it's only when I plug back in I get the failure. I'm surprised the websocket.send() method doesn't return an error, but I note that there is a change to the PONG code. @beveradb, can you shed any light on how ping/pong is implemented by Itead and why you found it necessary to override this from the standard websockets implementation?
Further update.
It looks to me as if the websockets version in HA is v6 and automatic ping/pong only came in with v7 of websockets (see https://websockets.readthedocs.io/en/stable/changelog.html).
@beveradb, I don't see any PONG debug messages when running inside HA, can I check if you've seen any and it is something to do with my specific setup (for example I'm running Hassio on a Pi)
I'll look to upgrade this tomorrow unless you think I'm barking up the wrong tree
My draft fix is within PySonOffLan
Here is the issue in that Repo which also linked to the PR https://github.com/beveradb/pysonofflan/issues/45
I have been testing this with websockets 6.0 at the moment, but will look to upgrade to 7 in due course.
Thanks for your efforts on this, it looks like you're making progress!
So to respond to a couple of your points:
websockets
version
Unfortunately, when I implemented pysonofflan
, I didn't think about the possibility of a dependency clash with other Home Assistant dependencies. As such, my implementation was based on the current version (7), was tested with that version, and is reliant on features such as the built-in ping/pong to be stable.
When installed and used on the command line, with Python 3.7, as a standalone tool for controlling the devices - it works fine, as the latest version of websockets is used. However, when used in Home Assistant as part of sonoff-lan-mode-homeassistant
, the version of websockets
is limited to version 6 as unfortunately there is another HA dependency which uses websockets
and explicitly specifies version 6.
I think this is a core issue, and something which it's worth doing further investigation around - e.g. identifying the other HA dependency which requires version 6, and trying to get that to update to use the newer version.
An alternative (if that proves impossible/challenging) could be to investigate rewriting pysonofflan
to use a different websocket library. For example, https://github.com/dpallot/asyncws could be a potential option - it seems a bit dated/inactive but appears to be used/recommended by Home Assistant here: https://developers.home-assistant.io/docs/en/external_api_websocket.html ?
"why you found it necessary to override the ping/pong from the standard websockets implementation"
Sure! In short, I believe the ping/pong working correctly is essential for the connection to the device to be kept open indefinitely, and it didn't seem to work correctly out of the box with websockets
version 7. I did a bit more investigation and found that the pong
response payload which the Sonoff device was responding with after receiving a ping
didn't contain the same payload as the ping
it was sent.
As per various resources on the protocol, the pong
payload should be identical to the ping
, so the websockets
library code follows the protocol correctly but the Sonoff device doesn't.
If you look through the websockets
library original read_data_frame
implementation, you see it is adhering to the websocket protocol correctly by comparing ping/pong payloads before clearing pings.
In my modified version of read_data_frame
, I've stripped out all the comparison logic so it's no longer technically adhering to the websocket protocol correctly, but it works with the Sonoff as it no longer cares what the payload of the pong
is.
Hope this helps somewhat, happy to answer any other questions as best as I can.
very new to HA and this is my first foray into Python
Me too! Blind leading the blind here then I guess 😄
Thanks Andrew, that all makes sense, I appreciate the detailed explanation.
Pity on the HA dependency on WS6. I'd stuck with 6 anyway for now as am making progress. The code won't be as tidy as it could be with ping/pong implemented though.
I've currently got a working version that caters for all failures I've come across. It needs some tidying up before I commit anything though. Hopefully will get through this in the coming week.
@beveradb, one further question on asyncio. I'm on the same learning curve and I wondered if we might simplify this by supporting less versions of Python. I note you are supporting 3.5, 3.6, 3.7 at the moment and that asyncio has made quite significant changes between these. I'm using 3.7 on the basis it is the latest and also the one the latest HA uses. Can we simplify our learning by just targeting one, or are there other users that need the earlier versions (e.g. has it been say integrated into OpenHAB or something else?)
simplify this by supporting less versions of Python
I'm all for simplification, and had the same frustration with asyncio
changing substantially between 3.5-3.7, but I'm keen to get this component into the upstream Home Assistant project at some point so we have to give that consideration.
As such, it has to support the same Python versions supported by HA itself - currently 3.5.3 and above. This is why I targeted those versions.
I'm using 3.7 on the basis it is the latest and also the one the latest HA uses
Not quite sure where you get that from - see above links, HA still supports 3.5.3 and above at present.
There are a few interesting/relevant points on this blog post from when HA deprecated support for Python 3.4 back in 2017 - mostly on the basis of allowing them to make it fully async.
It does seem like there's starting to be momentum around supporting only the latest 2 versions of Python (which would mean dropping support for 3.5) - see this recent discussion led by the Home Assistant founder/leader Paulus.
In that thread, he suggests:
that we deprecate Python 3.5 starting the first release after June 1, 2019
with mostly positive responses.
Based on that time frame, I'd personally be comfortable with both pysonofflan
and sonoff-lan-mode-homeassistant
both dropping support for 3.5.x right now, as it's unlikely to be at a decent enough quality for merge consideration before that date anyway.
So; long message, but - if you feel like rewriting things to make use of improvements in Python 3.6, and therefore dropping support for Python 3.5, go ahead! 😄
Thanks for all the info. My (incorrect) assertion on HA using 3.7 was based on my limited use, which has all been on Hassio to date.
I'll consider all you say whilst I'm refactoring my retry workings into something that is fit for review.
Do you have a preference for how I submit, for example would you like issues raised for each piece I find or shall I just add comments to PRs and we can review the code in the same way you looked at my initial PR?
If the PR is too large on first look, I'd be happy to go back and submit smaller PRs so we can incrementally improve (I would have liked to do this first time around, but I've been needing to learn Python and asyncio so my commits would have been very poor quality, hopefully I've learnt enough now that they will be reasonable!)
Sounds good, thanks!
No particular preference here, whatever is easiest for you. I'd probably lean towards smaller PRs, but only as small as is meaningful & functional.
Latest PR submitted here https://github.com/beveradb/pysonofflan/pull/48
HA sometime becomes unresponsive, the UI is no longer available, returning ERR_CONNECTION_REFUSED to the browser.
I have had this multiple times over the last few days since starting to use the component (I am on latest code as of today and it also occurred with previous versions)
Below is an abstract of the error which occurred on startup of HA, the switch was working faultlessly prior to restart.
To recover, I have restarted HA from within an SSH shell.
I believe this failure also occurs during standard use with the same effect, I will report back with more detail when this next occurs.