Taraman17 / hass-homee

a Home Assistant custom component to integrate the homee smart home platform
MIT License
17 stars 2 forks source link

Homee Integration stops working #103

Open mrx1968x opened 2 weeks ago

mrx1968x commented 2 weeks ago

I found a bug and I didn't find an appropriate log entry to provide here.

Version: 2.9.6 HA Version: 2024.9.1

The homee integration stops working almost always after approx. one day. I.e. the entities don't get updated anymore.

After a HA restart, everything works fine again.

Taraman17 commented 2 weeks ago

Hmm, without any log entry, that is hard to track down. if you come across anything labelled homee or pymee (the library for communicating with homee), let me know.

MrAnDrs commented 2 weeks ago

I had the same problem recently that all Homee entities "hung". I did not turn on the error logs in advance (as described in the help) so did not even check them.

But I guess the root cause was a interrupted wifi connection and the integration did not reconnect itself automatically (or not properly) anymore after the Wifi went up again. It exactly happend Sunday morning at 3am and at this time I am doing a weekly reboot of all my access points resulting in a short wifi outage.

HA is connected via LAN (so no connection breaking to my local network for HA) but the Homee cube is connected via Wifi and is approximately 1 minute not available. I am not sure If I rebooted HA completely or just the integration to make it work again.

Maybe @mrx1968x is your router rebooting or maybe another short wifi / network hickup is happening in that time period and can be reproduced in that case?

I will also try to reproduce that again having the logs enabled.

Taraman17 commented 2 weeks ago

Thanks for the info. Normally the integration should try to reconnect itself and my local testing showed, that it normally does. If this fails, something should show up in the normal error logs of HA, logged by "pymee"

mrx1968x commented 2 weeks ago

Thanks MrAnDrs and Taraman. I'll check for connectivity issues and will have a look at the logs. As soon as I have more info I'll let you know. BTW, thx for the quick response.

mrx1968x commented 2 weeks ago

As assumed by @MrAnDrs , it seems to be a connectivity issue which triggers the problem.

I didn't send the full logs as they might contain private information. But here are the summarized logs filtered by "pymee" using the HA UI:

Logger: pymee
Quelle: .local/lib/python3.12/site-packages/pymee/__init__.py:553
Erstmals aufgetreten: 23. September 2024 um **01:09:10** (**3 Vorkommnisse**)
Zuletzt protokolliert: 24. September 2024 um **19:36:04**

Homee pymee_Home Disconnected. Error: sent 1011 (internal error) keepalive ping timeout; no close frame received

Logger: pymee
Quelle: .local/lib/python3.12/site-packages/pymee/__init__.py:546
Erstmals aufgetreten: 23. September 2024 um **01:09:20** (1 Vorkommnisse)
Zuletzt protokolliert: 23. September 2024 um **01:09:20**

Homee pymee_Home Reconnected after 1 retries

I performed a test yesterday (September 24th@19:36:04). As I have a Ubiquiti Unifi Wifi System, I've used the Unifi Controller to block my homee. After unblocking, the homee entities haven't been available again, I did a homee integration restart and this made it work again

On September 23rd@01:09:20 the reconnect was obviously working. But one of the 3 events (3 Vorkommnisse in German) "keepalive ping timeout" between September 23rd@01:09:10 and September 24th@19:36:04 made it stop working as well and I had to restart.

Looks to me like a problem with the "connectivity check and reconnect" procedure in pymee.

Taraman17 commented 2 weeks ago

OK. so "unfortunately" in this case it worked and the log does not show the error from the other occasions. The log message of the disconnect shows me, that it worked as intended, since the underlying error was caught and correctly logged. So I assume, that on the other disconnects some other error is thrown, that is not caught. You might be able to find the other w occasions if you show the full log ("Vollständige Protokolle laden") and search for pymee therein.

p.s.: Since I'm German, I will have no problem understanding the logs. ;-)

mrx1968x commented 2 weeks ago

Thanks again for the quick response. I really appreciate it.

This is my "hand picked" excerpt from the full logs:

2024-09-23 01:09:10.829 WARNING (MainThread) [pymee] Homee pymee_Home Disconnected. Error: sent 1011 (internal error) keepalive ping timeout; no close frame received
2024-09-23 01:09:20.873 WARNING (MainThread) [pymee] Homee pymee_Home Reconnected after 1 retries
2024-09-23 03:34:05.046 WARNING (MainThread) [pymee] Homee pymee_Home Disconnected. Error: sent 1011 (internal error) keepalive ping timeout; no close frame received
2024-09-24 19:36:04.713 WARNING (MainThread) [pymee] Homee pymee_Home Disconnected. Error: sent 1011 (internal error) keepalive ping timeout; no close frame received
2024-09-24 19:38:43.126 INFO (MainThread) [custom_components.homee] Found node Lampe Treppe oben, with following Data: ...

So, in one of three cases a retry was performed and was succesful. That's why I assume it's a pymee problem.

Taraman17 commented 2 weeks ago

Hmmm, that makes it even more difficult. I just did the recent Homee Firmware update, which also causes a short network interruption and there everything worked as expected. If you set the logging level to "info" as described in the Readme, we might get more detailed information on what happens.

mrx1968x commented 2 weeks ago

Yesterday I performed an update to the latest homee firmware as well:

2024-09-25 19:38:35.127 WARNING (MainThread) [pymee] Homee pymee_Home Disconnected. Error: received 4000 (private use) SERVICE_RESTART; then sent 4000 (private use) SERVICE_RESTART 2024-09-25 19:39:55.167 WARNING (MainThread) [pymee] Homee pymee_Home Reconnected after 3 retries

Please note that the disconnect shows another Error Message in this case and pymee reconnected after 3 retries.

But after "Error: sent 1011 (internal error)..." pymee cannot recover reliably every time.

2024-09-24 19:36:04.713 WARNING (MainThread) [pymee] Homee pymee_Home Disconnected. Error: sent 1011 (internal error) keepalive ping timeout; no close frame received

@Taraman17 Do you think we have to open an issue for the pymee project for this behaviour as the complete mechanism is part of pymee? The extended logging is enabled now and I have to wait for the next disconnect now. ;-)

PS: The hardware solution would be to connect the homee via LAN instead. That might also improve the Zigbee reach as Zigbee and WiFi both operate in the 2.4 GHz band and might have some interference. I was even successful to improve Wifi performance of my FRITZ!Box by deactivation of USB (which is obviously introducing noise in the 2.4 GHz band as well).

Wesie commented 2 weeks ago

I'm still on version 2.9.5 Only FYI: I haven't see any error with this verison. Also not after update the brain.

Taraman17 commented 2 weeks ago

I also noticed that the error message during the update was another one. But also one that was correctly cought and reconnect worked. A seperate issue for pymee is not needed at the moment, since I'm working on that as well and am in the process of taking over ownership of the library. If the logs will show, that the error originates there, I will file one.

For the moment we can just watch and see if we find something in the logs on a failed reconnect. Else I might try to add more logging in the future to get a hint.

Taraman17 commented 1 week ago

Now I got the error myself and found the log:

2024-09-29 04:24:41.680 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/pymee/__init__.py", line 151, in run
    await self.open_ws()
  File "/usr/local/lib/python3.12/site-packages/pymee/__init__.py", line 173, in open_ws
    async with websockets.connect(
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/client.py", line 635, in __aenter__
    traceback: Optional[TracebackType],
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/websockets/legacy/client.py", line 654, in __await_impl__
    await protocol.handshake(
                           ^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1122, in create_connection
    raise exceptions[0]
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1104, in create_connection
    sock = await self._connect_sock(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1007, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 641, in sock_connect
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 681, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
OSError: [Errno 113] Connect call failed ('192.168.170.131', 7681)

Cause is here, that the "OSError" is not cought and I have to add that in the code.

mrx1968x commented 1 week ago

I was still waiting for the next occurence with extended logs.

@Taraman17 Thanks! I'll be the first one to test it!

Taraman17 commented 1 week ago

Created: https://github.com/FreshlyBrewedCode/pymee/issues/37

It will however take a while till I can fix that, because I need to take over the pymee package first.

mrx1968x commented 1 week ago

Sure. I'm aware you're doing it in your spare time.