fsaris / home-assistant-awox

AwoX mesh light integration for Home Assistant
MIT License
84 stars 23 forks source link

No device connected in log #50

Closed fsaris closed 2 years ago

fsaris commented 2 years ago

Ok so again I have no response from lights and "No device connected" in log. image image image image

Originally posted by @Dillton in https://github.com/fsaris/home-assistant-awox/issues/48#issuecomment-938600916

fsaris commented 2 years ago

@Dillton I pushed a update to improve the disconnect handling

It still will disconnect when the command failed. But also will try again when reconnected

Dillton commented 2 years ago

Hello @fsaris, thank you. Its 2 days since I made the update and so far so good. There are still some errors in logs (some different than before) but at least the lights will not get unresponsive forever and so far there was no need to reload integration or restart HA. Errors: image image image image image image

Dillton commented 2 years ago

Hi @fsaris, I am sorry I have to take back what I posted yeasterday. Today I came back home and again all light disconnected. image image image

fsaris commented 2 years ago

But didn't the lights work anymore? Because HA should try to reconnect when a new command is triggered (trying to turn on the light for instance).

Do you also have the awox app connected to the lights? That also makes the connection unstable.

Dillton commented 2 years ago

The lights didn't turn on. I can confirm that when any light is disconnected I can see in HA its unavailable but it will become available in few seconds again. However this time it looked like available but no command from HA worked.

Yes I have awox app connected. When I think about this then it actually makes sense that there can be some colision between my phone BT adapter and HA BT adapter. I will try to disconnect the app and wait for result.

Dillton commented 2 years ago

Hello. So even with awox app disconnected the lights get unresponsive. image image image

Strange thing is that in the last error I can see MAC address and name of my Sony TV (KDL-43W755C). What is that supose to mean?

fsaris commented 2 years ago

these last errors are from the rssi check. A scan is done to get all available devices and after that a match on mac is done. So it could be that in the first step also your TV is seen.

Looks like that this scan fails for you. This scan is done once every 24 hours.

Is this every time the moment that the integration fails for you?

I pushed a update to adjust the logging a little and the connection is re-connected every 2 hours to keep the mesh connection active.

The awox_mesh.* entity now holds also some more info about the current active connection.

Dillton commented 2 years ago

Thank you for the update. I think I understand however cant confirm if this is the exact moment the integration fails. Will keep an eye on this. Update is done so lets test it :), will come back to you with result.

Dillton commented 2 years ago

Good evening. Just got to the point where lights aren't responding. In logs there is only one error now: image However last logged time is about 20 hours old and lights were working atleast until 3PM.

fsaris commented 2 years ago

Could you check the full log with logging level set to info for awox? https://github.com/fsaris/home-assistant-awox#troubleshooting

fsaris commented 2 years ago

That should give some more info

Dillton commented 2 years ago

Hi. I just collected logs from the night. Bcause its a huge amount of text you can find it here: https://drive.google.com/file/d/11i4ESYHWkejc7UtS2yb1_LwpwkQKPN86/view?usp=sharing image

UPDATE: I just get home and lights not responding. I have updated the log I shared in previous post. The logs from awox are missing since 2:57AM so I guess its unresponsive since then. Also the log below shows whats happening when I manually reload the integration. https://drive.google.com/file/d/1zrPxjeqb9Q0iuZMYfk2IfPH4_bRQfu82/view?usp=sharing

UPDATE: Not even hour since integration reaload and again no response. Second log file updated. image

Dillton commented 2 years ago

Hello @fsaris. Did you have time to check the logs? Meantime I tried the most basic thing. I have unplugged the eglo plug plus (bridge) completely and guess what, the light did not get unresponsive since then. So there is a question are you able to modify the integration for the plug plus based on my logs? As I said before I am completely open for "beta" testing and providing any needed data :)

fsaris commented 2 years ago

Will give it a try to add a switch entity. Lets see if it reacts to the same on/off commands as a light does

fsaris commented 2 years ago

@Dillton could you try the latest main version? Did an attempt to add plug on/off support (using Bluetooth). That would already increase the mesh connections you can use.

Dillton commented 2 years ago

@fsaris Update done, plug detected and switching on/off is working :) image image

Also I enabled debug logs in case of further issues.

Dillton commented 2 years ago

Hello @fsaris. Thanks for last update. I have to say so far so good. There are still some errors in log I will share below but not sure if this can be any more improved but the light sence last update did not get unresponsive.

image https://drive.google.com/file/d/1PKLNIIoaF_FZKucGDqRGOsRYW3QIF2zs/view?usp=sharing

fsaris commented 2 years ago

The warnings you get are "except-able" as this is due to the fact that the Bluetooth connection doesn't hold for that long. It will automatically try to reconnect and perform the command when connected again with up to 3 retries.

The Command failed, device is disconnected error is from some inner code and is correctly handled by the integration (and there handled as a warning).

The Error fetching awox data: No device connected is also no real issue when this happens from time to time. That's the check that runs every 30 seconds to ask all mesh devices to report there state. It will retry again next run.

Saw in you logging that the scan action performed by the integration to get the RSSI value of each reachable device doesn't give any result. That means it doesn't know what device is the nearest. As a result it could be that when trying to connect to the mesh it tries first a device that's the furthest away and will fail and then tries the next in the list.

I increased the scanning time a little maybe that's enough to get some RSSI values. That could improve the re-connect time.

Also improved the logging a little again to see these RSSI values etc

Dillton commented 2 years ago

Hello @fsaris. Sorry for no reply I was a bit busy. However I have done update yesterday and have some more logs for you because lights got unresponsive. image

Also I noticed you added the mesh as another device with its own entities. Below are current values (corresponds with time of this post) from that: image

Log: https://drive.google.com/file/d/1_Y3BQInGmh7cNhPb8qY1x1ZI4TGb7KCx/view?usp=sharing

fsaris commented 2 years ago

@Dillton did the lights get un-responsive around midnight? From the logs I can read that from around 23:54 it had more trouble to connect to one of the devices (lights). It succeeded just before midnight and than failed again for 20minutes until 0:20. From then it could connect again.

Some more observations:

2021-11-05 17:35:03 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a5:05:1d][Dining Room Light] Connected
2021-11-05 17:43:18 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a5:06:20][Living Room Entry Light] Connected
2021-11-05 17:44:31 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a5:06:20][Living Room Entry Light] Connected
2021-11-05 19:06:28 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a1:05:f0][Living Room Light] Connected
2021-11-05 19:16:49 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a1:05:f0][Living Room Light] Connected
2021-11-05 19:27:44 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a5:06:20][Living Room Entry Light] Connected
2021-11-05 19:36:41 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a5:06:20][Living Room Entry Light] Connected
2021-11-05 19:47:27 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a1:05:f0][Living Room Light] Connected
2021-11-05 19:51:15 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a1:05:f0][Living Room Light] Connected
2021-11-05 19:57:08 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a2:04:96][Bedroom Table Light] Connected
2021-11-05 19:58:21 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a2:04:96][Bedroom Table Light] Connected
2021-11-05 19:59:33 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a5:06:20][Living Room Entry Light] Connected
2021-11-05 21:59:40 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a5:06:20][Living Room Entry Light] Connected
2021-11-05 22:16:01 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a1:05:f0][Living Room Light] Connected
2021-11-05 22:16:45 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a1:05:f0][Living Room Light] Connected
2021-11-05 22:19:28 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a5:06:20][Living Room Entry Light] Connected
2021-11-05 23:43:50 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a1:05:f0][Living Room Light] Connected
2021-11-05 23:45:10 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a2:04:96][Bedroom Table Light] Connected
2021-11-05 23:46:58 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a2:04:96][Bedroom Table Light] Connected
2021-11-05 23:47:41 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a2:04:96][Bedroom Table Light] Connected
2021-11-05 23:49:03 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a2:04:96][Bedroom Table Light] Connected
2021-11-05 23:52:16 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a5:06:20][Living Room Entry Light] Connected
2021-11-05 23:57:40 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a3:04:95][Office Table Light] Connected
2021-11-05 23:59:40 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a3:04:95][Office Table Light] Connected
2021-11-06 00:00:43 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a3:04:95][Office Table Light] Connected
2021-11-06 00:20:43 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a2:04:96][Bedroom Table Light] Connected
2021-11-06 00:22:08 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a2:04:96][Bedroom Table Light] Connected
2021-11-06 00:25:49 INFO (MainThread) [custom_components.awox.awox_mesh] [a4:c1:38:a2:04:96][Bedroom Table Light] Connected

Are some Bluetooth devices moving around during that time? Or maybe some other devices turned on that give some interference.

Was this the log unit it got completely unresponsive? Can not fully grep the "15hours ago" from the "last connection" sensor. In the logs I see a lot of new connections being made.

Dillton commented 2 years ago

@fsaris I can confirm the time around midnight since when the light were unresponsive. However I was on my way to bed so I did not want to waste any time to solve that connection at the time. Then since the morning to afternoon I was not home so I collected logs when I got back home and posted to you. Until my post the light were unresponsive and then I had to restart whole HA to get it back working. I dont know about any other BT devices except my phone, TV, watches and two instances of raspberrys where I have my control panel for HA so still the same devices.

At the time from 19 to 20 I was playing around with HA so maybe thats the reason of more instability.

Now little bit of my observations: I think the debug logging loads the HA a bit more and whole system is more unstable. Since yesterday the debug log is disabled and lights are still working so far. Also there is no error in HA log according to Awox integration. Current status below: image

fsaris commented 2 years ago

Okay had a frozen integration myself just yet. Looks like the internal queue worker freezes together with a helper process that takes 100% of one of the CPU cores. A reload of the integration was enough to bring it alive and the CPU load to normal. Not sure yet why the connection isn't reset automatically

Dillton commented 2 years ago

Thank you for info. Last time I was focusing on total cpu usage now I will focus on the individual core usage to confirm we have similar behaviour.

fsaris commented 2 years ago

again a new version with some additional logging and some additional timeout checks

Dillton commented 2 years ago

Thank you for update its done. Debug log enabled so lets collect more logs :)

Dillton commented 2 years ago

Hello @fsaris. I have some new logs and some curious observation from approx last hour. So first since yesterday update lets say about 22:00 everything was working like a charm until today 20:00. Since today 20:00 there are new error logs related to Awox and it feels like the lights responding slower now (for example 10 seconds to get the light on/off). I think this slow response might be an attempt to restore connection with whole mesh. Am I right?

I have to highlight that I did not see any error/warning logs when I checked them today before 19:00 when I get home. So I will post logs only since that time.

Another thing. I am also focusing on the cpu usage as I said and I noticed there is a spiking to 100% of core load which holds for about 10 seconds then drops down and came back on another core. It is strange because on this PC is running only HA instance and Ubuntu instance. See screenshots below: This screen is exactly from the time I clicked to turn off light "Office Table Light" between 20:54 and 20:56 in HA logs. image

As far as I can see the core load spiking is related to the time of new error/warning I see in log. Here you can see core load with time in right down corner image

which perfectly corresponds to logs below in HA image

and I guess this cpu behaviour is fine at the end because its trying to reconnect and post the log message right?

HA Logs: https://drive.google.com/file/d/1kU2_zXeK5TZaOr4UC0fIsud3SyhfOcb5/view?usp=sharing image

Dillton commented 2 years ago

@fsaris UPDATE: So its midnight and in last few hours after my previous post the lights got unresponsive. I am sure the integration reload will solve it but before that I checked the cpu usage again and I have constant 100% load on one of the cores as you can see on the screen below: image

A few minutes later still 100% on one core: image

Immediatelly after integration reload core usage goes down: image

However I was wrong because now it is not enough to just reload it to make the lights work again. I had to restart whole HA OS to make it work so to my eyes it looks like that the frozen integration somehow freezes also the BT adapter.

I have updated the log file posted before where maybe you can find whats happening after the integration reload.

I hope we have now the similar behaviour and possibly found that the queue worker with helper process you mentioned yesterday are the core issues why the integration freezes.

fsaris commented 2 years ago

Had similar 2days back. What I notices was the the /usr/lib/bluetooth/bluetoothd process/service wasn't running anymore. Searched the /var/log/syslog files and noticed homeassistant kernel: [86964.416682] traps: bluetoothd[350] messages in there.

A reboot of the system made sure the service was running again (ps aux | grep blue). I also updated the host machine and homeassistant in the meantime so maybe it was an issue in one of the underlying programs/libs. It's running now for 1,5 day. So lets see how it goes on

fsaris commented 2 years ago

@Dillton if you look at the history of your AwoX mesh sensor. How long does keep connected to the same device? For me it reconnects around every 15 minutes

Dillton commented 2 years ago

Hello thank you for info. As for the history of mesh connection, most of the time its connected to "Living Room Entry Light" or "Eglo Plug" sometimes to "Living Room Light" which is almost the same distance as first two devices however simetimes it connects to "Dining Room Light, Bedroom Light or Office Table Light" which are the farthest devices from BT adapter.

Blue - Living Room Entry Light Orange - Eglo Plug Turquoise - Living Room Light Purple - Dining Room Light, Bedroom Light or Office Table Light image

NOTE: Dont look at today morning when it was unavailable. Whole HA was down because on the host PC was running some messed up service which I found is for some audio device I dont need there so its now disabled. Also November 9 there was an HA core update.

Dillton commented 2 years ago

Hi @fsaris. Just a quick update. Running now 3rd day without problem.

fsaris commented 2 years ago

🎉 but let's not jinx it 😉

Dillton commented 2 years ago

Hello @fsaris. I am not sure what exactly helped but running 9th day without any issue. 😉

There are already a few thousands of timeout errors in HA logs but so far no unresponsive lights xD

fsaris commented 2 years ago

Lights still responsive?

For me all is stable again so closing this for now

fsaris commented 2 years ago

Just tagged a new version with these fixes in them https://github.com/fsaris/home-assistant-awox/releases/tag/0.1.2

Dillton commented 2 years ago

Hello Frans.

I can confirm so far everything is stable and working for a month already I think. Thank you very much for your work and support 🙂

Have a nice christmas. Michal ZachaÅ™


Od: Frans Saris @.> Odesláno: středa 15. prosince 2021 20:32 Komu: fsaris/home-assistant-awox @.> Kopie: Dillton @.>; Mention @.> Předmět: Re: [fsaris/home-assistant-awox] No device connected in log (#50)

Lights still responsive?

For me all is stable again so closing this for now

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/fsaris/home-assistant-awox/issues/50#issuecomment-995131939, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AQOLY4CIQ5SLZCOZJMDN3ATURDULHANCNFSM5GCET7SA. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.