BazaJayGee66 / homeassistant_cololight

Custom component to support Cololight in Home-Assistant
MIT License
47 stars 6 forks source link

integration causing system to lag when cololight is unavailable #30

Closed goodlucknow closed 1 year ago

goodlucknow commented 1 year ago

Hi, First of all thanks for the work on this integration. I had previously flashed some cololights with WLED, but this integration along with support in hyperion has me back on the stock firmware to try it out :)

In the last week or so, I started to notice an intermittent lag with some of my automations, lights slow to respond etc. At first I thought it was a problem with my zigbee setup but eventually I realised that I still experience this intermittent lag using the UI to control non-zigbee devices also. So today I disabled all my integrations and re-enabled one by one to find the culprit, and it appears to be cololight... And it does seem the issue started not long after installing the integration.

I can only recreate the issue when cololight integration is enabled, and the cololight is powered off(status "unavailable").

That would suggest to me that something in the process of trying to reconnect to an unavailable cololight intermittently is causing the system to lag?

No errors or warnings are being thrown in the logs as far as I can see.

Any idea how to go about troubleshooting this one?

BazaJayGee66 commented 1 year ago

Interesting find, and thanks for reaching out.

Are you able to confirm the version of Home Assistant, and this custom integration you are running? Polling was added in v1.3.0, where HA will poll every 60 seconds to get the state of the device.

How much of a lag do you see, when it is an issue?

Do you have a process I can follow to try and reproduce the issue?

~If you're familiar with python and coding A timeout of 4 seconds is given, before raising an UnavailableException. This exception is then caught in HA, where the device is then set as "unavailable". This custom integration for cololight is all asynchronous, including when polling for an update, so I'm surprised it could have an effect with the rest of HA (not to say that it couldn't).

goodlucknow commented 1 year ago

Thanks for the response. I'm on the latest HA 2023.3.1 and v2.0.0 of the cololight integration. I'd consider myself an amateur/hobbyist coder, but I appreciate the explanation of the functions and that it is asynchronous. It seems very curious. The delay is never more than a few seconds, usually 1 or 2, sometimes more. I can't say for sure it's never been more than 4 seconds. My steps to reproduce this before reporting the issue were as follows:

  1. Enable the integration
  2. Disconnect the cololight from power source so that it is unavailable in the frontend.
  3. Manually turn an entity on and off repeatedly, and would observe the lag within a short time repeating the action. Initially tested using a zigbee light bulb but also repeatable by say, muting and unmuting a Chromecast speaker device.

I'm nearly certain it's the integration as when disabled I haven't been able to observe the same lag. Nor I believe when the cololight is powered on and connected.

I'm away from my home setup this weekend but back on Tuesday. If there's anything I can do to provide more information or perhaps test more rigorously please let me know.

BazaJayGee66 commented 1 year ago

Thanks for the details.

I've been able to replicate the issue, and it does appear it can produce lag when trying to get the status of an unavailable device.

The lag only occurs when interacting with HA, at the exact same time HA polls the device (when the device is "unavailable"). Turns out HA will poll every 30 seconds.

I haven't had the chance to debug, or understand what might be causing it yet. In the meantime, I've released v2.0.1 , which will enable you to disable polling for the device, via the integration system options. When polling is disabled, HA wont automatically poll the cololight device for updates.

image image
goodlucknow commented 1 year ago

Oh that's great, thanks for taking the time to release that. Makes the integration very usable for me in the meantime. Much appreciated. If I can be of any use to you with debugging, just shout.

goodlucknow commented 1 year ago

Just FYI, I think I may have been wrong in the requirement for the Cololight device to be offline for this lagging bug to occur. I began to observe again(cololight was online) when I was updating the integration before I disabled polling. With polling disabled now I have not noticed the lag since. Will update if I discover that it was caused by anything else. Thanks!

BazaJayGee66 commented 1 year ago

I've published a beta release v2.0.2-beta, which I'm hoping should resolve the lag issue.

Will test, and release if all OK.

goodlucknow commented 1 year ago

Excellent. I'm curious to know how you might have solved it! Away once again for the weekend but I'll try jump on the beta on Monday and let you know how it's looking here. Thanks for the attention to the issue!

BazaJayGee66 commented 1 year ago

I ended up reducing the timeout for the device down to 0.3 seconds. I wasn't able to find why HA would block other calls when updating, or if it can be done in the background asynchronously. I was also able to reproduce the same issues with other integrations that had set a high timeout.

goodlucknow commented 1 year ago

On the beta now and so far so good, haven't been able to reproduce any lag...

BazaJayGee66 commented 1 year ago

I ended up finding the root cause of the issue, explained in their docs

The only problem with running everything inside the event loop is when a task does blocking I/O; something most third-party Python libraries do. For example, while requesting new information from a device, the core will stop running until we get a response from the device.

I was able to suspend the update task, so that it wouldn't block HA and cause lag when timing out if the device was unreachable.

This issue should now be resolved with v2.0.2.

Thanks for all your help in testing, and raising the issue.