djtimca / haomnilogic

Hayward Omnilogic integration for Home Assistant available through HACS
Apache License 2.0

Frequent timeouts cause Entities to go unavailable for a short time. #15

Closed: djtimca closed 2 years ago

djtimca commented 2 years ago

The Hayward API is timing out frequently, causing entities to go unavailable for a short time and potentially affecting automations that rely on sensor (or other entity) data.

Need to find a way to keep the last values in the case of a timeout rather than going unavailable.
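
One way to picture "keep the last values" is a small wrapper that reuses the previous successful result when a fetch times out. This is only a sketch under stated assumptions: `LastGoodCache`, `fetch`, and `max_age` are hypothetical names, not part of this integration or of Home Assistant.

```python
import asyncio
import time


class LastGoodCache:
    """Return the most recent successful result when a fetch times out."""

    def __init__(self, fetch, timeout, max_age):
        self._fetch = fetch        # hypothetical coroutine function returning fresh data
        self._timeout = timeout    # seconds to wait for the API
        self._max_age = max_age    # seconds a cached value may be reused
        self._data = None
        self._stamp = None         # monotonic time of the last successful fetch

    async def get(self):
        try:
            self._data = await asyncio.wait_for(self._fetch(), self._timeout)
            self._stamp = time.monotonic()
        except asyncio.TimeoutError:
            # Reuse old data only if we have some and it is not too stale.
            if self._stamp is None or time.monotonic() - self._stamp > self._max_age:
                raise
        return self._data
```

The `max_age` limit is a design choice: without it, a long outage would keep serving stale values indefinitely.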

MHillyer commented 2 years ago

Bonus: If the local representation of the switches could update prior to the next polling it would be great, even if they have to correct later on. I'm putting instructions in my dashboard to tap a switch then wait instead of tapping again, because the switch state doesn't update until after the polling.

sddgit commented 2 years ago

@djtimca So glad you opened this - I was about to open something similar. What is the timeout value in the calls? Could it stand to be longer? Bit of a worry there are so many timeouts.

@MHillyer Hmmm, I would have thought a switch toggle just went through to Hayward immediately, rather than waiting for a poll cycle. But maybe the API is slow to respond to these as well.

djtimca commented 2 years ago

@sddgit timeout is set to 20 seconds currently, but I'm likely going to move it back to 10 seconds (since there was no meaningful change in performance either way). Seems that if the Omnilogic API doesn't respond basically immediately, it won't respond at all.

@MHillyer I'm not sure why you are seeing the switch not holding state. The current version assumes the new state for a period after you change a light or switch (30 seconds for switches) to allow the Hayward telemetry to catch up. There are significant delays in their telemetry. If you're finding that 30 seconds isn't enough, maybe I can extend that time.
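
The "assume the new state for a period" behaviour can be sketched roughly as below. This is not the integration's actual code; `OptimisticSwitch`, `hold_seconds`, and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time


class OptimisticSwitch:
    """Hold a commanded state for a while before trusting telemetry again."""

    def __init__(self, hold_seconds=30.0, clock=time.monotonic):
        self._hold = hold_seconds
        self._clock = clock         # injectable so the hold window can be tested
        self._reported = False      # last state seen in polled telemetry
        self._assumed = None        # state assumed after a command
        self._assumed_until = 0.0

    def command(self, state):
        # Called when the user toggles the switch; the real API call is omitted.
        self._assumed = state
        self._assumed_until = self._clock() + self._hold

    def telemetry(self, state):
        # Called on every poll with whatever the cloud reports.
        self._reported = state

    @property
    def is_on(self):
        # Inside the hold window the commanded state wins over telemetry.
        if self._assumed is not None and self._clock() < self._assumed_until:
            return self._assumed
        return self._reported
```

Once the hold window expires, the switch falls back to whatever telemetry last reported, so a command that never took effect is corrected on the next poll.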

sddgit commented 2 years ago

Thanks for all the changes you’re implementing to get around the API “oddities”. Out of interest, does the API have any sort of rate limiting on it?

djtimca commented 2 years ago

It isn't supposed to. When I checked with them a couple years ago the API was supposed to handle 10+ calls per second.

sddgit commented 2 years ago

From each client?

djtimca commented 2 years ago

Yep - should be able to hammer it. I suppose a test could be to reduce polling interval to see if you get fewer timeouts.

sddgit commented 2 years ago

You mean to make the polling interval more than 6 seconds so it polls less often (as a test)?

sddgit commented 2 years ago

I’m still seeing lots of timeouts in the log with 1.0.12 (which may be expected). But I’m also seeing a lot of another error:

2022-06-13 11:16:37 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 11:20:18 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 11:22:58 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 11:26:42 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 11:32:30 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 11:34:04 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 11:35:11 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 11:38:54 ERROR (MainThread) [custom_components.omnilogic.common] Error fetching Omnilogic data: Error updating from OmniLogic: Error converting Hayward data to JSON.
2022-06-13 11:49:39 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 11:51:35 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 11:59:14 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:03:13 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:04:16 ERROR (MainThread) [custom_components.omnilogic.common] Error fetching Omnilogic data: Error updating from OmniLogic: Error converting Hayward data to JSON.
2022-06-13 12:05:03 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:07:35 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:08:03 ERROR (MainThread) [custom_components.omnilogic.common] Error fetching Omnilogic data: Error updating from OmniLogic: Error converting Hayward data to JSON.
2022-06-13 12:08:49 ERROR (MainThread) [custom_components.omnilogic.common] Error fetching Omnilogic data: Error updating from OmniLogic: Error loading Hayward data.
2022-06-13 12:09:06 ERROR (MainThread) [custom_components.omnilogic.common] Error fetching Omnilogic data: Error updating from OmniLogic: Error converting Hayward data to JSON.
2022-06-13 12:11:27 ERROR (MainThread) [custom_components.omnilogic.common] Error fetching Omnilogic data: Error updating from OmniLogic: Error converting Hayward data to JSON.
2022-06-13 12:12:19 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:15:23 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:18:41 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:20:30 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:21:39 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:22:51 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:24:13 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:27:04 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:33:07 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:34:25 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:37:19 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:38:54 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data
2022-06-13 12:40:20 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data

Also, when either error occurs, the logbook shows equipment going unavailable.
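
To compare releases or polling settings, a log excerpt like the one above can be tallied by error kind. The sketch below assumes the line format shown in this comment; `tally` and the category labels are hypothetical, and the sample holds three lines copied from the excerpt.

```python
import re
from collections import Counter

# Matches the log line format shown in the excerpt above.
LINE = re.compile(
    r"^(\S+ \S+) ERROR \(MainThread\) "
    r"\[custom_components\.omnilogic\.common\] (.*)$"
)


def tally(lines):
    """Count log lines by error kind: timeout, json, or other."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not an omnilogic error line
        message = match.group(2)
        if message.startswith("Timeout"):
            counts["timeout"] += 1
        elif "converting Hayward data to JSON" in message:
            counts["json"] += 1
        else:
            counts["other"] += 1
    return counts


sample = [
    "2022-06-13 11:16:37 ERROR (MainThread) [custom_components.omnilogic.common] Timeout fetching Omnilogic data",
    "2022-06-13 11:38:54 ERROR (MainThread) [custom_components.omnilogic.common] Error fetching Omnilogic data: Error updating from OmniLogic: Error converting Hayward data to JSON.",
    "2022-06-13 12:08:49 ERROR (MainThread) [custom_components.omnilogic.common] Error fetching Omnilogic data: Error updating from OmniLogic: Error loading Hayward data.",
]
print(tally(sample))
```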

djtimca commented 2 years ago

The errors are expected periodically, and apparently Home Assistant does want entities to go unavailable when the integration raises an UpdateFailed, so I can't get around that.

I have set the default polling interval to 30 seconds in today's release and extended the timeout to 30 seconds as well. Hoping that this will drastically reduce the number of errors in the logs, but will keep an eye on it.
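
For anyone who wants to experiment with these two settings outside Home Assistant, a minimal polling loop with a per-call timeout might look like this. It is a sketch only: `poll` and `fetch` are hypothetical names, and `fetch` stands in for whatever coroutine calls the Hayward API.

```python
import asyncio


async def poll(fetch, interval, timeout, cycles):
    """Call fetch() every `interval` seconds; return (successes, timeouts)."""
    ok = timed_out = 0
    for i in range(cycles):
        try:
            # Give up on a single call after `timeout` seconds.
            await asyncio.wait_for(fetch(), timeout)
            ok += 1
        except asyncio.TimeoutError:
            timed_out += 1
        if i < cycles - 1:
            # Wait before the next poll; skipped after the final cycle.
            await asyncio.sleep(interval)
    return ok, timed_out
```

Running it with `interval=30` and `timeout=30` would mirror the defaults described above; varying one value at a time shows which of the two has more effect on the timeout count.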

sddgit commented 2 years ago

I have absolutely no idea about how integrations are coded. But does that mean, if there’s a timeout, you can’t just completely ignore it, or force the last state on the sensors and switches? I’m still concerned that even a simple schedule automation that turns on a pump will fail if that switch is unavailable.

djtimca commented 2 years ago

@sddgit correct. The underlying Home Assistant architecture determines how timeouts or failed cloud polling updates are handled, and its design decision has been to have entities go unavailable.
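
A toy model of that behaviour is sketched below. These are stand-in classes written for illustration, not Home Assistant's actual implementation: the point is only that a failed update flips a success flag, and entity availability follows that flag even though the old data is still held.

```python
import asyncio


class UpdateFailed(Exception):
    """Stand-in for the exception an integration raises on a failed poll."""


class MiniCoordinator:
    """Toy model of a polling coordinator (not Home Assistant's class)."""

    def __init__(self, update_method):
        self._update_method = update_method  # coroutine function fetching data
        self.data = None
        self.last_update_success = True

    async def refresh(self):
        try:
            self.data = await self._update_method()
            self.last_update_success = True
        except UpdateFailed:
            # Old data is kept, but the failure flag is what entities look at.
            self.last_update_success = False


class MiniEntity:
    """Toy entity whose availability mirrors the coordinator's last result."""

    def __init__(self, coordinator):
        self.coordinator = coordinator

    @property
    def available(self):
        return self.coordinator.last_update_success
```

The entity becomes available again as soon as the next refresh succeeds, which matches the "unavailable for a short time" pattern reported in this issue.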

The service call to the pump should go through regardless of the polling status, since the turn_on command doesn't validate availability before trying to send the service call. That means that even if your service call happens right after a timeout, the pump should still turn on. In 3 years I have never had an issue with that. The only situation that could cause an issue is an automation with a condition on one of the Omnilogic sensors: if that sensor is unavailable at that moment, the automation could fail in that rare case.

With the latest update I had no timeouts or issues over 24 hours (except when my internet dropped and all my cloud integrations timed out). I think this one is solved.

sddgit commented 2 years ago

Thanks for the clarification. I’m also not seeing any problems using 1.0.13. What do you put that down to mostly? The change to polling interval or polling timeout?

djtimca commented 2 years ago

@sddgit probably a combination of both, but feel free to play with the polling interval in configuration.