bob1de / hass-apps

Some useful apps and snippets to empower Home Assistant and AppDaemon even more.
Apache License 2.0

[schedy] Value toggling in loop with multiple actors and change replication #61

Closed tabnul closed 2 years ago

tabnul commented 3 years ago

Hi, last night one of my schedules went into a loop: every 2 seconds all switches turned on and off.

Apparently 'something' switched it manually, after which Schedy put it back in the right state (OFF). I am unable to see what 'something' is because my history log is unreachable / overrun. However, there is a delay of 3 hours set, so Schedy shouldn't touch the schedule anyway?

Log : https://pastebin.com/vGaKxByK

My app: https://pastebin.com/fEqizq3S

I have a possible suspect for the root cause: 3 days ago I upgraded Hass and AppDaemon. Could be a coincidence, but this setup has been stable for a long while!

Old version: Hass: 2021.3.4 Appdaemon: 4.0.5

New version: Hass: 2021.5.3 Appdaemon: 4.0.8

I have downgraded for now, because I don't trust the new versions in combination with Schedy!

tabnul commented 3 years ago

I did some additional investigation by querying the SQLite event DB. Both the 'turn on' and 'turn off' actions are performed by the API user, which is used solely by Schedy. Something must be going wrong internally. Could it be caused by the large number of actors?

bob1de commented 3 years ago

Hi,

That looks weird indeed. While Schedy waits for all actors to report the last value change back, some switches still seem to report the previous value, probably because of some processing or transmission delays. Normally, this is not a problem because Schedy ignores state changes of actors that are currently in the process of sending/waiting for a receipt.

Have you maybe disabled value resending for the actors (i.e. send_retries: 0)? If yes, that would explain the issue, because Schedy then doesn't wait for a receipt at all. Try at least one retry (send_retries: 1) and see if that helps.
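The receipt handling described above can be illustrated with a toy model. This is only a sketch of the behavior as stated in this comment, not Schedy's actual code; the `ToyActor` class and its method names are hypothetical.

```python
class ToyActor:
    """Hypothetical model of one actor; not Schedy's real implementation."""

    def __init__(self, send_retries, current="off"):
        self.send_retries = send_retries
        self.wanted = current
        self.awaiting_receipt = False

    def send(self, value):
        # Assumption from the comment above: with send_retries: 0 the
        # controller does not wait for a receipt at all.
        self.wanted = value
        self.awaiting_receipt = self.send_retries > 0

    def on_report(self, value):
        """Classify a state report coming back from the device."""
        if value == self.wanted:
            self.awaiting_receipt = False
            return "confirmed"
        if self.awaiting_receipt:
            # A stale report while waiting for the receipt is dropped.
            return "ignored"
        return "manual change"


actor = ToyActor(send_retries=10)
actor.send("on")
print(actor.on_report("off"))  # stale report while a receipt is pending
print(actor.on_report("on"))   # the receipt itself
```

In this model, the same stale "off" report would be classified as a manual change if `send_retries` were 0, which is the situation the question above is trying to rule out.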

Other than that, I have no idea so far, maybe a log with debug: true would reveal more details.

Best regards,
Robert

tabnul commented 3 years ago

Thanks for your reply. I didn't adjust the retry value; what is the default? Under the hood, the actors are Zigbee devices using zigbee2mqtt. There shouldn't be a delay, but there are a few 'hops' between Schedy and the actual device. But even if there were a delay, why would Schedy turn off devices that should be on as per the schedule?

bob1de commented 3 years ago

The default value is 10, so you should be fine in that regard.

I can't infer what is causing the initial value change from the short log snippet, but Schedy sees that some switch is responding with a value that wasn't set by Schedy, so it assumes the switch was toggled manually and replicates this new value to all actors in the room.

Some seconds later, some other switch again responds with the previous value before it has reported the intended value change back and the whole thing starts again, now setting the other value.

You could try to disable change replication entirely by setting replicate_changes: false in the config of your room, but then a manual change at one of the switches will no longer propagate to the other ones. But at least that should break the loop.
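The loop and the workaround can be sketched with a small simulation. This is a toy model, not Schedy's code: the `simulate` function, the actor names, and the assumption that every commanded switch first emits one stale report of its old value (which the controller fails to ignore) are all made up for illustration.

```python
from collections import deque


def simulate(replicate_changes, actors=("switch_a", "switch_b"), max_events=20):
    """Return the sequence of values the toy controller flips to."""
    state = {name: "off" for name in actors}  # real device states
    wanted = "off"                            # value the controller believes in
    flips = []
    reports = deque()

    # Something toggles the first switch manually.
    first = actors[0]
    state[first] = "on"
    reports.append((first, "on"))

    handled = 0
    while reports and handled < max_events:
        actor, value = reports.popleft()
        handled += 1
        if value == wanted:
            continue  # report matches, nothing to do
        # Unexpected value: treated as a manual change.
        wanted = value
        flips.append(value)
        if not replicate_changes:
            continue  # accept the change, but do not propagate it
        for other in actors:
            if other == actor or state[other] == value:
                continue
            # Assumption: the device reports its old value once more
            # before it reports the newly commanded one.
            reports.append((other, state[other]))
            state[other] = value
            reports.append((other, value))
    return flips


print(simulate(replicate_changes=True))   # alternates until the event cap
print(simulate(replicate_changes=False))  # a single accepted change
```

With replication enabled, every stale report is mistaken for a new manual change and triggers another round of commands, so the value keeps alternating. With replication disabled, the first change is accepted and nothing further is sent.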

tabnul commented 3 years ago

Disabling state replication sounds like a proper solution to me. I don't need or want that anyway for these actors! Thanks. I will set it to debug anyway, and monitor.

I still have mixed feelings about the new versions of AppDaemon and Home Assistant. A system that breaks 2 days after an upgrade, after being stable for more than a year ;). But it can be a coincidence of course. Let's see!

bob1de commented 3 years ago

Ok, I'll nevertheless have another look into the code in the next few days; maybe I'll find an explanation for the behavior you encountered.

tabnul commented 2 years ago

Fixed with the mentioned workaround. Something went wrong with state replication, probably in combination with zigbee2mqtt.