automicus / PyISY

Python module for interactive with the ISY-994 Insteon controller.
Other
16 stars 22 forks source link

v3.x.x and 2.x.x Gives Duplicate status update events with auto_update or websockets enabled #145

Closed NickWaterton closed 3 years ago

NickWaterton commented 3 years ago

This looks to be true of the 2.x version as well, but I'm on 3.x.x.

When auto_update is enabled, or use_websocket is enabled, you get two notifications for every command sent. This doesn't happens without auto_update enabled, because the node.status is not updated with the hint.

When you send a command eg node.turn_on(), and you are subscribed to status_events for that node, you get two changed events triggered.

This is because the turn_on() command updates the node.status with the hint value, which triggers a notify event from nodebase.py:

@status.setter
def status(self, value):
    """Set the current node state and notify listeners."""
    if self._status != value:
        self._status = value
        self._last_changed = now()
        self.status_events.notify(self.status_feedback)
    return self._status

Then you get a second trigger when the actual device changes to the new state, and you get an ST notification, which triggers a node.update_state

def update_state(self, state):
    """Update the various state properties when received."""
    if not isinstance(state, NodeProperty):
        _LOGGER.error("Could not update state values. Invalid type provided.")
        return
    changed = False
    self._last_update = now()

    if state.prec != self._prec:
        self._prec = state.prec
        changed = True

    if state.uom != self._uom and state.uom != "":
        self._uom = state.uom
        changed = True

    if state.formatted != self._formatted:
        self._formatted = state.formatted
        changed = True

    if state.value != self.status:
        self.status = state.value
        # Let Status setter throw event
        return

    if changed:
        self._last_changed = now()
        self.status_events.notify(self.status_feedback)

This triggers another notification event, because updating node.status does not update node._formatted, so update_state finds state.formatted != self._formatted is True and triggers another status_events.notify.

I believe the error in in node.py async def update which is supposed to update self._status not self.status.

ie:

    async def update(self, event=None, wait_time=0, hint=None, xmldoc=None):
        """Update the value of the node from the controller."""
        if not self.isy.auto_update and not xmldoc:
            await asyncio.sleep(wait_time)
            req_url = self.isy.conn.compile_url(
                [URL_NODES, self._id, METHOD_GET, PROP_STATUS]
            )
            xml = await self.isy.conn.request(req_url)
            try:
                xmldoc = minidom.parseString(xml)
            except XML_ERRORS:
                _LOGGER.error("%s: Nodes", XML_PARSE_ERROR)
                raise ISYResponseParseError(XML_PARSE_ERROR)
        elif hint is not None:
            # assume value was set correctly, auto update will correct errors
            self._status = hint
            _LOGGER.debug("ISY updated node: %s", self._id)
            return

        if xmldoc is None:
            _LOGGER.warning("ISY could not update node: %s", self._id)
            return

        self._last_update = now()
        state, aux_props = parse_xml_properties(xmldoc)
        self._aux_properties.update(aux_props)
        self.update_state(state)
        _LOGGER.debug("ISY updated node: %s", self._id)

Which works as expected.

NickWaterton commented 3 years ago

I have been having difficulties with this for a few days now. I have one light that gives errors quite often, and so have implemented a retry mechanism.

However, there is a problem. When an error message is received, like this:

<?xml version="1.0"?><Event seqnum="264" sid="uuid:116"><control>ERR</control><action>1</action><node>22 17 16 1</node><eventInfo></eventInfo></Event>

This triggers a status_event notification like this:

[2020-12-11 09:58:19,067][DEBUG](pyisy               ) CHANGE Notification Received {'address': '22 17 16 1', 'status': 0, 'last_changed': datetime.datetime(2020, 12, 11, 9, 58, 19, 67420), 'last_update': datetime.datetime(2020, 12, 11, 9, 58, 19, 67401)}

However the status value is incorrect. this is because when you send the node.turn_off() command, the "hint" mechanism updates the node._status value to 0, even though the status of the node has not actually changed.

So, if you send a turn_off() command, and there is an error, you get an error notification, which triggers a changed event, with status 0, indication that the light has turned off, when in fact what you are receiving is an error notification that the light did not respond. The same happens with turn_on() or any other command. You can see how this is a problem.

With the new Asyncronous library, I can actually see no use for the hint mechanism at all. Either the light turns on/off and the status gets updated when it responds, or it doesn't, in which case the status is correct not to change.

I have disabled the hint update mechanism in node, and everything now works as expected. The status reflect the actual ight status, and a retry mechanism is easy to implement, and does in fact work perfectly with my problem light that needs 2 or 3 retries before it responds.

Are there some circumstances where the hint mechanism is needed? say the light does actually change, but you get a communications error, so you assume the command worked? Just wondering what the hint mechanism is for.

I would suggest removing the hint mechanism in favor of real-time updates.

shbatm commented 3 years ago

Thanks for the input. Sorry I haven't had much time to dig in to this yet, but I hope to in the next few weeks.

I will play with turning off the hint again. I tried it on the original take of v2, but I hated that the toggles would 'bounce' in Home Assistant's Lovelace interface. Hopefully things are a bit quicker in V3, but I may also just try reworking the hint mechanics all together to provide an internal 'pending' status that doesn't trigger the events.

shbatm commented 3 years ago

Are there some circumstances where the hint mechanism is needed?

Expanding on this... The biggest reason I think that hint was added by @rmkraus originally would be interface lag. You click a toggle, it should go to ON, but the status hasn't updated back from the ISY so the interface moves it back to OFF, then the ISY updates the status so it turns to ON. This is the 'bounce' I mentioned above.

This does have its pitfalls for when the ISY doesn't actually do what you commanded and your interface is out of sync.

This would be easier if the ISY's response to the REST command told you if the end device actually responded to the command, then you could just watch for the response, and update accordingly. However an 200 OK response just tells you the ISY accepted the command and is processing it/sending it the devices.

This is where I think there's some room for a pending status/command timeout structure to take the place of the hint. Basically we'd assume the status changes on a successful command to the ISY (200 OK), but do a better job of watching for communication errors reported by the ISY, and revert the status if it failed.

NickWaterton commented 3 years ago

I actually implemented a retry mechanism. You send the command, and add it to a pending_commands dictionary. If the change state event comes in, the state is published and the command removed from the pending_command dict. If the state has not updated after 5 seconds, the command is resent. There are 5 retries before I give up.

Nick Waterton P.Eng Sent from my iPhone

On Dec 18, 2020, at 8:20 PM, shbatm notifications@github.com wrote:



Are there some circumstances where the hint mechanism is needed?

Expanding on this... The biggest reason I think that hint was added by @rmkraushttps://github.com/rmkraus originally would be interface lag. You click a toggle, it should go to ON, but the status hasn't updated back from the ISY so the interface moves it back to OFF, then the ISY updates the status so it turns to ON. This is the 'bounce' I mentioned above.

This does have its pitfalls for when the ISY doesn't actually do what you commanded and your interface is out of sync.

This would be easier if the ISY's response to the REST command told you if the end device actually responded to the command, then you could just watch for the response, and update accordingly. However an 200 OK response just tells you the ISY accepted the command and is processing it/sending it the devices.

This is where I think there's some room for a pending status/command timeout structure to take the place of the hint. Basically we'd assume the status changes, but do a better job of watching for communication errors reported by the ISY, and revert the status if it failed.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/automicus/PyISY/issues/145#issuecomment-748396796, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACYJIZ6UWVXMESXLJHONUNLSVP5S5ANCNFSM4UQWYYVQ.

Unsubscribe from GE's commercial electronic messages: http://sc.ge.com/*casl-unsubscribe Désabonner des messages électroniques commerciaux de GE: http://sc.ge.com/*lcap-desabonnement

shbatm commented 3 years ago

I'll take a look at your branch when I get time to play around.

An expanded command queue with retries for both REST call and comms errors is a good idea.

NickWaterton commented 3 years ago

I’ll post it somewhere, I don’t think it’s up yet, I just subclass pyisy.

Nick Waterton P.Eng Sent from my iPhone

On Dec 18, 2020, at 8:36 PM, shbatm notifications@github.com wrote:



I'll take a look at your branch when I get time to play around.

An expanded command queue with retries for both REST call and comms errors is a good idea.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/automicus/PyISY/issues/145#issuecomment-748399022, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACYJIZ6WCILIG2X2MVF7K5TSVP7Q3ANCNFSM4UQWYYVQ.

Unsubscribe from GE's commercial electronic messages: http://sc.ge.com/*casl-unsubscribe Désabonner des messages électroniques commerciaux de GE: http://sc.ge.com/*lcap-desabonnement

NickWaterton commented 3 years ago

My implementation is here https://github.com/NickWaterton/ISY_MQTT/tree/main

As I say it just subclasses pyisy (but in my copy of pyisy the hint mechanism is disabled)

shbatm commented 3 years ago

@NickWaterton - I opened a new PR for the V3.x fix, which just removes the hint feature entirely and relies on the ISY sending an update in the event stream (or a manual node.update() call) to update the status of the node.

@bdraco - We talked about this a while back on Discord. Seems like a net zero effect in Home Assistant, Lovelace now has debounce on the UI toggles so everything seems to work fine without the hint. If a command fails from the ISY to the end device, the toggle will eventually (~3s) fall back to the old status (similar to my Zigbee2mqtt devices when they fail). Let me know if you think the subsequent communication error should be elevated to an exception to generate a toast message, right now it's just an error and an ERR attribute will appear in the aux_properties.

bdraco commented 3 years ago

If the action the user is performing fails, we should raise an exception to feed back to the UI.

If the action fails later we don't really have a way to feed it back to the user except logging.

I think its fine to remove the optimistic state change since the websocket is fast enough to deliver the event back to the state machine.

shbatm commented 3 years ago

XREF: shbatm/hacs-isy994#85 cc: @rccoleman

rccoleman commented 3 years ago

Thanks for the pointer to this discussion. It does seem like the retry mechanism that you're discussing would help with the issue that I'm seeing, but unrelated to the issue above. The typical case for me is that turning on a bunch of lights causes the ISY to sporadically respond with "404" (not found) errors via REST, resulting in the light not being turned on and the error mentioned in that issue. It's been happening forever with no indication that UDI will change the behavior, and it seems like an inappropriate error message to indicate that the Insteon network is "busy" (as UDI has diagnosed). In any case, I look forward to having commands retried if they fail.

shbatm commented 3 years ago

Closing with new issue #184 for adding enhancement for retry commands and #154 to fix this original issue.