andreypopov / node-red-contrib-zigbee2mqtt

Zigbee2mqtt connectivity nodes for node-red
GNU General Public License v3.0
94 stars 24 forks source link

[Bug] sometimes getting devices gets stuck in a loop #107

Open Cyberbeni opened 1 year ago

Cyberbeni commented 1 year ago

I have recently switched to using docker and noticed that sometimes when pressing refresh device list, it gets stuck in a loop, mqtt constantly logging this:

2023-01-22T22:09:58.465552976Z 1674425398: New connection from 127.0.0.1:56860 on port 1883.
2023-01-22T22:09:58.465999278Z 1674425398: Client NodeRed-6b36da507834d3cf-tmp already connected, closing old connection.
2023-01-22T22:09:58.466041686Z 1674425398: New client connected from 127.0.0.1:56860 as NodeRed-6b36da507834d3cf-tmp (p2, c1, k60).
2023-01-22T22:09:59.472572673Z 1674425399: New connection from 127.0.0.1:56870 on port 1883.
2023-01-22T22:09:59.472830324Z 1674425399: Client NodeRed-6b36da507834d3cf-tmp already connected, closing old connection.
2023-01-22T22:09:59.472875851Z 1674425399: New client connected from 127.0.0.1:56870 as NodeRed-6b36da507834d3cf-tmp (p2, c1, k60).
2023-01-22T22:10:00.477631617Z 1674425400: New connection from 127.0.0.1:56884 on port 1883.
2023-01-22T22:10:00.477683910Z 1674425400: Client NodeRed-6b36da507834d3cf-tmp already connected, closing old connection.
2023-01-22T22:10:00.477784492Z 1674425400: New client connected from 127.0.0.1:56884 as NodeRed-6b36da507834d3cf-tmp (p2, c1, k60).

And then of course nodeRED debug tab will also start displaying getting device list timed out messages. Restarting the nodeRED or MQTT container didn't seem to have any impact but then restarting the zigbee2mqtt container did.

I don't know what triggers it because other times refresh device list works fine.

Cyberbeni commented 1 year ago

Based on the code I think this is caused by this: https://github.com/andreypopov/node-red-contrib-zigbee2mqtt/blob/96cff0fad61a50e718bd3fa56e808ab4241d9ea7/nodes/server.js#L814

I don't think we need to constantly recreate these "tmp" MQTT clients and instead have separate ones for getDevices and networkmap that are only recreated when the previous one ended. What do you think @andreypopov ?

Meanwhile I also noticed #82 , sorry for the duplicate issue.

Cyberbeni commented 1 year ago

Forgot to mention that both Zigbee2MQTT's web UI and controlling the devices through homebridge-z2m still worked when this happened.

Cyberbeni commented 1 year ago

When we call getDevices, we reset node.groups but if we get "/bridge/devices" before getting "/bridge/groups", I think it will stay empty because we end the client.

Cyberbeni commented 1 year ago

Based on the commit history, when the line I linked above was added, getting the devices still required querying it. But now since it is retained, there is no need to have a separate getDevices.

Cyberbeni commented 1 year ago

I have started testing this branch: https://github.com/Cyberbeni/node-red-contrib-zigbee2mqtt/tree/get-devices-fix

Seems to be working so far. (Sadly I don't know how to reproduce the bug that I had before)

andreypopov commented 1 year ago

Hi, I have no idea, never happened with me. I can't fix if I'm not able to reproduce it

Feel free to make PR

Cyberbeni commented 1 year ago

I don't know if the PR will fix whatever caused this issue but there will be one. (no ETA yet, don't have much time to work on it)

For example I restored getting device states at start, so people don't need to check "retained" for them in Zigbee2MQTT. (Unless there is no property that supports get. Even if only "power_on_behavior" supports it, Z2M will still send back "state" and other useful stuff when getting just "power_on_behavior" but there is no reply for an invalid/empty get message)

https://github.com/Cyberbeni/node-red-contrib-zigbee2mqtt/blob/05db2d6cd0d3b8cdbd6642919be3767f35937b41/nodes/server.js#L714-L730