TheAgentK / tuya-mqtt

Nodejs-Script to combine tuyaapi and openhab via mqtt
MIT License
173 stars 81 forks source link

tuyq-mqtt hangs ... #54

Closed thomasleitner closed 3 years ago

thomasleitner commented 3 years ago

After running tuya-mqtt for about 2 weeks, I've noticed that tuya-mqtt published "offline" to the "status" topic even though the "node tuya-mqtt.js" process was running. The last message in the logfile (stdout) is:

(node:2422) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an
async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 126)
(node:2422) UnhandledPromiseRejectionWarning: Error: No connection has been made to the device.
    at TuyaDevice._send (/home/fhem/tuya-mqtt/node_modules/tuyapi/index.js:269:13)
    at Promise (/home/fhem/tuya-mqtt/node_modules/tuyapi/index.js:242:14)
    at new Promise (<anonymous>)
    at _setQueue.add (/home/fhem/tuya-mqtt/node_modules/tuyapi/index.js:238:46)
    at run (/home/fhem/tuya-mqtt/node_modules/p-queue/dist/index.js:153:104)
    at PQueue._tryToStartAnother (/home/fhem/tuya-mqtt/node_modules/p-queue/dist/index.js:101:38)
    at Promise (/home/fhem/tuya-mqtt/node_modules/p-queue/dist/index.js:167:18)
    at new Promise (<anonymous>)
    at PQueue.add (/home/fhem/tuya-mqtt/node_modules/p-queue/dist/index.js:148:16)
    at TuyaDevice.set (/home/fhem/tuya-mqtt/node_modules/tuyapi/index.js:238:27)
(node:2422) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an
async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 129)

strace says this:

$ strace -p 2422
strace: Process 2422 attached
epoll_pwait(13, [], 1024, 2338, NULL, 8) = 0
epoll_pwait(13, [], 1024, 10000, NULL, 8) = 0
epoll_pwait(13, [], 1024, 10000, NULL, 8) = 0
epoll_pwait(13, [], 1024, 5814, NULL, 8) = 0
write(18, "\300\0", 2)                  = 2
epoll_ctl(13, EPOLL_CTL_MOD, 18, {EPOLLIN, {u32=18, u64=18}}) = 0
epoll_pwait(13, [{EPOLLIN, {u32=18, u64=18}}], 1024, 4184, NULL, 8) = 1
read(18, "\320\0", 65536)               = 2
epoll_pwait(13, [], 1024, 4181, NULL, 8) = 0
epoll_pwait(13, [], 1024, 10000, NULL, 8) = 0
epoll_pwait(13, [], 1024, 10000, NULL, 8) = 0
epoll_pwait(13, [], 1024, 10000, NULL, 8) = 0
epoll_pwait(13, [], 1024, 10000, NULL, 8) = 0
epoll_pwait(13, [], 1024, 10000, NULL, 8) = 0
epoll_pwait(13, ^Cstrace: Process 2422 detached
 <detached ...>

After restarting the "node tuya-mqtt.js" process, everything is good again.

tsightler commented 3 years ago

Can you share the full log? The last message isn't very useful since it clearly shows that TuyAPI did not have a connection to the device (notice that tuya-mqtt is not even mentioned in the stack trace, only TuyAPI), so tuya-mqtt reporting the status as "offline" for the device is expected. There should be log messages regarding this disconnection and attempts to reconnect in the tuya-mqtt DEBUG output (assuming you were running with debug). I've tested the reconnect logic pretty extensively, but there could certainly be a bug hiding in there somewhere, or some use case that's not covered.

My guess is the device received some type of command while it was offline that triggered the message above,, because I currently don't trap sets when the device is in offline state and I probably should. However, that's more of a symptom vs cause thing as I wouldn't need to trap them if the device wasn't already offline for some reason so we need to find and determine the cause of the offline event. The tuya-mqtt code is completely dependent on TuyAPI for this, it monitors heartbeat and online/offline status based on TuyAPI so I'll need full logs to understand the chain of events.

thomasleitner commented 3 years ago

Sorry, the full log was overwritten after the restart. I'll restart it in DEBUG mode and wait until it happens again (might take quite some time) ... will post the full debug log then ...

tsightler commented 3 years ago

Thanks. I'll try to think about ways to reproduce the issue, but it's a tough one.

jnt2007 commented 3 years ago

Hi I have the same situation, but tuya-mqtt disconnects after few hours. I have reproduced with debug enabled and have a logs. Please check. v3.0.0

Jan 04 12:47:39 notebook.local node[24731]: 2021-01-04T10:47:39.338Z tuya-mqtt:state MQTT DPS18: tuya/smart_plug/dps/18/state ->  14
Jan 04 12:47:39 notebook.local node[24731]: 2021-01-04T10:47:39.338Z tuya-mqtt:state MQTT DPS19: tuya/smart_plug/dps/19/state ->  17
Jan 04 12:47:39 notebook.local node[24731]: 2021-01-04T10:47:39.338Z tuya-mqtt:state MQTT DPS20: tuya/smart_plug/dps/20/state ->  2368
Jan 04 12:47:39 notebook.local node[24731]: 2021-01-04T10:47:39.338Z tuya-mqtt:tuyapi Received JSON data from device bf3b11111111111111t3er -> {"18":16,"19":20}
Jan 04 12:47:39 notebook.local node[24731]: 2021-01-04T10:47:39.339Z tuya-mqtt:state MQTT DPS JSON: tuya/smart_plug/dps/state ->  {"18":16,"19":20}
Jan 04 12:47:39 notebook.local node[24731]: 2021-01-04T10:47:39.339Z tuya-mqtt:state MQTT DPS18: tuya/smart_plug/dps/18/state ->  16
Jan 04 12:47:39 notebook.local node[24731]: 2021-01-04T10:47:39.339Z tuya-mqtt:state MQTT DPS19: tuya/smart_plug/dps/19/state ->  20
Jan 04 12:47:39 notebook.local node[24731]: 2021-01-04T10:47:39.339Z tuya-mqtt:error Device id bf3b11111111111111t3er has missed 1 heartbeat
Jan 04 12:47:39 notebook.local node[24731]: 2021-01-04T10:47:39.396Z tuya-mqtt:tuyapi Disconnected from device smart_plug (bf3b11111111111111t3er, bafd22222222229f)
tsightler commented 3 years ago

I actually think the original error in this issue was probably fixed by an update in TuyAPI (the API library on which tuya-mqtt is built) which was released very recently (just a couple of days ago). I'm not sure that the behavior you are seeing is the same as I don't see any actual code errors, just a notification that a heartbeat was missed and that the device was disconnected. This message came from TuyAPI, not from tuya-mqtt itself and, currently, there's no built in recovery from the scenario where the device itself closes the connection.

It is a little suspicious to me that the heartbeat fail and the disconnect happen basically simultaneously, but it's clear that it not tuya-mqtt itself that is performing the disconnect as it would have logged such an event. Rather, it logs the loss of a single heartbeat and then immediately logs that tuyapi was disconnected. That implies that the device closed the socket connection. Probably the only way to "fix" this is to add some extra logic in the disconnect code to initiate a reconnect.

I have no idea why the device itself would close the connection, I've never seen that happen with any of my devices and I drive about a half dozen devices with this code with basically no issues at all, but of course there are a huge variety of devices out there, and there's quite a lot of variation in how they behave. I still have a number of updates I need to do so I'll try to code up at least something that attempts to reconnect if the device disconnects.

jnt2007 commented 3 years ago

For now I found workaround. Actually it is not clear, but works. If you have better solution I will be very grateful.

diff --git a/devices/tuya-device.js b/devices/tuya-device.js
index c3a9d93..5c64ab6 100644
--- a/devices/tuya-device.js
+++ b/devices/tuya-device.js
@@ -606,14 +606,11 @@ class TuyaDevice {
     monitorHeartbeat() {
         setInterval(async () => {
             if (this.connected) {
-                if (this.heartbeatsMissed > 3) {
+                if (this.heartbeatsMissed > 0) {
                     debugError('Device id '+this.options.id+' not responding to heartbeats...disconnecting')
                     this.device.disconnect()
                     await utils.sleep(1)
                     this.connectDevice()
-                } else if (this.heartbeatsMissed > 0) {
-                    const errMessage = this.heartbeatsMissed > 1 ? " consecutive heartbeats" : " heartbeat"
-                    debugError('Device id '+this.options.id+' has missed '+this.heartbeatsMissed+errMessage)                
                 }
                 this.heartbeatsMissed++
             }
tsightler commented 3 years ago

Thanks for that, it at least tells me that a reconnect should work. I had previously decided not to retry on disconnect since, theoretically, in the future we might want it to be possible to stop/disconnect from a device manually, but there's currently no method to do that in the code, and based on the fact that I have limited time to work on this project, I don't see adding any dynamic capabilities in the near future, so I'll probably just change the disconnect callback to automatically try a reconnect after a few seconds, which is basically what your change does as well.

tsightler commented 3 years ago

I pushed 3.0.2 which updates the TuyAPI dependency to the latest version that includes a fix I believe will address the initial issue reported here. It also implements reconnect on device disconnect. Feedback on this update would be appreciated since I can't easily reproduce either of these issues.

jnt2007 commented 3 years ago

Hi. My server working well more than 4 days. If device disconnected tuya-mqtt trying to reconnect to the device every 10 seconds. Shortly device connection established well and continue working. Thank you for release 3.0.2!

tsightler commented 3 years ago

I found at least one other case were a socket error would not trigger a reconnect which I think I've addressed in the 3.0.3 release. I think, ideally, I need to refactor the reconnect logic which I'll keep in mind for the next maintenance release, but hopefully this will address the bulk of disconnect issues to this point. I'm going to close this for now but feel free to reopen or open a new issue if failure still exist in the future.