Closed ercksen closed 1 year ago
Can you provide the herdsman debug logging of this? Run your process with DEBUG=zigbee-herdsman:*
Here is the log output. I marked the point at which the stop()
function of the controller is invoked. The lines before are from regular communication I think but I have attached some of the last lines before the stop was initiated. I forgot to mention that I am using Docker with a mounted device, if that makes any difference.
2022-07-06T17:39:09.679Z zigbee-herdsman:deconz:adapter checking coordinator simple descriptor
2022-07-06T17:39:09.681Z zigbee-herdsman:deconz:driver DATA_REQUEST - destAddr: 0x0 EP:0 SeqNr. 3 request id: 1
2022-07-06T17:39:09.682Z zigbee-herdsman:deconz:frameParser DATA_REQUEST RESPONSE - request id: 1 status: 0
2022-07-06T17:39:09.682Z zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1
2022-07-06T17:39:09.683Z zigbee-herdsman:deconz:frameParser DEVICE_STATE changed: 10101110
2022-07-06T17:39:09.683Z zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 1 apsDataIndication: 1 configChanged: 0 apsRequestFreeSlots: 1
2022-07-06T17:39:09.686Z zigbee-herdsman:deconz:driver query aps data confirm
2022-07-06T17:39:09.690Z zigbee-herdsman:deconz:driver DATA_CONFIRM - sending data state request - SeqNr. 4
2022-07-06T17:39:09.691Z zigbee-herdsman:deconz:driver query aps data indication
2022-07-06T17:39:09.692Z zigbee-herdsman:deconz:frameParser DATA_CONFIRM RESPONSE - destAddr: 0x0 request id: 1 confirm status: 0
2022-07-06T17:39:09.692Z zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 1 configChanged: 0 apsRequestFreeSlots: 1
2022-07-06T17:39:09.692Z zigbee-herdsman:deconz:driver query aps data indication
2022-07-06T17:39:09.695Z zigbee-herdsman:deconz:driver DATA_INDICATION - sending read data request - SeqNr. 5
2022-07-06T17:39:09.697Z zigbee-herdsman:deconz:frameParser DATA_INDICATION RESPONSE - seqNr. 5 srcAddr: 0x0 destAddr: 0x0 profile id: 0x0 cluster id: 0x8004 lqi: 17
2022-07-06T17:39:09.697Z zigbee-herdsman:deconz:frameParser response payload: 1,0,0,0,20,1,4,1,5,0,1,3,0,0,10,0,25,0,3,1,0,32,0,0,5
2022-07-06T17:39:09.697Z zigbee-herdsman:deconz:adapter resolve data request without a transSeq Nr.
2022-07-06T17:39:09.697Z zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1
2022-07-06T17:39:09.697Z zigbee-herdsman:deconz:adapter RECEIVING SIMPLE_DESCRIPTOR - addr: 0x0 EP:1 inClusters: 0,10,25 outClusters: 1,32,1280
2022-07-06T17:39:09.697Z zigbee-herdsman:deconz:adapter EP: 1
2022-07-06T17:39:09.698Z zigbee-herdsman:deconz:adapter profile ID: 260
2022-07-06T17:39:09.698Z zigbee-herdsman:deconz:adapter device ID: 5
2022-07-06T17:39:09.698Z zigbee-herdsman:deconz:adapter input cluster: 0x0
2022-07-06T17:39:09.698Z zigbee-herdsman:deconz:adapter input cluster: 0xa
2022-07-06T17:39:09.698Z zigbee-herdsman:deconz:adapter input cluster: 0x19
2022-07-06T17:39:09.698Z zigbee-herdsman:deconz:adapter output cluster: 0x1
2022-07-06T17:39:09.698Z zigbee-herdsman:deconz:adapter output cluster: 0x20
2022-07-06T17:39:09.698Z zigbee-herdsman:deconz:adapter output cluster: 0x500
2022-07-06T17:39:09.701Z zigbee-herdsman:deconz:driver DATA_INDICATION - sending read data request - SeqNr. 6
<CONTROLLER STOP REQUEST>
2022-07-06T17:39:16.183Z zigbee-herdsman:controller:log Disable joining
2022-07-06T17:39:16.185Z zigbee-herdsman:deconz:adapter zclFrame to all - zclFrame.payload:
2022-07-06T17:39:16.186Z zigbee-herdsman:deconz:adapter { options: 10, commisioningWindow: 0 }
2022-07-06T17:39:16.186Z zigbee-herdsman:deconz:adapter sendZclFrameToAll - message send
2022-07-06T17:39:16.186Z zigbee-herdsman:deconz:driver DATA_REQUEST - destAddr: 0xfffd EP:242 SeqNr. 7 request id: 2
2022-07-06T17:39:16.188Z zigbee-herdsman:deconz:frameParser DATA_REQUEST RESPONSE - request id: 2 status: 0
2022-07-06T17:39:16.188Z zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1
2022-07-06T17:39:16.188Z zigbee-herdsman:deconz:frameParser DEVICE_STATE changed: 10100110
2022-07-06T17:39:16.188Z zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 1 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1
2022-07-06T17:39:16.192Z zigbee-herdsman:deconz:driver query aps data confirm
2022-07-06T17:39:16.197Z zigbee-herdsman:deconz:driver DATA_CONFIRM - sending data state request - SeqNr. 8
2022-07-06T17:39:16.198Z zigbee-herdsman:deconz:frameParser DATA_CONFIRM RESPONSE - destAddr: 0xfffd request id: 2 confirm status: 0
2022-07-06T17:39:16.198Z zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1
2022-07-06T17:39:16.202Z zigbee-herdsman:deconz:driver DATA_REQUEST - destAddr: 0xfffc EP:0 SeqNr. 9 request id: 3
2022-07-06T17:39:16.204Z zigbee-herdsman:deconz:frameParser DATA_REQUEST RESPONSE - request id: 3 status: 0
2022-07-06T17:39:16.204Z zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1
2022-07-06T17:39:16.242Z zigbee-herdsman:deconz:frameParser DEVICE_STATE changed: 10100110
2022-07-06T17:39:16.242Z zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 1 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1
2022-07-06T17:39:16.243Z zigbee-herdsman:deconz:driver query aps data confirm
2022-07-06T17:39:16.248Z zigbee-herdsman:deconz:driver DATA_CONFIRM - sending data state request - SeqNr. 10
2022-07-06T17:39:16.249Z zigbee-herdsman:deconz:frameParser DATA_CONFIRM RESPONSE - destAddr: 0xfffc request id: 3 confirm status: 0
2022-07-06T17:39:16.249Z zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1
2022-07-06T17:39:16.250Z zigbee-herdsman:deconz:adapter PERMIT_JOIN - 0 seconds
<stop() function returned without an error>
2022-07-06T17:39:16.256Z zigbee-herdsman:deconz:driver send write parameter request from queue. seqNr: 11 paramId: 33 param: 0
2022-07-06T17:39:26.257Z zigbee-herdsman:deconz:driver Timeout for request - CMD: 0xb seqNr: 11
Unhandled Rejection at: Promise Promise { <rejected> 'TIMEOUT' } reason: TIMEOUT stack: undefined
@ChrisHae since this is in the deconz adapter code, can you take a look at this?
I think I figured out why it crashes. In line 101 of src/adapter/deconz/adapter/deconzAdapter.ts
there is a missing await
for the drivers close
function. But close()
returns a promise that is not returned or caught. So if it rejects, Node will terminate. Of course that still would mean that close()
throws an error but that would probably be another topic. I will do some testing on the next days when I have more time and let you know.
Okay I did some investigation now and that was actually not the reason - although I see this as a problem and there should maybe an await
be added. But the reason the promise is rejected lays in deconz/driver/driver.js
. I don't know the architecture of this library well but it seems as there are no more rejection handlers after the controller stops.
I found a simple workaround (or fix?) by changing
while (i--) {
(line 394)
to while (i-- && this.initialized) {
This will stop the queue processing and error rejection. Additionally, now the loop terminates gracefully. If later calling start
again on the controller re-uses the old driver, it should rather be changed to
while (i--) { if (!this.initialized) return; } ...
But as I said before I don't know what is more suitable.
Also I found similar occurrences in lines 622 and 964 that look almost equally to me.
While reading the code I also saw some vague expressions that looked a little suspicious to me for unhandled rejections but they were not of importance here. If you are interested, I can have a look at it again and create an MR but idk how this is done on GitHub tbh.
EDIT: It must be the second variant. Otherwise it breaks communication with my devices. But still, after re-opening the controller, I run into problems. But that could also be a problem of my application.
Finally I got a better workaround (maybe this time we could even call it a fix).
The problem was that there were arrays that were used by a controller instance but declared module-wide so that they are shared even if a new controller is instantiated after stopping the old one. I just set the length of all arrays to 0 (deletes all entries without destroying references), plus, I put all setInterval() calls in an array and cleared the intervals when 'closed' is emitted. Currently, the intervals get orphans after the driver was closed, probably still consuming CPU time.
I think I figured out why it crashes. In line 101 of
src/adapter/deconz/adapter/deconzAdapter.ts
there is a missingawait
for the driversclose
function. Butclose()
returns a promise that is not returned or caught. So if it rejects, Node will terminate. Of course that still would mean thatclose()
throws an error but that would probably be another topic. I will do some testing on the next days when I have more time and let you know.
I've added the await, can you check if it is fixed now?
I already tested that yesterday and that was not the reason why it crashed (as I mentioned in https://github.com/Koenkk/zigbee-herdsman/issues/555#issuecomment-1179168263). But anyways, probably good that it is added now. In the comment linked were the steps explained that I did to make it working.
And finally I found a decent solution as explained in https://github.com/Koenkk/zigbee-herdsman/issues/555#issuecomment-1179269021. Seems like you have missed these somehow.
I see your solution but this still looks a bit hacky to me, can we clear these arrays on a stop in the adapter code?
Jup, this is exactly what I am doing:
constructor() {
// ...
this.on('close', () => {
this.intervals.forEach(i => clearInterval(i)); // intervals is the array were I push the
queue.length = 0; // results of setInterval() into
busyQueue.length = 0;
apsQueue.length = 0;
apsBusyQueue.length = 0;
apsConfirmIndQueue.length = 0;
timeoutCounter = 0;
});
}
Or are you concerned about the fact that it is hooked into the event emitter? We could also put this directly inside the stop function. But the close
event is called from three different locations inside the driver. I thought it would be better to catch all driver close
s to ensure that all resources are freed.
This looks good to me, could you make a pr?
I wanted to push the commit to a new branch but got a 403. Should I fork the project instead and create a PR then? I am coming from Gitlab and actually don't know how to create the PR properly.
Deconz driver: Clear intervals and shared arrays after close; catch all unhandled promises and forward errors to debugger
If it's easier, I zipped the file (.ts not supported) instead.
You can click the edit icon here: https://github.com/Koenkk/zigbee-herdsman/blob/master/src/adapter/deconz/driver/driver.ts (pencil) then make a pr
Wow! Didn't know of that. Thanks.
Thanks for the mr, I assume this can be closed now.
Hi,
The problem recurs when using socket port instead of serial port.
The socket was destroyed before clearing intervals ??
Here is the full debug log: {"path":"tcp://192.168.1.40:2002","adapter":"deconz"}
////// CALL await controller.start()
zigbee-herdsman:deconz:driver Set delay to 0 +0ms
zigbee-herdsman:controller:log Starting with options '{"network":{"networkKeyDistribute":false,"networkKey":[*********],"panID":*******,"extendedPanID":[********],"channelList":[11]},"serialPort":{"path":"tcp://192.168.1.40:2002","adapter":"deconz"},.................' +0ms
zigbee-herdsman:deconz:driver Opening TCP socket with 192.168.1.40:2002 +2ms
zigbee-herdsman:deconz:driver Socket connected +336ms
zigbee-herdsman:deconz:driver Socket ready +0ms
zigbee-herdsman:controller:log Started with result 'resumed' +338ms
zigbee-herdsman:controller:log Injected database: true, adapter: true +0ms
zigbee-herdsman:deconz:driver send read parameter request from queue. seqNr: 1 paramId: 1 +4ms
zigbee-herdsman:deconz:frameParser read firmware version response - version: 50,110,101,116 +0ms
zigbee-herdsman:deconz:frameParser MAC: 0x00212effff0413c3 +1ms
zigbee-herdsman:deconz:driver send read parameter request from queue. seqNr: 2 paramId: 7 +31ms
zigbee-herdsman:deconz:frameParser NWKADDR: 0 +26ms
resumed
zigbee-herdsman:controller:log Disable joining +2s
zigbee-herdsman:deconz:adapter zclFrame to all - zclFrame.payload: +0ms
zigbee-herdsman:deconz:adapter { options: 10, commisioningWindow: 0 } +0ms
zigbee-herdsman:deconz:adapter sendZclFrameToAll - message send +1ms
zigbee-herdsman:deconz:driver DATA_REQUEST - destAddr: 0xfffd EP:242 SeqNr. 3 request id: 1 +2s
zigbee-herdsman:deconz:frameParser DATA_REQUEST RESPONSE - request id: 1 status: 0 +2s
zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1 +19ms
zigbee-herdsman:deconz:frameParser DEVICE_STATE changed: 10100110 +3ms
zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 1 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1 +3ms
zigbee-herdsman:deconz:driver query aps data confirm +8ms
zigbee-herdsman:deconz:driver DATA_CONFIRM - sending data state request - SeqNr. 4 +15ms
zigbee-herdsman:deconz:frameParser DATA_CONFIRM RESPONSE - destAddr: 0xfffd request id: 1 confirm status: 0 +40ms
zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1 +17ms
zigbee-herdsman:deconz:driver DATA_REQUEST - destAddr: 0xfffc EP:0 SeqNr. 5 request id: 2 +15ms
zigbee-herdsman:deconz:frameParser DATA_REQUEST RESPONSE - request id: 2 status: 0 +33ms
zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1 +18ms
zigbee-herdsman:deconz:frameParser DEVICE_STATE changed: 10100110 +127ms
zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 1 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1 +127ms
zigbee-herdsman:deconz:driver query aps data confirm +7ms
zigbee-herdsman:deconz:driver DATA_CONFIRM - sending data state request - SeqNr. 6 +16ms
zigbee-herdsman:deconz:frameParser DATA_CONFIRM RESPONSE - destAddr: 0xfffc request id: 2 confirm status: 0 +40ms
zigbee-herdsman:deconz:driver networkstate: 2 apsDataConfirm: 0 apsDataIndication: 0 configChanged: 0 apsRequestFreeSlots: 1 +17ms
zigbee-herdsman:deconz:adapter PERMIT_JOIN - 0 seconds +264ms
////// Sleeping for 2 seconds...
////// CALL await controller.stop()
zigbee-herdsman:deconz:driver Port closed +5ms
zigbee-herdsman:deconz:driver send write parameter request from queue. seqNr: 7 paramId: 33 param: 0 +0ms
zigbee-herdsman:deconz:driver Error writing socket Port: Cannot call write after a stream was destroyed +1ms
zigbee-herdsman:deconz:adapter checking coordinator simple descriptor +350ms
zigbee-herdsman:deconz:driver DATA_REQUEST - destAddr: 0x0 EP:0 SeqNr. 8 request id: 3 +361ms
zigbee-herdsman:deconz:driver Error writing socket Port: Cannot call write after a stream was destroyed +1ms
zigbee-herdsman:deconz:driver Timeout for request - CMD: 0xb seqNr: 7 +10s
[UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "TIMEOUT".] {
code: 'ERR_UNHANDLED_REJECTION'
}
ELIFECYCLE Command failed with exit code 1.
Hey, I encountered a bug today where my Node process exited due to an unhandled promise rejection and finally found out that stopping the coordinator was the cause.
Following code can reproduce the issue:
I added dates for debugging purposes but when the app crashes no error handler is invoked. I used my stopwatch and it takes 10 seconds to crash. Output:
I tried to get the stack using the unhandled rejection event but it was undefined, I do not know where to search. It would be nice to restart the controller without terminating Node itself. I am using CC253X with DECONZ Adapter.
Maybe this information is helpful to debug the problem. If you have any further questions, let me know. I am using v0.14.40 and the test.db file had no previous devices in it.