Mr-Markus / ZigbeeNet

A .NET Standard library for working with ZigBee
Eclipse Public License 1.0
130 stars, 46 forks

Serial connection dies every 59 minutes #148

Closed: reinux closed this issue 3 years ago

reinux commented 3 years ago

I see an exception in my logs, after which the dongle stops responding to and receiving commands. I'm using a ConBee II.

2021-03-14 11:04:00.619 -07:00 [ERR] Error while reading from serial port: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
   at System.IO.Ports.SerialStream.EndRead(IAsyncResult asyncResult)
   at System.IO.Ports.SerialStream.Read(Byte[] array, Int32 offset, Int32 count, Int32 timeout)
   at System.IO.Ports.SerialPort.Read(Byte[] buffer, Int32 offset, Int32 count)
   at ZigBeeNet.Tranport.SerialPort.ZigBeeSerialPort.ReaderTask()

It happens every 59 minutes (58 minutes and 53 seconds, to be exact) without fail, so I'm fairly certain it isn't a hardware issue, especially since the dongle works fine with deCONZ.

It happens even when all I do is call Initialize() and Startup(false).

Looking at ZigBeeSerialPort.ReaderTask(), it seems to try to carry on after the exception, but the connection still dies.

Mr-Markus commented 3 years ago

I do not think this comes from the ZigBeeNet library. Maybe it is your host OS? It is also possible that deCONZ reconnects automatically without any indication, but I am not sure about that.

reinux commented 3 years ago

Welp, I tried updating the firmware on the ConBee II, and now Initialize() doesn't return 🥴

It's still correctly interpreting incoming commands and dropping them (because it's still initialising), so I'm pretty sure it has nothing to do with the baud rate or flow control. I tried clearing the data store too.

deCONZ still works, though.

2021-03-17 07:58:55.959 -07:00 [DBG] Dropping APS: state="INITIALISING", frame=ZigBeeApsFrame [sourceAddress=35655/1, destinationAddress=0/1, profile=260, cluster=8, addressMode=0, radius=0, apsCounter=0, payload=24 155 10 0 0 32 0]

Maybe I should just try another dongle...

jgmdavies commented 3 years ago

@reinux @Mr-Markus

I also see something like this with my ConBee II (firmware not updated) and ZigbeeNet. I'll need to revisit it to check whether the symptoms match and whether the time interval is the same.

Any other ConBee users out there?

Jim

DavidKarlas commented 3 years ago

Looking at http://deconz.dresden-elektronik.de/raspbian/deCONZ-Serial-Protocol-en.pdf and searching for "Watchdog", I found this:

Watchdog timeout in seconds. Must be reset by the application periodically (since protocol version 0x0108). By writing a lower value like 2 seconds, the firmware can be rebooted.

My assumption is that it's set to 1 hour by default and we never update it. As far as I understand, we should write this parameter periodically, and the change should be as simple as having the library call https://github.com/Mr-Markus/ZigbeeNet/blob/f6e879080bbcb173d3b051164861d21ae8fcd9c7/libraries/ZigbeeNet.Hardware.ConBee/Internal/ConBeeInterface.cs#L380 periodically...
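A periodic refresh along these lines might look like the sketch below. It is written in Python purely for illustration, since ZigbeeNet itself is C#. `write_ttl` is a hypothetical callback standing in for whatever actually writes the Watchdog TTL parameter to the dongle. The 3600-second TTL and 1800-second refresh interval are assumed values. The only real requirement is that the refresh interval stays well below the TTL.

```python
import logging
import threading
from typing import Callable

log = logging.getLogger(__name__)


class WatchdogRefresher:
    """Re-arms a firmware watchdog on a fixed schedule.

    `write_ttl` is a hypothetical callback (not a ZigbeeNet API) that writes
    the Watchdog TTL parameter, in seconds, to the dongle.
    """

    def __init__(self, write_ttl: Callable[[int], None],
                 ttl_seconds: int = 3600,           # assumed TTL value
                 interval_seconds: float = 1800.0,  # assumed refresh period
                 ) -> None:
        if interval_seconds >= ttl_seconds:
            raise ValueError("refresh interval must be shorter than the TTL")
        self._write_ttl = write_ttl
        self._ttl = ttl_seconds
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._write_ttl(self._ttl)  # arm the watchdog right away
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait() returns False on timeout (time to refresh) and
        # True once stop() has set the event (time to exit).
        while not self._stop.wait(self._interval):
            try:
                self._write_ttl(self._ttl)
            except Exception:
                # One failed write should not end the refresh loop.
                log.exception("watchdog refresh failed")
```

Refreshing at half the TTL leaves margin for a delayed or failed write. The daemon thread keeps the refresher from blocking process exit if stop() is never called.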

jgmdavies commented 3 years ago

I ran the experiment here this morning, and the serial reads failed at about 59 min 30 s.

Jim

jgmdavies commented 3 years ago

I found relevant comments in another GitHub project, about half-way down:

dhylands commented on 25 May 2019

I'm starting to see the dongle become non-responsive after sitting for a few hours. My original ConBee is rock solid and I use it every day for controlling lights and outlets and don't have any issues.

Once my ConBee II becomes non-responsive, I have to unplug and replug it and disable/re-enable the zigbee adapter to get things working again.

manup commented on 25 May 2019

Hmm, strange, this shouldn't happen. Is the zigbee-adapter setting the Watchdog TTL parameter periodically?
Otherwise the firmware will reboot roughly once per hour. For ConBee II this means the USB enumeration will be done again and the application needs to reconnect. For ConBee I and RaspBee this isn't noticeable, since there is an FTDI chip in between and the USB device therefore doesn't re-enumerate when the firmware reboots.

I've updated the deCONZ Serial Protocol PDF to version 1.14. The Watchdog TTL parameter is now documented. Previously it was only mentioned in https://github.com/dresden-elektronik/deconz-rest-plugin/issues/158
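Because a ConBee II re-enumerates on USB after such a reboot, a host application that misses the watchdog has to reopen the port. The Python sketch below shows a reader loop that does this. It assumes a hypothetical `open_port()` that returns an object with `read()` and `close()`. It also assumes that a vanished device surfaces as `OSError`; the .NET trace earlier in this issue shows an OperationCanceledException instead. Refreshing the watchdog avoids the reboot in the first place, so this loop is only a fallback.

```python
import time
from typing import Callable, Optional, Protocol


class Port(Protocol):
    """Minimal interface assumed for a serial port handle (hypothetical)."""
    def read(self) -> bytes: ...
    def close(self) -> None: ...


def reader_loop(open_port: Callable[[], Port],
                handle: Callable[[bytes], None],
                should_stop: Callable[[], bool],
                retry_delay: float = 1.0) -> int:
    """Read frames until should_stop() is true, reopening the port on errors.

    Returns how many times the port failed and had to be reopened.
    """
    disconnects = 0
    port: Optional[Port] = open_port()
    while port is not None and not should_stop():
        try:
            handle(port.read())
        except OSError:
            # Assumed failure mode: the device vanished (e.g. it re-enumerated
            # after a firmware reboot). Drop the stale handle and reopen.
            disconnects += 1
            port.close()
            port = None
            while port is None and not should_stop():
                time.sleep(retry_delay)
                try:
                    port = open_port()
                except OSError:
                    pass  # device not back yet; wait and retry
    if port is not None:
        port.close()
    return disconnects
```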

DavidKarlas commented 3 years ago

Can someone try this and see whether it fixes the problem? https://github.com/Mr-Markus/ZigbeeNet/pull/149

reinux commented 3 years ago

"change should be as simple as calling https://github.com/Mr-Markus/ZigbeeNet/blob/f6e879080bbcb173d3b051164861d21ae8fcd9c7/libraries/ZigbeeNet.Hardware.ConBee/Internal/ConBeeInterface.cs#L380 periodically, by library..."

I'd like to try this, but since the firmware update it isn't getting past Initialize()... Is there a quick fix, or should I just roll the firmware back?

DavidKarlas commented 3 years ago

I will try to upgrade my dongle this weekend and see what the problem is during initialization...

reinux commented 3 years ago

Thanks! In the meantime, I'll downgrade my firmware and test this out.

reinux commented 3 years ago

Derp... I'd been testing on my own by setting the property from a REPL every 30 minutes, which actually worked.

I'll try #149 now.

reinux commented 3 years ago

So far, so good. It's survived an hour. I'll leave it on for a few more.

reinux commented 3 years ago

I think it's good. Thanks for the fix!

nicolaiw commented 3 years ago

Thank you all for your effort!