PLCHome / node-red-contrib-velux

Provides nodes to talk to Velux® KLF-200 io-homecontrol® Gateway
9 stars 2 forks source link

No communication to node / Error timeout #8

Closed malansk closed 3 years ago

malansk commented 4 years ago

Hello,

I run Node Red V7.0 on Homematic CCU3. Sometimes (usually a few hours after restart of system) I get in the system log following error message (even though the physical actions are still beeing executed)

==================

Oct 19 18:19:16 ccu3-webui daemon.err node-red: (node:1195) UnhandledPromiseRejectionWarning: Error: timeout GW_COMMAND_SEND_CFM Oct 19 18:19:16 ccu3-webui daemon.err node-red: at Object. (/usr/local/addons/redmatic/var/node_modules/node-red-contrib-velux/node_modules/velux-klf200-api/lib/net.js:169:19) Oct 19 18:19:16 ccu3-webui daemon.err node-red: at ontimeout (timers.js:436:11) Oct 19 18:19:16 ccu3-webui daemon.err node-red: at tryOnTimeout (timers.js:300:5) Oct 19 18:19:16 ccu3-webui daemon.err node-red: at listOnTimeout (timers.js:263:5) Oct 19 18:19:16 ccu3-webui daemon.err node-red: at Timer.processTimers (timers.js:223:10) Oct 19 18:19:16 ccu3-webui daemon.err node-red: (node:1195) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 3)

=====================

Also, sometimes in the Node-Red Web-GUI, it is shown "No communication to node" on one of the nodes. The affected node changes from time to time. The status of the node (position of window shutter) is not updated in this state. However, after a new positioning command to that node, the status is updated (but "No communication to node" is still shown).

After restart of system (Node-Red Process on CCU3), no more error but will return after a few hours

======================= Also, sometimes the UTC timestamp of the KLF200 node messages is from (year 1942 or year 1972)

After restart of system, UTC time is correct but will become bad after a few hours

======

What would be the best approach to debug this errors ?

Best regards, Martin

PLCHome commented 4 years ago

Yes, I know that it doesn't work with the deploy. It runs into the no-connection bug of the KLF because the TLS connection is not cleared down correctly, Contributors are welcome to fix this bug.

PLCHome commented 4 years ago

Current problems GW_NODE_STATE_POSITION_CHANGED_NTF contains an incorrect timestamp. The lowest 2 bytes of the 4 bytes are sent to the higher 2 bytes and the lowest 2 bytes are 0. In response to the "GW_GET_ALL_NODES_INFORMATION_REQ" command, the correct time stamp is sent: 5be8d806 - 2018-11-12T01: 31: 50.000Z 15 sec Later a "GW_NODE_STATE_POSITION_CHANGED_NTF" hits. This contains a timestamp: d8160000 correct was here 5be8d816

If there is no communication with the KLF every 10 minutes to 15 minutes, the connection will be disconnected as described in the manual. If this happens when the home monitor "GW_HOUSE_STATUS_MONITOR_ENABLE_REQ" is activated, the KLF200 is no longer reachable. The KLF200 no longer sends the TLS command "Change Cipher Spec." on TLS start. This means that TLS encryption can no longer be initiated. I saw these in the wireshark. Should anyone else notice this error, it would be nice if that this someone also reports to the VELUX hotline.

The KLF200 needs more than 3 seconds to connect. It needs more then a second to send the key and more then a second for the TLS command "Change Cipher Spec.".

malansk commented 4 years ago

Yes, I know that it doesn't work with the deploy. It runs into the no-connection bug of the KLF because the TLS connection is not cleared down correctly, Contributors are welcome to fix this bug.

I'm not shure if my system really runs into a TLS no-connection state since it will still accept command to that node (even if "No connection" is shown in the WEB-GUI for that node). Also, the other nodes (other Window shutters) still show "Connected". After a few minutes, HomeMonitor status is updated and the affected node will correctly update its status (e.g. change from 0% to 100% position), however it is still shown as "No connection" for that node.

Are the TLS connections unique for each node (every node [e.g. every Window shutter] has its own conncection) or is there only one single TLS connection to the KLF Gateway (used for every node) ?

PLCHome commented 4 years ago

No the TLS connection exists only between node red and the LAN interface of the KLF. It affects all nodes in the nodered

malansk commented 4 years ago

So let's assume that e.g. a Window Shutter is starting to move from 0% position to its new commanded position 100% but the status transmission from Window Shutter to KLF Gateway (that it had reached it's new position @ 100%) get's lost e.g. due to some RF-Interference. -> the Window Shutter node in Node-Red Web-GUI changes to "No connection to node", the Position of that node remains at the old position of 0%. Now, after a few minutes home monitor reports the correct new position of "100%". Current behaviour is that then the new position of "100%" is shown in the node, but the "no connection to node" status remains.

What would be needed to reset the "No connection to node" status to "Connected" after successfull reception of the new position from the home monitor ? Is this a topic for KLF firmware, node-red or node-red-contrib-velux ?

PLCHome commented 4 years ago

OK, I don't know this behavior. I have the problem that when the connection with the KLF breaks. Then nothing works anymore in all Velux nodes in Node-Red.

Heare a GW_COMMAND_SEND_CFM is expected from any command sent, but it does not arrive. DEBUG=velux-klf200-api:* would be helpful

malansk commented 4 years ago

From what I have googled is that "DEBUG" is an environmental variable that needs to be declared, but how can I do that in the Node-Red WEB-GUI ? Or do I need to SSH into Homamatic CCU3 and edit some files with vi ?

malansk commented 4 years ago

OK, I don't know this behavior. I have the problem that when the connection with the KLF breaks. Then nothing works anymore in all Velux nodes in Node-Red.

Heare a GW_COMMAND_SEND_CFM is expected from any command sent, but it does not arrive. DEBUG=velux-klf200-api:* would be helpful

*Yesterday at 21:10 was the last regular (every few minutes) home monitor update (GW_GET_ALL_NODES_INFORMATION_NTF and GW_NODE_STATE_POSITION_CHANGED_NTF) according to Node-Red GUI

*Yesterday at 21:30 the regular Homematic CCU3 program (to open Window at 21:30 and close at 22:00) was called, it did work (I can see in the Log the Status Change of the window position from close to open at 21:30 and back to close at 22:00) but it threw an error in the Homematic CCU3 Syslog:

21:30:05 ccu3-webui daemon.err node-red: (node:9031) UnhandledPromiseRejectionWarning: Error: timeout GW_COMMAND_SEND_CFM Nov 2 21:30:05 ccu3-webui daemon.err node-red: at Object. (/usr/local/addons/redmatic/var/node_modules/node-red-contrib-velux/node_modules/velux-klf200-api/lib/net.js:169:19) Nov 2 21:30:05 ccu3-webui daemon.err node-red: at listOnTimeout (internal/timers.js:554:17) Nov 2 21:30:05 ccu3-webui daemon.err node-red: at processTimers (internal/timers.js:497:7) Nov 2 21:30:05 ccu3-webui daemon.err node-red: (node:9031) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1) Nov 2 21:30:05 ccu3-webui daemon.err node-red: (node:9031) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

*Today at 06:13 home monitor started to send regular updates again (without any restart of anything), but not the "GW_GET_ALL_NODES_INFORMATION_NTF", only the "GW_NODE_STATE_POSITION_CHANGED_NTF" messages.

*Today at 09:00 the regular Homematic CCU3 programs (to open Window shutters and Window openers) were called and ran successfully.

*Today at 10:37 after manual Restart of the Node-Red Process @Homematic CCU3, the KLF send again the GW_GET_ALL_NODES_INFORMATION_NTF for every node, but only once. Only the GW_NODE_STATE_POSITION_CHANGED_NTF messages are sent every 2..15 minutes.

////////////// So to summarize, I would conclude the following: 1.) GW_NODE_STATE_POSITION_CHANGED_NTF has the bug of the swapped byte order / 00 -> UTC time stamp incorrect 2.) after a few hours, KLF interface stops to send GW_GET_ALL_NODES_INFORMATION_NTF and GW_NODE_STATE_POSITION_CHANGED_NTF -> KLF will stil accept and execute commands, however the logs show "timeout GW_COMMAND_SEND_CFM" 3.) after a few hours, KLF interface starts to send GW_NODE_STATE_POSITION_CHANGED_NTF messages again. But still no GW_GET_ALL_NODES_INFORMATION_NTF messages 4.) the keep-alive queries are not every 10 minutes (like in the description stated) but every 2..15 minutes on my system. Is that by purpse (random number between 2..12minutes) ? Is that time defined by KFL gateway or by node-red-contrib-velux ? Probably by KLF since you tried to implement a fixed-interval polling method. 5.) In the Node-Red Web-GUI, the Node Status "no connection to node" is sometimes set, even though the position values are still beeing reported back on a regular basis for that node and even thought that node will accept an run an new command. The "No connection to node" status in only claered after a new command is send to the node or after restart of Node-Red Process.

6.) In spite of these errors in the CCU3 Syslog and in the Node-Red WEB-GUI, the system is propperly working the nodes, even for several weeks in a row without problem. I do however restart the CCU3 at least once per week due to other programming activities so I don't know if these errors would lead to problems e.g. after several months runtime.

PLCHome commented 3 years ago

That may be anything. I can't do anything with this an analysis. I don't know the flow and I don't know the logs.

Somehow I have the feeling that some command to query is polled in a short cycle. That klf doesn't like it when you blast it.

malansk commented 3 years ago

I have made some more tests regarding this timeout issues and have found out that they are real transmission problems. The VELUX KLF200 was able to transmit commands successfully, but sometimes the status of the window shutter devices was not always correctly received due to changing weather conditions (e.g. cold & high humidity). I have moved the KLF200 now much closer and I don't have any communications errors anymore.

In addition, I had some time-overlap of transmissions (simultaneous commands e.g. at Sunset/Sunrise to several devices) so they were blocking each other's status responses. Now I use minimum of 1 minute delay between commands to make shure the devices are no longer moving before executing a new command to an other device. Looks like the window shutters transmit constantly (e.g. Position) as soon as they are mooving.

The behaviour mentioned above as point 5) remains however. If at one time the connection is lost, the status "no connection to node" remains in that status - even if the home status monitor will start to send messages again for this device because it has reconnected in the meanwhile. Sending new command or restart of Node-Red will clear the error status.

So to summarize, I think you can close this topic. Thank you for making the software, I was able to integrate my window shutters from 2011 into my Homematic smarthome system :-)