OpenZWave / open-zwave

a C++ library to control Z-Wave Networks via a USB Z-Wave Controller.
http://www.openzwave.net/
GNU Lesser General Public License v3.0
1.05k stars 918 forks source link

"Received time out notification from..." #2033

Open RicardP opened 4 years ago

RicardP commented 4 years ago

Hello there! :)

Windows 10 running Domoticz latest beta with OZW 1.6.954.gaa29b66e.dirty.

I am getting really annoyed over those time out notifications flooding my log, seems only related to Nodes that have to jump over another fellow Node to reach the Controller and vise-verse. Why all the timeouts, and Send Data errors I experience in my network? Only about 4 Nodes out of 30 jumps only one fellow Node to reach the Controller!

And yes I have already increased the controller Retry timeout from default 10 to 15 seconds, what is wrong or is really Z-Wave be so bad in meshing?

OZW log, even some Error ZW_SEND_DATA....

2019-12-09 17:58:03.316 Error, Node056, ERROR: ZW_SEND_DATA could not be delivered to Z-Wave stack Dumping queued log messages 2019-12-09 17:58:03.316 11160 ERROR: ZW_SEND_DATA could not be delivered to Z-Wave stack End of queued log message dump 2019-12-09 17:58:18.253 Error, Node056, ERROR: Dropping command, expected response not received after 1 attempt(s)

Snap from Domotics Log:

2019-12-09 18:00:54.186 Status: OpenZWave: Received timeout notification from HomeID: 1234567890, NodeID: 100 (0x64) 2019-12-09 18:01:17.591 Status: OpenZWave: Received timeout notification from HomeID: 1234567890, NodeID: 100 (0x64) 2019-12-09 18:02:05.355 Status: OpenZWave: Received timeout notification from HomeID: 1234567890, NodeID: 101 (0x65) 2019-12-09 18:04:33.788 Status: OpenZWave: Received timeout notification from HomeID: 1234567890, NodeID: 56 (0x38) 2019-12-09 18:07:24.876 Status: OpenZWave: Received timeout notification from HomeID: 1234567890, NodeID: 100 (0x64) 2019-12-09 18:12:51.523 Status: OpenZWave: Received timeout notification from HomeID: 1234567890, NodeID: 100 (0x64) 2019-12-09 18:19:08.980 Status: OpenZWave: Received timeout notification from HomeID: 1234567890, NodeID: 101 (0x65) 2019-12-09 18:21:36.350 Status: OpenZWave: Received timeout notification from HomeID: 1234567890, NodeID: 56 (0x38)

Also related to this issue I think... https://github.com/OpenZWave/open-zwave/issues/2018#issuecomment-559720910

Olifant1990 commented 4 years ago

I'm joining in... Curious what is the cause of this

Fishwaldo commented 4 years ago

There is 0 actionable information in your post... what are you wanting?

RicardP commented 4 years ago

@Fishwaldo Sorry I thought it was enough for at least a reflection. Or is there a OZW documentation on those logs messages so uses know if it is a problem and if so what to do? At least I have problems with those timeouts, but I don't know what performance to expect from ZWave if a Node must jump. Because as said I always see issues with jumping packets, time outs and bad responce... In this case Time out Nodes 100 and 101 jumps only 7meters open space to the "in between Mesh Node" that itself have flawless connection to the Controller. Thanks!

petergebruers commented 4 years ago

It is not easy to give a quick answer, but I would like to start somewhere, maybe we will get there, maybe not. I say this because I mostly active on the Fibaro forum (since 2013) and I have tried to help maybe 20 users with similar complaints. I mean, mostly they report "why is Z-Wave sometimes (very) slow". I have helped a few but it is never easy. Give me some time, I'll start by talking about that "timeout" you see in Domoticz

RicardP commented 4 years ago

@petergebruers Thank you. I have had those timeouts messages (more or less) since I started with Domoticz/OZW 4 years ago. But because there been more annoying issues I did not mind so much before ...meaning now my Domoticz system is doing quite well :)

In the beginning I had bad reception to my Aeon Gen5 Stick with many jumping Nodes even centered placed in my villa, very short distances with one or two walls of wooden constructions between. OZW add-on performed very bad in Domoticz, slow response etc. So I modified my Controller by soldering in an SMA-connector and an external Dipol antenna. That made magics! Since the external antenna I have 90% direct connections and no issues except for the few that jumps. So I cannot understand what benefit ZWave Mesh protocol provides except it looks nice visualized... :) To understand if I had some RF interference etc. I also monitored the amount of ZWave packets with my SDR and noted ZWave uses <4% of the available time, meaning 96% of the time is free for communication! Something must be wrong!?

petergebruers commented 4 years ago

@RicardP thank you for some more background.

I would like to talk about "timouts" but I think you gave me a few hints to skip the basic for now and jump straight to heavy stuff...

So I cannot understand what benefit ZWave Mesh protocol provides except it looks nice visualized... :)

First thing is to forget everything you know about routing, because Z-Wave probably does not work the way you think. I do not want to underestimate you at all, but I have not met anyone yet, who has ever worked with "source routed" networks (unless you are as old as me and know something about IBM's Token Ring Networks)

Second thing is to forget about all those nice charts, they lead you astray, lead to false assumption. It is better to buy a Zniffer and then you'll understand why I hate those visualizers.

I have been putting of writing about the "routing protocols" (yes, that is "plural") in Z-Wave because I've started talking abut this on the Z-Wave Forum but very few people were interested. Unless of course their whole network starts failing and the usual approach (reduce polling, reporting, heal mesh, reorient stuff)

I think... If you start soldering antennae than you're ready to delve into the protocol :wink:

To understand if I had some RF interference etc. I also monitored the amount of ZWave packets with my SDR and noted ZWave uses <4% of the available time, meaning 96% of the time is free for communication! Something must be wrong!?

The SDR is an excellent tool to check sources of interference. I have one, I recommend it, because it can see "things" that other tools fail to see.

However... It cannot decode what Z-Wave is doing.

It is all a bit long to explain but recently I've worked with a few guys and user "amilanov" has done an excellent job of summarizing what I wanted to say. Please register on the Fibaro forum and have a look at this post and see what kind of trouble Zniffer can find, what it can do:

https://forum.fibaro.com/topic/48365-hc2-repair-maintenance-guide

He also points to a topic on how to download the tools (please read till the end for updates) and flash your device.

Here are some buying tips...

https://forum.fibaro.com/topic/48479-fibaro-wallplug-frozen-not-able-to-delete-the-plug/?do=findComment&comment=197372

I motivated amilanov to buy that tool and I have worked with him to hunt for issues and this is what he said when someone asked "should I buy a Zniffer"

You will not regret buying a Zniffer. I don't touch anything in my HC2 anymore without the Zniffer running. It feels like flying blind without seeing what is really happening in the background. I would suggest that everyone buys one, even if you have a small system. I had a problem with a device that was playing up badly. I excluded it from the HC2 and added it back but it was still playing badly. I excluded it again but this time I checked Zniffer and saw that it was still communicating with the HC2 after exclusion. Only the Zniffer could show me that.

I have to say that this problem would be more obvious in OpenZWave because it has this OZW_Log.txt and can do very detailed logging. A Fibaro HC2 has logging to but it is not accessible to the end user.

So we're back were I started, maybe let's have a look at your OZW_Log.txt first... Still would recommend you to read those topics...

About the logging in Domoticz... When you see this: Status: OpenZWave: Received timeout notification from HomeID: 1234567890, NodeID: 56 (0x38)

You will find more detail in OZW_Log.txt in debug mode. The reason you don't get more detail is that the API does give more details to Domoticz. I've thought about adding something but it is not trivial and arguably we would probably ask you to have a look at OZW_Log.txt anyway.

That line in Domoticz corresponds with "Notification: Notification - TimeOut" message in OZW log, and you have to search back to see what has timed out.

Here is a typical example...

2019-09-08 17:40:29.007 Info, Node012, Sending (NoOp) message (Callback ID=0x0e, Expected Reply=0x13) - NoOperation_Set (Node=12): 0x01, 0x09, 0x00, 0x13, 0x0c, 0x02, 0x00, 0x00, 0x25, 0x0e, 0xc0
2019-09-08 17:40:29.007 Info, Node012, Encrypted Flag is 0
2019-09-08 17:40:29.013 Detail, Node012,   Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8
2019-09-08 17:40:29.013 Detail, Node012,   ZW_SEND_DATA delivered to Z-Wave stack
2019-09-08 17:40:39.007 Error, Node012, ERROR: Dropping command, expected response not received after 1 attempt(s)
2019-09-08 17:40:39.008 Detail, Node012, Removing current message
2019-09-08 17:40:39.008 Detail, Node012, Notification: Notification - TimeOut

Things worth noting are:

In this example, the most likely cause is that node 12 is for example a wall plug, and it is not plugged in. But there can be other reasons eg a device does not implement some command. So not all timeouts have exactly the same log lines as that NOP example.

RicardP commented 4 years ago

@petergebruers Thanks a lot for sharing your knowledge and experiences.

Well I never studied routing data packets like IBM Token Ring even that my age also included that prehistoric era... :) Anyway I am a little confused about the meaning of Z-Wave Meshing if it is not forwarding data packets in an effective way, but to learn more I will register on Fibaro forum and study the Zniffer link. Seems interesting!

I am much an RF and Hardware guy so for me taking things apart and look for possible modifications is awesome :) Mounting an antenna connector to the Aeon stick was really necessary, it performed so bad! I am surprised not a single Z-Wave Controller I've seen have an external antenna connector... It gives the impression every Manufacture believes that Z-Wave routes traffic perfectly as long there is at least one fellow Node around, like I thought in the beginning... Why not optimize the Controller RF characteristics design even having a Mesh-Routing protocol, a good RF signal is very important... nothing is better than the weakest link!

A Zniffer set seems interesting to buy, but at the same time why should one need that analyzing stuff to get its Z-Wave system up and running! Sigma seems not have a lot of ambitions for Z-Wave if one buggy/defective Node can alter the whole system. There should be more Z-Wave native support to provide users feedback about both Network-RF and abnormal Nodes, not only "Dead". Maybe there is but not found out/implemented in OZW? It seems not Z-Wave is what it is marketed to be if anyone trying to fix a common issue like a bad/buggy Node (frequent problem...), needs to be an advanced technicians with a Zniffer tool :)

Thanks for all suggestions, I will start to turn on log debug mode and provide it asap, maybe it gives the answer what is going on here!

Cheers!

petergebruers commented 4 years ago

Anyway I am a little confused about the meaning of Z-Wave Meshing if it is not forwarding data packets in an effective way

It is very difficult to find out what is going on without Zniffer, because routing certainly works but probably not the way you expect it, and much of what is going is hidden by "the protocol" (the SDK, the library a device uses). The Aeotec stick unfortunately does not use a more recent Z-Wave SDK and because of that it does not support "extended status".

If it did, OZW would print some routing information and give you RSSI of the last hop. Still not very detailed but often interesting. I talk about this here:

https://forum.fibaro.com/topic/42703-geek-stuff-z-wave-routing-packets-and-serial-api

Mounting an antenna connector to the Aeon stick was really necessary, it performed so bad!

This is anecdotal, when I bought mine, I noticed it did not have the same range as the older Fibaro 2 HomeCenter. Which is odd because it has a more recent chip with better performance (series 500 versus 300). Anyway, I never investigated this, I took out the battery and used it for testing purposes.

I've noticed (see that post) the performance can change by 10 dB by adding a short USB extension lead, but better test that.

Since we don't know where users put there devices, antennae tend to be omnidirectional. The HC2 has an external antenna... Its enclosure is solid Aluminium.

End users tend to forget what they gain by adding an antenna can be lost by adding a COAX cable, attenuation of "cheap" cable at that frequency is high. But as an RF tinkerer you know that, I'm telling this because inexperienced users ignore this...

Performance also depends on the "device", this may be evident but I've compared the performance of an older, poorer performer with a more recent device.

See https://forum.fibaro.com/topic/42703-geek-stuff-z-wave-routing-packets-and-serial-api/?do=findComment&comment=179282 , click on "spoiler". See maps of an older Fibaro Wall Plug versus a new "Roller Shutter 3".

A Zniffer set seems interesting to buy, but at the same time why should one need that analyzing stuff to get its Z-Wave system up and running! Sigma seems not have a lot of ambitions for Z-Wave if one buggy/defective Node can alter the whole system.

Silabs (owner of Sigma) does have tools and Z-Wave is an evolving standard. The Zniffer firmware and tool are also free, gratis...

On the Fibaro forum I sometimes my "Z-Wave Rules of thumb"

My Z-Wave rules of thumb: less than 5 modules, sometimes issues because not enough devices to cover all areas. Between 5 and 20...almost never issues. Between 20 and 50... sometimes trouble. Between 50 and 100 devices real "weirdness" starts and communication gets "fragile". Above 100 make sure you have the tools to diagnose network issues (ie Zniffer, you can make one yourself) or get in touch with a good certified installer to support you (I am and end user, not an installer). Know about CPU and memory usage. Follow this forum (Fibaro).

RicardP commented 4 years ago

@petergebruers Happy holidays Peter! I did not find time to dig further in this but I noticed extended log information about the Time Out Notifications since Domoticz includes OZW version 1.6.989. So I activated Debug Log, maybe you see something funny without spending a lot of time to it...? And what is the Warning about CommandClass 50 :)

OZW_Log.zip

gizmocuz commented 4 years ago

Since updating to the latest OZW code I to get this error in the Log. Nothing is received anymore. Build from around 16 December was working correctly

Fishwaldo commented 4 years ago

@gizmocuz you know the drill... Logs please :)

RicardP commented 4 years ago

@Fishwaldo , please check my log two posts up. Or maybe you did already?