OpenZWave / open-zwave

a C++ library to control Z-Wave Networks via a USB Z-Wave Controller.
http://www.openzwave.net/
GNU Lesser General Public License v3.0

Fibaro FGS221, FGS211, FGD211 Hardware Switch get completely stuck about once per day #390

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

Preamble:

I have owned a Fibaro Home Center 2 for about 2 years. Because I love
open-source software and am completely disappointed by Fibaro's bad software
design and nonexistent support, I switched from the HC2 to Domoticz+OpenZWave.
I got almost everything working, but two major problems remain (only with
Fibaro hardware, of course :)). First, let me describe my Z-Wave network:

  1 RaZberry Primary Controller running Domoticz+OpenZWave 1.2.919
  3 FIBARO System FGD211 Universal Dimmer 500W v1.4
  5 FIBARO System FGS221 Double Relay Switch 2x1.5kW v1.4
  1 FIBARO System FGS211 Switch 3kW v1.4
  2 FIBARO System FGMS001 Motion Sensor v2.4
  2 FIBARO System FGMS001 Motion Sensor v2.6
  1 FIBARO System FGSS101 Smoke Sensor
  1 Philio Technology Corporation Slim Multi-Sensor PSM02
  1 Everspring AN158 Plug-in Meter Appliance Module
  1 Horstmann HRT4-ZW Thermostat Transmitter
  1 Horstmann ASR-ZW Thermostat Receiver
  9 Everspring SM103 Door/Window Sensor
  1 Everspring SP814 Motion Detector
  2 FIBARO System FGWPE Wall Plug
  1 Aeon Labs Minimote Secondary Controller
  1 Duwi Secondary Controller 

Two problems:

1) About once per day, in completely random order, one or more of the Fibaro
switches (the problem happens with all FGD211, FGS221 and FGS211 modules)
  gets completely stuck. By "stuck" I mean that I must switch off the main electrical supply of the house to make it work again
  (the switch stops responding both to Z-Wave commands and to the hardware wall switch).

2) The Fibaro Motion Sensor FGMS-001 v2.4 is not reliable.
  I tested two FGMS-001 v2.4 and two FGMS-001 v2.6 units.
  Version 2.4 is included without problems, but the motion sensor doesn't send any trigger events to the OZW controller.
  Version 2.6 (purchased last month) works perfectly out of the box and is reliable. (Let me know if I should open another bug report for this.)

Considerations:

My Fibaro switches and motion sensors are 2 years old (I think the first series
sold by Fibaro), and I do not have a newer switch release to test. I suppose
these problems come from poorly supported functions or some strange bugs in
the earlier Fibaro switch releases. However, I never had these problems when I
used the HC2, so I think they depend on some interaction with OZW.

How can I actively help the OZW team debug these two problems?

Many thanks in advance.

PS. The log file is attached. Here are the node IDs of the problematic Fibaro
hardware:

Node:       Type:
056 (0x38)  FIBARO System FGS211 Switch 3kW
014 (0x0e)  FIBARO System FGMS001 Motion Sensor
012 (0x0c)  FIBARO System FGS221 Double Relay Switch 2x1.5kW
011 (0x0b)  FIBARO System FGD211 Universal Dimmer 500W
009 (0x09)  FIBARO System FGS221 Double Relay Switch 2x1.5kW
008 (0x08)  FIBARO System FGS221 Double Relay Switch 2x1.5kW
007 (0x07)  FIBARO System FGD211 Universal Dimmer 500W
006 (0x06)  FIBARO System FGS221 Double Relay Switch 2x1.5kW
005 (0x05)  FIBARO System FGD211 Universal Dimmer 500W
004 (0x04)  FIBARO System FGS221 Double Relay Switch 2x1.5kW

In the attached log:
Node008 got stuck on 2014-11-08 05:39:43
Node007 got stuck on 2014-11-08 10:48:06
Node004 got stuck on 2014-11-08 10:39:22

Original issue reported on code.google.com by ugo.v...@gmail.com on 8 Nov 2014 at 12:32

GoogleCodeExporter commented 9 years ago
Hi,
The first thing I'd consider is stopping the polling. I see that almost every
device you mention as locking up has polling enabled for some values.

(2014-11-07 07:25:04.546 Info, Node004, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x01)--poll list has 1 items
2014-11-07 07:25:04.547 Info, Node004, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x02)--poll list has 2 items
2014-11-07 07:25:04.547 Info, Node005, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x26,in=0x00,id=0x01)--poll list has 3 items
2014-11-07 07:25:04.547 Info, Node006, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x01)--poll list has 4 items
2014-11-07 07:25:04.547 Info, Node006, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x02)--poll list has 5 items
2014-11-07 07:25:04.547 Info, Node007, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x26,in=0x00,id=0x01)--poll list has 6 items
2014-11-07 07:25:04.548 Info, Node008, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x01)--poll list has 7 items
2014-11-07 07:25:04.548 Info, Node008, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x02)--poll list has 8 items
2014-11-07 07:25:04.548 Info, Node009, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x01)--poll list has 9 items
2014-11-07 07:25:04.548 Info, Node009, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x02)--poll list has 10 items
2014-11-07 07:25:04.548 Info, Node011, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x26,in=0x00,id=0x01)--poll list has 11 items
2014-11-07 07:25:04.548 Info, Node012, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x01)--poll list has 12 items
2014-11-07 07:25:04.549 Info, Node012, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x02)--poll list has 13 items
2014-11-07 07:25:04.549 Info, Node040, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x01)--poll list has 14 items
2014-11-07 07:25:04.549 Info, Node056, EnablePoll for HomeID 0xdc67dad5, 
value(cc=0x25,in=0x00,id=0x01)--poll list has 15 items
)

If you can get the associations set up correctly, there should be no need to
poll.
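
For anyone who wants to check their own log for the same pattern, here is a minimal Python sketch (not part of OZW or Domoticz) that lists which values have polling enabled per node. The name `polled_values` is made up for illustration, and the regular expression is only inferred from the EnablePoll lines quoted above. The fields are returned exactly as logged (cc, in, id) without interpreting them.

```python
import re
from collections import defaultdict

# Pattern inferred from the quoted "EnablePoll" log entries (an assumption,
# not an official OZW log grammar). \s* tolerates the line wrap that mail
# or forum software inserts before "value(".
POLL_RE = re.compile(
    r"Node(\d{3}), EnablePoll for HomeID 0x[0-9a-fA-F]+,\s*"
    r"value\(cc=0x([0-9a-fA-F]{2}),in=0x([0-9a-fA-F]{2}),id=0x([0-9a-fA-F]{2})\)"
)


def polled_values(log_text):
    """Return {node_id: [(cc, in, id), ...]} for every EnablePoll entry."""
    polled = defaultdict(list)
    for node, cc, in_field, id_field in POLL_RE.findall(log_text):
        polled[int(node)].append(
            (int(cc, 16), int(in_field, 16), int(id_field, 16))
        )
    return dict(polled)


# Three entries copied from the log excerpt above, including the wrapping.
SAMPLE_LOG = """2014-11-07 07:25:04.546 Info, Node004, EnablePoll for HomeID 0xdc67dad5,
value(cc=0x25,in=0x00,id=0x01)--poll list has 1 items
2014-11-07 07:25:04.547 Info, Node004, EnablePoll for HomeID 0xdc67dad5,
value(cc=0x25,in=0x00,id=0x02)--poll list has 2 items
2014-11-07 07:25:04.547 Info, Node005, EnablePoll for HomeID 0xdc67dad5,
value(cc=0x26,in=0x00,id=0x01)--poll list has 3 items
"""

for node_id, values in sorted(polled_values(SAMPLE_LOG).items()):
    print(f"Node{node_id:03d}: {len(values)} polled value(s)")
```

Command class 0x25 is SWITCH_BINARY and 0x26 is SWITCH_MULTILEVEL, so the entries above correspond to the relay and dimmer state values.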

I'd also say that's the issue you are facing with the Motion Sensor. I have a
2.06 version of the device (much lower than yours) and it works fine. Usually,
if you're not getting reports, it's because the associations are not set up
correctly.

Original comment by jus...@dynam.ac on 9 Nov 2014 at 7:06

GoogleCodeExporter commented 9 years ago
Hi,

thank you very much for the reply.

The attached log is from my latest attempt to make my Z-Wave network reliable.
As a last test I enabled polling for the stuck nodes (without success, of
course :)). Before that, polling was disabled for every node of my network,
but the Fibaro hardware got stuck anyway.

I was hoping that my problems were known to other OZW users. I found other
users on the Fibaro forum with similar problems (and no support from
Fibaro).

Anyway, I need to find out whether this interaction is related to the RaZberry
Z-Wave chip or also happens with the Aeon Labs Z-Stick S2, because the same
behavior occurs when I use the Z-Way software. (Only with the Fibaro HC2 do
the devices not get stuck over time, damn them! :))

So I will include an Aeon Labs Z-Stick S2 as a secondary controller, shift the
controller role from the RaZberry chip to the Z-Stick, and see whether the
Fibaro relays continue to get stuck.

Last question: I have tried for a long time to find a command-line OZW utility
to send commands and test OpenZWave's reliability, but I can't find any
up-to-date tool (python-OpenZWave doesn't compile anymore, for example). Is
there any currently usable command-line utility?

I will report future updates here.

Kind Regards

PS. About the motion sensor problem: if appropriate, I'll open a new issue.
However, the association groups are correct (group1: 1, group2: 1, group3: 1)...
I think OZW should handle version 2.4 a little differently from 2.6,
because they are not identical.

Original comment by ugo.v...@gmail.com on 9 Nov 2014 at 8:20

GoogleCodeExporter commented 9 years ago
Hi,

regarding the FGD211 "stuck" issue, I had similar problems to what you 
described. Check out the following thread: 
https://groups.google.com/d/msg/openzwave/obqXdPNGAR4/YZQSqJOx1DcJ, it might 
help!

Original comment by jo...@stromnet.se on 10 Nov 2014 at 8:56

GoogleCodeExporter commented 9 years ago
Hi, thank you.

I read this forum post before posting my bug report. But for me the stuck
problem happens not only with the FGD211 but also with the FGS221 and FGS211.

To be honest, the stuck problem "seems" to occur more frequently on the FGD211
than on the others.

Before posting this bug report, I tried:

1) putting the primary controller (Node001) into group1, group2 and group3
(this was already the default anyway)

2) changing option 10 to 0 (this option is only available on the FGD211, not
on the other modules)

Still, my modules continue to get stuck once per day (in random order).

Tomorrow I will try stopping all Z-Wave controllers in my home, to find out
whether the Fibaro switches still get stuck after some hours. If they do,
goodbye Fibaro hardware forever, too crap software and hardware :)

Kind Regards

Original comment by ugo.v...@gmail.com on 10 Nov 2014 at 7:24

GoogleCodeExporter commented 9 years ago
Hello, I don't want to promote Fibaro, but half of my current installation
consists of Fibaro devices and each of them works like a charm.

Do you remove your config file after restarting your network?

Original comment by jeanfran...@gmail.com on 11 Nov 2014 at 7:33

GoogleCodeExporter commented 9 years ago
Hi,

I already tried cleaning up the Z-Wave network config and starting from
scratch, without success.

Can you confirm which Fibaro switch models you use, and which versions?

Kind Regards

Original comment by ugo.v...@gmail.com on 11 Nov 2014 at 2:31

GoogleCodeExporter commented 9 years ago
I am continuing to test my environment. In the last few days I have:

1) Turned off my RaZberry controller. Result: after three days no Fibaro
switch got stuck (so I think it's definitely an OZW interaction).

2) Turned it back on yesterday and removed all group associations from my
Fibaro switches (group1, group2 and group3 are empty now). For two days there
were no problems (of course, without group associations, the node status is not
reported to the controller when the in-wall switch is pressed). Node007 got
stuck today (after two days of operation), so I attach the log of relevant
commands.

Analyzing the log, I can't figure out the cause. However, I discovered that
Node007 got stuck at around 16:35, when I tried to turn on the light using the
in-wall switch. So I tried to turn it on from the Domoticz interface at
2014-11-15 16:38:11.88, but of course the node was dead.

2014-11-15 16:38:17.437 Info, Node007, WARNING: ZW_SEND_DATA failed. No ACK 
received - device may be asleep.

2014-11-15 16:39:02.854 Error, Node007, ERROR: node presumed dead

So I powered off my house's electrical supply to unblock the switch, and OZW
detected that the node had come back online:

2014-11-15 16:40:11.464 Error, Node007, WARNING: node revived
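
To find such episodes in a long log without reading it line by line, here is a minimal Python sketch that pairs each "node presumed dead" entry with the next "node revived" entry for the same node and reports the time in between. It is only a sketch: the function name `outages` is invented here, and the pattern is inferred solely from the three log lines quoted above, so other OZW versions may word these messages differently.

```python
import re
from datetime import datetime

# Pattern inferred from the quoted log lines (an assumption, not an
# official OZW log grammar): timestamp, level, NodeNNN, then the message.
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+, Node(\d{3}), "
    r"(?:ERROR|WARNING): node (presumed dead|revived)",
    re.MULTILINE,
)


def outages(log_text):
    """Return (node_id, dead_at, revived_at, seconds) for each dead/revived pair."""
    dead_since = {}
    result = []
    for stamp, node, state in EVENT_RE.findall(log_text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        node_id = int(node)
        if state == "presumed dead":
            dead_since[node_id] = when
        elif node_id in dead_since:
            start = dead_since.pop(node_id)
            result.append((node_id, start, when, (when - start).total_seconds()))
    return result


# The three lines quoted above, unwrapped to one entry per line.
SAMPLE_LOG = """2014-11-15 16:38:17.437 Info, Node007, WARNING: ZW_SEND_DATA failed. No ACK received - device may be asleep.
2014-11-15 16:39:02.854 Error, Node007, ERROR: node presumed dead
2014-11-15 16:40:11.464 Error, Node007, WARNING: node revived
"""

for node_id, dead_at, revived_at, seconds in outages(SAMPLE_LOG):
    print(f"Node{node_id:03d} dead from {dead_at} to {revived_at} ({seconds:.2f} s)")
```

Note that this measures only the interval during which OZW considered the node dead. The switch itself may have been stuck earlier, as with the 16:35 wall-switch attempt described above.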

The full log is attached.

Kind Regards

Original comment by ugo.v...@gmail.com on 15 Nov 2014 at 5:27

GoogleCodeExporter commented 9 years ago
Adding a note to this bug report:

For 2 weeks now I have not experienced the stuck problem with the Fibaro
switches.

The problems stopped when I added these lines to options.xml:

  <Option name="EnableSIS" value="false" />
  <Option name="Associate" value="false" />

EnableSIS should have no bearing on these problems.

But I suppose Associate=false may have helped. I have not changed any other
options (I just updated Domoticz regularly during these 2 weeks). I saw that
with Associate=true, at every OpenZWave startup Domoticz generates a very high
number of requests to the Fibaro devices (reading and associating groups), and
"I suppose" that this makes my flaky Fibaro devices get stuck.
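
To double-check that the option really is set in the file that gets loaded, here is a minimal Python sketch that reads the `<Option>` elements from an options.xml and reports whether a given option is set to "false". The helper names and the `<Options>` wrapper element in the sample are assumptions made for illustration. Only the two `<Option>` lines come from the file shown above.

```python
import xml.etree.ElementTree as ET


def read_options(xml_text):
    """Return {name: value} for every <Option name=... value=...> element."""
    root = ET.fromstring(xml_text)
    options = {}
    for elem in root.iter():
        # Strip an XML namespace prefix such as "{uri}Option" if present.
        if elem.tag.rsplit("}", 1)[-1] == "Option":
            options[elem.get("name")] = elem.get("value")
    return options


def option_is_false(xml_text, name):
    """True only if the option is present and explicitly set to false."""
    value = read_options(xml_text).get(name)
    return value is not None and value.strip().lower() == "false"


# Hypothetical file content: the <Options> root is assumed, the two
# <Option> lines are the ones quoted above.
SAMPLE = """<Options>
  <Option name="EnableSIS" value="false" />
  <Option name="Associate" value="false" />
</Options>"""

print(read_options(SAMPLE))
print("Associate disabled:", option_is_false(SAMPLE, "Associate"))
```

In practice you would pass the contents of your own options.xml, for example `option_is_false(open(path).read(), "Associate")`, where `path` is wherever your installation keeps the file.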

I'll post future updates in the next weeks.

Kind Regards

Original comment by ugo.v...@gmail.com on 14 Dec 2014 at 10:53

GoogleCodeExporter commented 9 years ago
It really appears to be a Fibaro bug. As CaptMidnight posted previously here,
he reported it as a bug to Fibaro, but it seems it still exists:

https://groups.google.com/forum/#!searchin/openzwave/fibaro$20midnight/openzwave/obqXdPNGAR4/yF_JxlPGYrMJ

Not much we can do about it unfortunately. 

Original comment by jus...@dynam.ac on 18 Dec 2014 at 6:32

GoogleCodeExporter commented 9 years ago
More information about this bug:

Running a network heal (routing reorganization) makes some Fibaro nodes go
dead immediately.
So disabling auto-association and automatic network heal in Domoticz stops
these flaky Fibaro modules from getting stuck.

However, I have seen "another bug": after some hours (I can't tell how many),
the Fibaro switches stop sending relay status changes to the controller when
the in-wall switch is pressed (power-cycling the Fibaro modules makes the
status changes work again).

Can someone confirm that these bugs don't exist with the latest release of the
Fibaro modules (I have release 1.4)?

Kind Regards

Original comment by ugo.v...@gmail.com on 28 Dec 2014 at 3:49

GoogleCodeExporter commented 9 years ago
New update:

I can definitely confirm that this bug is related to how OpenZWave works and
manages the Z-Wave network.

These Fibaro switches are surely crap and should never get completely stuck,
but the high volume of OZW commands generated during library startup makes
these modules crash (this happens with <Option name="Associate" value="true" />,
because when it is "true", OZW queries each node about its configured groups
on every startup).

To summarize, I found the following scenario (using
Domoticz+OZW1.3.965+RaZberry chip as the software controller):

These Fibaro switches get stuck when:

1) Doing a network heal (route reorganization) every day
2) Restarting the OZW controller several times in a short period

These Fibaro modules don't get stuck when I follow this procedure:

- Stop the controller
- Power cycle the home electrical supply to reset the Fibaro switches
- Start the controller (making sure that <Option name="Associate" value="false" />
is set and automatic network heal is DISABLED)
- Wait until OZW finishes the network initialization (my network needs about
15 to 30 minutes before I can send Z-Wave commands)
- Power cycle the home electrical supply to reset the Fibaro switches
- If the controller is not restarted again, the system stays stable for many
days.

Considerations:

I'm not a programmer, so I'm the last person who can suggest where to change
the code and behavior of OZW. OpenZWave is a great piece of code, but IMHO
(as I said, I'm not a coder, so excuse me if my view is wrong) the way OZW
handles initialization should be rewritten.
I've read http://openzwave.com/knowledge-base/slowresponseduringstartup of
course, but IMHO repeating the same operations every time the library is
initialized greatly delays the point at which the Z-Wave network becomes
reliable (my network, with only 30 nodes, needs 15 to 30 minutes), generates
too much Z-Wave traffic and makes some buggy Z-Wave devices crash...

Would it be difficult to implement some sort of database cache (reusable at
every startup) of what OZW stores in memory during startup? (I'm speaking as a
layman, so excuse me.)

I really appreciate the work that has gone into OZW!

Kind Regards

Original comment by ugo.v...@gmail.com on 6 Jan 2015 at 2:53

GoogleCodeExporter commented 9 years ago
There have been reports that leaving any association group on some Fibaro
devices empty will crash the device. It was reported to Fibaro, but no action
was taken as far as we know. I would confirm that all association groups have
at least Node 1 in them (consult the manuals - the OZW configs could be wrong!)
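
One way to look for empty groups is to scan the cached network file. The Python sketch below is only an illustration and rests on an assumption about the file layout: each top-level `<Node id=...>` is taken to contain `<Group index=...>` elements whose members are nested `<Node id=...>` children. Check your own zwcfg_*.xml before relying on it. The function name `empty_groups` and the sample document are invented for the example.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return the tag name without any "{namespace}" prefix."""
    return tag.rsplit("}", 1)[-1]


def empty_groups(xml_text):
    """Return [(node_id, group_index), ...] for groups with no member nodes."""
    root = ET.fromstring(xml_text)
    found = []
    for node in root:
        if local(node.tag) != "Node":
            continue
        for group in node.iter():
            if local(group.tag) != "Group":
                continue
            # Members are assumed to be direct <Node> children of <Group>.
            members = [m for m in group if local(m.tag) == "Node"]
            if not members:
                found.append((int(node.get("id")), int(group.get("index"))))
    return found


# Hypothetical, heavily trimmed document following the assumed layout.
SAMPLE = """<Driver>
  <Node id="7">
    <CommandClasses>
      <CommandClass id="133">
        <Associations num_groups="3">
          <Group index="1"><Node id="1" /></Group>
          <Group index="2" />
          <Group index="3"><Node id="1" /></Group>
        </Associations>
      </CommandClass>
    </CommandClasses>
  </Node>
</Driver>"""

for node_id, group_index in empty_groups(SAMPLE):
    print(f"Node{node_id:03d}: group {group_index} has no associations")
```

The cache only reflects what OZW last read from the device, so an empty group reported here is a prompt to re-check the device, not proof of its current state.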

Disabling Associations isn't a great option. 

As for the startup, there is a cache (zwcfg_*.xml) that caches static info. The 
startup times you see would be related to the dynamic values, like the status 
of the switches etc. 

Original comment by jus...@dynam.ac on 6 Jan 2015 at 5:53

GoogleCodeExporter commented 9 years ago
Btw, I have a lot of Fibaro stuff, as do other users and we don't see this with 
OZW, so I can only assume it's your environment. 

Please move this discussion to the mailing list for now. 

Original comment by jus...@dynam.ac on 6 Jan 2015 at 5:55