dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License

Sensors continue to work but actuator control stops working spontaneously from time to time #3994

Closed tslivnik closed 3 years ago

tslivnik commented 3 years ago

Describe the bug

From time to time, deconz becomes unable to control actuators, while it continues to read from the sensors. For example, I have a SmartThings smart plug which is both a sensor (voltage, current, power) and actuator (on/off). The sensors continue updating, but control of the actuators on the same device (and all devices, e.g. Sonoff Zigbee relays) stops working. OpenHAB with the API key continues to show regular sensor updates. Restarting deconz makes no difference. Rebooting the system appears to cure the problem.

Steps to reproduce the behavior

1) Install Debian Linux, deconz and OpenHAB.
2) Add the various sensors and actuators to deconz, then to OpenHAB.
3) Run it for a while.
4) Eventually (after several days or maybe 1-2 weeks), actuators stop responding to commands, while sensors continue to report data.
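
A check like the one below could make the last step observable from a script. It is only a sketch under assumptions: it targets the `PUT /api/<apikey>/lights/<id>/state` endpoint of the deCONZ REST API, while the host, API key, light id and both helper names are placeholders invented here for illustration. It builds the request and interprets a reply; nothing is sent.

```python
import json
import urllib.request


def build_state_request(host, api_key, light_id, on):
    """Build (but do not send) a PUT request that switches a light or plug.

    host, api_key and light_id are placeholders supplied by the caller.
    """
    url = f"http://{host}/api/{api_key}/lights/{light_id}/state"
    body = json.dumps({"on": bool(on)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def command_accepted(response_body):
    """Return True if every entry in the JSON reply is a 'success' entry.

    Assumes the reply is a JSON list of {"success": ...} / {"error": ...}
    objects; anything else is treated as a failure.
    """
    try:
        entries = json.loads(response_body)
    except ValueError:
        return False
    if not isinstance(entries, list) or not entries:
        return False
    return all(isinstance(e, dict) and "success" in e for e in entries)
```

Sending the request would be a call such as `urllib.request.urlopen(req, timeout=5)`. A `success` entry would presumably only show that the REST plugin accepted the command; confirming that the plug actually switched would still mean reading its state back afterwards.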

Expected behavior

I would expect to be able to continue to control the actuators.

Screenshots

I have no useful screenshots; the server is running headless.

Environment

Version: 2.05.84 / 9/14/2020
Firmware: 26660700

deCONZ Logs

I ran deconz with all logging flags enabled and obtained an enormous log too big to upload, but happy to provide extracts or re-run it with limited debugging flags enabled and provide the whole such log if you tell me what flags to enable.

Additional context

Zigbee devices on the network:

tslivnik commented 3 years ago

> So they are on version: RemovedURL Sorry: http://deconz.dresden-elektronik.de/deconz-firmware/beta/deCONZ_ConBeeII_0x26700700.bin.GCF
>
> That's the beta I'm talking about :) Are you on that one?
>
> Check the logs for error codes as mentioned on the wiki.

I am on the beta you mentioned originally (266f) not the one in your updated post (2670) which only came out a few days ago.

I updated to 266f very recently (less than 3 weeks ago - I think 2 weeks ago), at the same time I upgraded to deconz 2.11.05, after I was advised by this forum to do so. It did not improve anything, indeed, the result of both those updates was to make actuators not work at all, rather than just stop working after a week. So 266f is all broken but 2670 is all working?

Grep does not find any of the Zigbee error codes mentioned in the Wiki (E1, E9, A7, D0) in the log.

I uploaded an extract of /var/log/messages on 11 April which pointed out at least one problem, though it seems to be a different one from the one that happens usually.
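
The grep check mentioned above could be scripted roughly as follows. This is a minimal sketch, not the project's own tooling: it assumes the codes appear in the log as hex tokens such as `0xE1`, and `count_status_codes` is a hypothetical helper name made up for illustration.

```python
import re
from collections import Counter

# Error codes named in this thread. The "0x" prefix is an assumption about
# how they are written in the log, not something the thread confirms.
CODES = ("E1", "E9", "A7", "D0")
PATTERN = re.compile(r"\b0x(" + "|".join(CODES) + r")\b", re.IGNORECASE)


def count_status_codes(lines):
    """Count how often each of the listed codes occurs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for match in PATTERN.finditer(line):
            counts[match.group(1).upper()] += 1
    return counts
```

It could be run over a large log without loading it all at once, for example `count_status_codes(open(path, errors="replace"))`, where `path` is wherever the deCONZ output was saved.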

Mimiix commented 3 years ago

> So 266f is all broken but 2670 is all working?

The latest beta (2670) contains a lot of improvements on serial issues. Perhaps it fixes yours, but I stand by my initial point: VB is not stable with USB stuff. A lot of users mention stability issues with it, comparable to what you are experiencing.

> Grep does not find any of the Zigbee error codes mentioned in the Wiki (E1, E9, A7, D0) in the log.

Good.

> I uploaded an extract of /var/log/messages on 11 April which pointed out at least one problem, though it seems to be a different one from the one that happens usually.

Which was from when you were running a very old version of deCONZ, right? Make sure you're at least on the latest stable.

For now(forward):

tslivnik commented 3 years ago

I can upgrade to 2670 firmware. An environment other than VirtualBox is not an option. All my other USB devices including the USB Z-Wave stick, work fine, so frankly, blaming VirtualBox for the issue isn't very impressive.

Mimiix commented 3 years ago

> I can upgrade to 2670 firmware. An environment other than VirtualBox is not an option. All my other USB devices including the USB Z-Wave stick, work fine, so frankly, blaming VirtualBox for the issue isn't very impressive.

I am just trying to help, point out solutions and see what works, not blaming anything. But hey, what do I know.

Either way: please follow what I've suggested. If you prefer not to, which is fine, I don't feel the need to keep this issue open.

tslivnik commented 3 years ago

Thanks.

I posted information about this bug 8 months ago. I followed up several times. Most of my posts were ignored. Some were not. Where suggestions were made, I followed them - I upgraded deconz, I upgraded the firmware, I posted logs, etc. etc. The bug remains unfixed.

The platform I have is VirtualBox running on a server with redundant power supplies, redundant UPS's, ECC RAM and a redundant ZFS filesystem.

a) I do not have another platform available. I have an old Raspberry Pi, but it's not a platform reliable enough to run my building.

b) I am not interested in running the system which will run the whole building on dinky unreliable hardware. I'm not prepared to buy new hardware which is unsuitable for deployment just to try and see if that might work (when the issue I believe is with the software) nor am I prepared to buy and set up a whole extra redundant server just to see if that will work - or even to set up a whole extra such server just to run my building automation even if I was sure that would work.

c) A virtual machine, which can run either in VirtualBox, or otherwise concurrently with VirtualBox, is the system this has to run on. Everything else runs in such a virtual machine just fine, including USB subsystems like a Z-Wave stick and its drivers.

senilio commented 3 years ago

You are not the first person to report issues with VirtualBox, and especially with unstable USB passthrough. Whether this is an issue with VirtualBox or with the ConBee, I will leave unsaid. But unless you are willing to put in the effort of ruling out known VirtualBox issues, this ticket no longer serves a purpose.

manup commented 3 years ago

A note on the 0x26700700 firmware: This version is more robust for VMs or Docker setups as it's not so picky about the host application starting and running quickly enough. I can't tell if that is the problem here though.

But the problem visible in https://github.com/dresden-elektronik/deconz-rest-plugin/issues/3994#issuecomment-817289611 is addressed in this version.

Can you also please share a screenshot of the VM Settings for the USB port of the ConBee.

TheNON75 commented 3 years ago

Hi @tslivnik ,

I am really sorry that so much time has already been spent on this issue by you and the others. I have no doubt it has left a bad taste in the mouth, unnecessarily. Let's try to get back on the right track, if you agree.

Your machine is very impressive, and your idea is superb and very welcome to me too. Despite all of the above, I recommend trying one of the official methods for installing and running deconz first. It may work from the very first moment without issue, or it may show the same, similar or other symptoms.

Nevertheless, these are called official methods for a purpose: they are tested and supported. Logically, anything else is at your own risk, with the experts providing best-effort help where possible. (Filling my car with incompatible fuel is not supported either, but I can try at my own risk, right?) I believe this is something that has to be accepted and understood by everyone.

Anyhow, once it is confirmed that it works, you may continue tinkering with any combination that is not officially supported, knowing that it has to work and that there is just something to sort out.

I would recommend the Docker approach: on the one hand it is a kind of virtualization, and on the other it lets you access the deconz GUI via VNC even on headless, minimal systems.

Does that sound reasonable and feasible to you?

tslivnik commented 3 years ago

I installed the 2670 firmware. After reboot, the actuators started working and worked for 24 hours, then stopped working. It's an improvement on 266f firmware, which didn't work at all, but it's still a regression from my original firmware, which worked for approximately a week at a time.

I will look at a Docker installation, however, I can't do that in the short term for several reasons:

1) I have my entire setup with over 100 devices (only some of them Zigbee) operating the building at present. Everything else works fine, including other USB devices, only deconz/ConBee II/Zigbee is broken. Migrating may involve a lot of work and could break things.

2) I don't have enough knowledge and experience with Docker yet including with how to virtualize things like USB hardware, how secure it is etc. to deploy it to run the building. I will try to educate myself over time.

TheNON75 commented 3 years ago

Docker is a very easy approach and is in use by many enterprises. Backing up the device database with the Phoscon app and restoring it elsewhere is a method that works well (so you can avoid adding everything again).

Please note that a new stable release came out one day ago, along with a new stable firmware.

TheNON75 commented 3 years ago

Hi @tslivnik, do you need any help or is everything on track?

tslivnik commented 3 years ago

Thank you. Currently, the Zigbee sensors are always working and the Zigbee actuators are never working. I.e. reflashing the firmware and upgrading deconz made things worse - from the actuators working for about a week at a time before stopping working to never working at all. I don't have the time currently to fully investigate using Docker, though my initial investigation is that this won't be a good solution in my case. I will also look into KVM and Xen, but neither of those is a short term project.

TheNON75 commented 3 years ago

Well, I am really sorry that things are not getting better in your environment. As I understand it, you are still on a VM other than Docker. I don't mean to be pushy, but I really recommend trying one of the official ways. Feel free to join our Discord to get some more interactive help with firing up a dedicated Docker container for deconz. It really takes only a few minutes.

tslivnik commented 3 years ago

Sorry, I'm using OpenHAB to run a building, I'm not tinkering with this as a hobby and I don't have a lot of time to tinker with it. I am looking at Docker, Xen and KVM with the appropriate level of priority (low - I don't have the time, the VirtualBox setup works for all purposes other than deconz, it's doing useful things and trying to replace the entire system would cause disruption and take time which I don't have). On what I have seen so far, Docker will not meet my requirements, but Xen or KVM may, so I could try to switch to Xen or KVM as the virtualization platform on this server - but again this won't happen soon because, a) the server runs a lot of VMs, and it's not clear that migrating to a new platform would not break things, or would work better, b) it would require me to migrate to a new system, which means downtime, disruption, and c) it will take up a lot of my time for not a lot of benefit. For now, since I'm not using a lot of Zigbee actuators and the Z-Wave USB stick is working fine, the simplest thing for me to do is to replace the ZigBee actuators with Z-Wave ones. I'd like to get Zigbee to work, but for now, my conclusion is that deconz does not work, at least not in my environment.

TheNON75 commented 3 years ago

I fully understand that you don’t want to harm your production environment, I wouldn’t do that either. My suggestion was only about deconz. VirtualBox is confirmed to be good for many purposes, but also to have issues in certain cases.

Nevertheless, if you have a spare machine to use as a test environment, it could also help the investigation. As you said you are now switching to Z-Wave, missing the Zigbee devices for a while should not be a big issue ;)

Apart from the above, it is fully your decision how you would or wouldn’t like to proceed. “Things” can work this or that way without issues even if officially not stated as supported, but getting help on resolution in such a case is always very difficult. Here I am not talking only about deconz, but in general.

Should you decide to continue the investigation, we are here (and on Discord) to help. Please note that unless you keep this topic active, it will be closed automatically after a certain number of days, or you or we can close it manually even if no solution has been found. Just let us know.

regards THN

tslivnik commented 3 years ago

Thanks. Currently, I don't have a suitable spare computer - the only spare Linux machines I have are a couple of Raspberry Pi's, which themselves crash all the time. Maybe it's the power supply, although I've replaced the power supply with an "original" one twice already. I think it's just dinky consumer-grade hardware. But then it's cheap. All other available machines would only be able to run a Linux distribution in a VM. Also, I am using the Zigbee stick for the sensors which continue to work fine using deconz; only the Zigbee actuators do not work. I'd have to get another ConBee II stick. Which I can easily do, but at this point, getting another brand of Zigbee stick and trying to use the OpenHAB Zigbee binding seems like a better investment of time.

So in other words, I will continue to look at using Xen, KVM and maybe even Docker (though, like I said, from what I have gathered so far, I don't think Docker will work for me) at some point in the future. I suspect Xen may well prove to be a better solution all round than VirtualBox, for other reasons too. But I don't have the time to do any of this in the short term. In the short term, other solutions seem to be more promising and/or a better investment of time.

I will continue to monitor deconz, because if the bug is fixed and a simple "apt install" to upgrade to a new version can get my actuators to work, in the immediate future, that would be the simplest solution of all for me.

TheNON75 commented 3 years ago

Well, to sum it up:

If you agree, I recommend closing the issue. If you can find the time and the technical means to give one of the official methods a try (fixing one of your RPis or something else), please return to us and let us pick up the topic together again in a new thread (starting with a “blank sheet”).

I hope this sounds also reasonable and acceptable to you.

We look forward to hearing from you soon

@Mimiix, may you please close the thread?

tslivnik commented 3 years ago

Ok, to conclude on my side: deconz does not work in VirtualBox and possibly other environments. There is no interest on the part of the deconz team to fix this issue. I will assume it will not be fixed and will use some other solution. Thanks.

Mimiix commented 3 years ago

@TheNON75 Seems fair to me.

@tslivnik I think that's not a fair approach. Several people tried to help you and figure out what the issue is. Part of that is determining what is causing the issue. Is it VBox? Is it a bug in deCONZ? Nobody knows. Nothing can be fixed if stuff isn't tried. Your issue is odd, as there is no difference in devices made by platform. Sensors without (good/enough) routers often result in sensors dropping/malfunctioning. Either way: we couldn't determine that. My question about whether they are ZBminis was totally ignored as well (https://github.com/dresden-elektronik/deconz-rest-plugin/issues/3994#issuecomment-901006192). This would make sense, as they seem to be of dodgy quality.

So to conclude here: There might be a bug in deconz or a bug in VBox. What it is? Nobody knows. The reporter was providing information but was unwilling/unable to try other platforms or provide enough information.

I am closing the issue, as it doesn't seem to be going anywhere.

Afterwards, as GitHub policies changed (please read #5113), these are the options:

Other than that, there's not much to do and I'm sorry it isn't the result you're after.

Closing this issue.

tslivnik commented 3 years ago

Thanks for attacking me personally for reporting this issue 8 months ago. I was ignored for months at a time. This is simply a fact: check the timeline. I was told to update the firmware. Which I did. I was told to upgrade deconz. Which I did. Between them, these two things made the problem much worse. I was told to run the program with logging options. Which I did. I provided a copy of the relevant logs. Which nobody commented on. In between me providing the information, I was ignored for months at a time, chasing and following up several times each time in order to get a response at all.

Only after about 8 months did someone say "VirtualBox is the problem". I stated very clearly back in December 2020 when I filed this bug that my system ran in VirtualBox.

Your suggestion, to try to run this in a different environment, is not feasible on my end, as I have explained sufficiently. It's a wild goose chase, and moreover one which would require an inordinate investment in time, disruption and possibly buying new equipment - without any guarantee that it would help anything. And with an absolute guarantee that it would disrupt things.

But you accuse me of being unfair and you accuse me of ignoring people. Thank you. It is genuinely helpful to know what sort of response one can expect from deconz developers when reporting a bug. This knowledge will no doubt save me a lot of time in the future.

Nobody knows where the bug is, that seems to be true, I certainly don't. But what is clear is that a) other USB devices and drivers in the VirtualBox VM work fine, b) deconz works fine with my sensors, c) "upgrading" deconz and the firmware changed the behaviour - made it worse, from actuators working for a week at a time, to not working at all. Nothing changed with VirtualBox in the meantime. So while there may be all kinds of bugs in and issues with VirtualBox (I am sure there are plenty), there are, I would suggest, also clearly some bugs in deconz. But you're not interested. And now nor am I.

Mimiix commented 3 years ago

To be fair: concluding there's a bug without any pointers isn't really bug reporting. You are stating that something is going wrong in your environment, but we can't recreate it anywhere else. Until that is possible, I can't say whether it's a bug in deconz or simply some rare combination of devices (like yours, for example).

I feel very sorry that you see it this way, as this is not meant personally in any way. GitHub is community support and not any kind of official support. Nobody is obligated to reply to issues; we are all volunteers and community members. I help out when I can, where I can. If I don't have an idea or a solution, I simply don't reply. I can only ask devs to check issues. They reply and do the best they can. I am not accusing you of anything, just stating the facts. And as I said, you did provide info, but nothing we can do anything with.

We are a community and help where we can. You can try dresden support (by email) if you prefer.

tslivnik commented 3 years ago

Bug reporting is reporting a bug, which I have done, together with all the information I had or was able to obtain and questions about how to obtain more information to help provide developers the information needed to pinpoint the bug. That's all I can do, and all a user in my situation can be expected to do. Bug fixing, on the other hand, is running tests, pinpointing and localizing the issue, etc. Telling the user to bend over backwards to completely change his setup or blaming the problem on something else (like VirtualBox) isn't bug fixing, or trying to find or fix a bug. Whether or not the software works in some other setup is neither here nor there, it clearly doesn't work in my setup. The behaviour is clearly changing between different versions of deconz and/or the firmware, without any change to VirtualBox or the rest of my environment. Initially, the only problem was that Zigbee actuators stopped working about once a week, with the drivers working most of the time. Now, the Zigbee actuators don't work at all. All that has changed in the meantime is that I upgraded the firmware and deconz. This suggests, nay, proves, that the problem is at least in part (if not in whole) with deconz and ConBee II, not with VirtualBox or with the rest of my setup. If it was (only) a VirtualBox issue, behaviour would not have changed and got worse as I upgraded deconz and the firmware.

Accusing me of being unfair because I didn't immediately leap through a bunch of very expensive hoops (porting the entire setup to a completely different environment, or a different computer, possibly disrupting other things - which is impractical, very time consuming, disruptive and which I have no reason to believe will improve anything) even though I was planning to try a different virtualization solution if and when time permitted, is certainly gratuitously personal and also unwarranted.

I do understand that deconz is open source software, not supported, and free of (financial) charge. However, my time and the lack of functioning of my solution are also costs, and while nobody is obligated to spend time fixing the bugs which cause the software not to work in my environment or to give me support, I also can't use the software if it doesn't work (for me) and if no one fixes bugs 8 months after I have brought them to the developers' attention. Deconz is not the only software like that, but it certainly is software like that, it would appear. I'm just trying to get the system to work, and replacing my Zigbee actuators with Z-Wave ones right now is the simplest and least expensive way I can get this done (in terms of not just monetary cost but also the cost in time, effort etc.) if not in fact the only way I can get this done.

Mimiix commented 3 years ago

I think we are going in circles. The options are on the table and further options are described.

I will refrain from replying here from now on.

tslivnik commented 3 years ago

Not really, but I've said all I had to say. It's been an insightful experience, which I won't repeat, thank you.

TheNON75 commented 3 years ago

Hi @tslivnik

Frankly speaking, I am a little bit sad that things went this way, so full of emotion. You have really done a lot to get past this situation and, although you may have a different view, I can assure you that so did we. The long run of the story is unfortunate, and I am personally very sorry about it, but I cannot change that and I do not want to analyze why it happened as it did.

Just as with any other "thing", there are official ways defined by the manufacturer. For example, you can fill your gas car with diesel because the range is known to be larger, but then you don't complain to the manufacturer or the petrol station if things go wrong with the engine. Likewise, if something officially runs on Windows 10, it may or may not run on Windows XP, and vice versa. No one can support you in such circumstances, as the situation is outside the official "field".

In the case of deconz there is a list of officially supported environments. That doesn't mean it will not work in others, nor that it will; it is all up to the user, at their own risk, and so falls outside the "good way" of support, which is why no or very limited support can be provided in those cases. That is the reason we were pushing towards one of the standard approaches, and Docker is the least "painful".

I believe this is something that can be understood and accepted also by you.

The Docker approach doesn't require you to drop or migrate anything other than deconz. Also, you do not need new hardware if you don't want any. It is free. It only requires restoring your backup in Phoscon. The full crash course takes a few minutes, and for you, with your skills, maybe even less.

As you currently have no time for this, and the members cannot provide an adequate level and quality of support in the current circumstances, the best we can all do is to keep the issue closed until you can return to us to give it another try.

Does this sound viable to you?

tslivnik commented 3 years ago

Thanks, that's more reasonable.

I've looked more at Docker, and Docker will not work for me, even if it were to solve this problem (which I'm doubtful about). I don't have a lot of knowledge and experience of Docker, but I do know the basics, how to configure a Docker container etc. and I do use Docker for other purposes on other computers. Unfortunately, that's quite a different paradigm and/or model from a virtual machine, and comes with its own limitations, different level of isolation, security, different way of storing its data, etc. so for a variety of other reasons, I will not be installing Docker on this server.

Where is the list of supported environments, please? I will have a look now and will bear this in mind in the future. E.g. is Xen or KVM supported? A reliable redundant hardware platform is important for me, which also means it will have to be virtualized as it doesn't make sense for me at the present time to buy a whole redundant server just to run OpenHAB.

TheNON75 commented 3 years ago

Sure, I can fully understand your view.

Here is the list of supported platforms (scroll down a bit): installation

About KVM and Xen I cannot add a word; it is outside my knowledge. However, as said, it may or may not work ;) . Nevertheless, as discussed some posts above, you may raise these questions on the forum, as GitHub issues serve a slightly different purpose: forum

Perhaps someone has done some successful experiments already and can share the outcomes with you. This is a good thing about the forum: there are many users helping each other too.

You may join the Deconz discord server too, there are lots of dedicated channels, but there are possibilities for deconz independent chats too.

I would be glad to continue this interesting discussion but, as you can see too, we are already very off-topic here, in this closed issue. If you don't mind, I wish you good luck and further great evolution with your solution.

Hope to hear you soon on one of the appropriate channels

tslivnik commented 3 years ago

Nothing on that page listing the supported platforms suggests that running Linux/amd64 in VirtualBox will not work or that it is unsupported.

TheNON75 commented 3 years ago

Supported platforms are those that are listed. It cannot be expected that all permutations of unsupported ones are mentioned as well.

As said, we are far beyond the topic here already. My apologies, but please open a thread on the forum or come to the deconz Discord server for further discussion of user questions and thoughts, as advised earlier. Alternatively, you may contact the manufacturer directly by email as well.

@Mimiix, I am kindly asking you to lock this thread to ensure we keep the original purpose of the GitHub issues

tslivnik commented 3 years ago

Ubuntu is listed as a supported platform. It does not say that it has to be run on any specific hardware. Just to be clear, as stated, Ubuntu running on a VirtualBox (or indeed any hardware or virtualization platform) is listed as supported. You might want to fix that, or not tell people off for not reading your mind.

Mimiix commented 3 years ago

@TheNON75 Noted.

Locking this issue.