eclipse-archived / smarthome

Eclipse SmartHome™ project
https://www.eclipse.org/smarthome/
Eclipse Public License 2.0
865 stars 782 forks source link

Hue addon not switching all lights in big groups #4794

Closed TheNetStriker closed 6 years ago

TheNetStriker commented 6 years ago

Since the latest Hue firmware update (version 01038802) the Hue api seams to be blocking rapid http requests. This happens e.g. when switching on/off lights using a group with a lot of lights. (In my case from 7 lights up) This results in several lights that do not react to the command from OpenHab.

I've reproduced this problem simply by using the Chrome ARC plugin and uploaded a video showing the problem here.

The first commands are working but when I try to send multiple request in short time the requests get blocked by the api.

I also tested what happens if I control the lights individually in a rule and add a Thread:sleep(100) between the requests and this seams to solve the problem.

Is there a way to increase the concurrent requests on the Hue bridge or could you implement a setting to limit the concurrent requests to the hue bridge in the addon?

kaikreuzer commented 6 years ago

Thanks for the analysis @TheNetStriker! This indeed sounds as if some throttling has to be implemented in the hue binding - we already have something similar for TRADFRI afaik.

TheNetStriker commented 6 years ago

Hi Kai, as I just found out the problem in my video seams to be that the button turns into an abort button and if it is fast clicked the request gets aborted by the button. But I still have the problem that not all my Hue lights react when I switch a huge group. Can I provide you more information about the problem somehow?

kaikreuzer commented 6 years ago

Can I provide you more information about the problem somehow?

Well, you should make sure that your issue report contains all required information that is necessary to easily reproduce a problem. Once any developer wants to look into it (which won't be me any time soon, sorry), he might come back with further questions.

TheNetStriker commented 6 years ago

Hi Kai, I now analyzed this using tcpdump to see which http packets get sent to the bridge using this command:

tcpdump -i eth0 host 172.17.4.13 and port 80 -n -s 0 -vvv -w /var/log/openhab2/http.pcap

After that I analyzed the result on https://packettotal.com and I did come to the following conclusion:

So the problem seams to be that the addon does not send a request for all the lights in this big group. Maybe it has something to do that the first request to the light 19 did not seam to return an http result. Please let me know if I can provide additional information about the problem.

Edit: I also tested switching off the 22 lights using curl and had no problems. Every light did properly switch off. So this has to be some kind of problem with the addon.

TheNetStriker commented 6 years ago

Hi Kai, I did some more tests and I can confirm that a simple delay when sending the commands help solving the problem. I changed the setLightState method of the HueBridge class like this:

    public void setLightState(Light light, StateUpdate update) throws IOException, ApiException {
        synchronized (http) {
            requireAuthentication();

            String body = update.toJson();
            Result result = http.put(getRelativeURL("lights/" + enc(light.getId()) + "/state"), body);
            handleErrors(result);
            try {
                Thread.sleep(25);
            } catch (InterruptedException e) {
            }
        }
    }

This is a very simple fix and could be made better for sure, but it helps solving the problem for the moment. Maybe also the delay could be even smaller. This fix also helped that the lights remain in online state in paper ui. (The "connection reset by peer" error that occured previously did set the lamps to offline)

Edit: I found an official documentation from Philipps how the delays should be set here.

TheNetStriker commented 6 years ago

I just got the confimation in the forum from another user that my workarround with the delay solved the problem on his installation too: https://community.openhab.org/t/hue-bridge-and-lamps-offline-after-oh-2-2-update/37556/17

tfelix commented 6 years ago

On the first glance I have a similar/same problem. I have 9 lamps in a group for my livingroom: 4 Tradfri 2 Hue Color Spots 1 Hue E27 Color 2 LED Strips Controller from Dresden Electronics all connected to a V1 Hue Bridge. When switching the group not always but sometimes I get a:

2018-04-05 23:36:34.974 [hingStatusInfoChangedEvent] - 'hue:bridge:001788170b48' changed from OFFLINE (COMMUNICATION_ERROR): Connection reset by peer (connect failed) to OFFLINE (COMMUNICATION_ERROR): Connection reset

In the logs and not all lamps switch. Sometimes 2 or 3 stay on and I need to re-issue the command. Did not have time too try out the patch but I will maybe on the weekend find the time.

lolodomo commented 6 years ago

Was this fix finally implemented ?

kaikreuzer commented 6 years ago

I didn't see any according PR so far - did you?

lolodomo commented 6 years ago

Obviously not in this issue. Do we create a PR now or was there a reason to not do it ? PS: I have not enough devices to validate this fix.

TheNetStriker commented 6 years ago

Please note that my fix was just a quick fix. It works most of the time, but sometimes the problem still occurs. This should be fixed with a more elegant solution. Maybe with some kind of job queue that can limit the calls per second.

Maybe this RateLimiter would be useful to limit the calls per second: https://github.com/google/guava/blob/master/guava/src/com/google/common/util/concurrent/RateLimiter.java

I could try to implement this and create a pull request if it is okay to implement google libs. Or does someone have a better idea?

kaikreuzer commented 6 years ago

We are trying to rather phase Guava out right now, but it would be fine to add a similar implementation somewhere to smarthome.core for general use (similar to the caching implementation).

TheNetStriker commented 6 years ago

Hi Kai, you mentioned a similar implementation on the Tradfri binding. Do you mean the"commandsQueue" in this file?

Maybe this could also be used for the Hue addon. But I don't exactly now how this is working yet.

kaikreuzer commented 6 years ago

Yes, that's exactly the place in Tradfri. It isn't really much logic, so you could simply do it similarly for hue. Alternatively, extracting it as a reusable class into smarthome.core could make sense as well (might be slightly more discussion involved to make sure it ends up as a proper API).

TheNetStriker commented 6 years ago

I did take a closer look at the Tradfri code and I think I now see how this works. This system also allows for different delays for every command. This is exactly what the Hue addon needs, because if more parameters are set on the lamps a higher delay is required. I will see if I find some time to implement this into the Hue addon.

TheNetStriker commented 6 years ago

Hi Kai, I started a first try to implement this on the Hue Addon. Currently it is not as elegant as I want it to be. The problem is that in the current code exceptions where handled in different classes. When sending the commands async this has to be handled in some other way. I also only implemented this on the setLightState method as a first step. Later this should be implemented on all methods that send http requests.

Edit: Regarding the documentation of Philipps only the commands need to be limited, so maybe it is enough to only change the setLightState to an async method.

I've uploaded my code changes to my repository: Link

Maybe you could take a look at the code and suggest a better approach?

omatzyo commented 6 years ago

Hi all, has there been any progress on this? As I’m sure you all know, the hue binding is one of the least reliable these days, even with the great quick fix that TheNetStriker added.

I certainly appreciate your time! Let me know if I can help in any way. Is there a version I can test?

TheNetStriker commented 6 years ago

I'am currently testing my code changes here: https://github.com/TheNetStriker/smarthome/commit/faf3ee1f0335e5cbff0ea585bca0f55b1578e032

The lamps do not switch as fast as before, but they always switch without exceptions. The binding should now automatically choose the right delay for the amount of commands that are being sent.

If you want to test my changes I've just uploaded the modified addon to my Dropbox: https://www.dropbox.com/s/v9q37dmxo6q4kqh/org.eclipse.smarthome.binding.hue-0.10.0-SNAPSHOT.jar?dl=0

tfelix commented 6 years ago

I just installed the Addon from the dropbox and tested it with a total of 19 lamps (Hue, Tradfri and Dresden Elektroniks) and 1 Mains Switch from Osram. Currently it works flawlessly I report in a few days if I encountered any problems.

tfelix commented 6 years ago

I want to give feedback that I can now switch perfectly fine all lamps in my apartment. Even switching in groups works flawless now every time (they MIGHT switch slower but I dont think its really noticable since prior at least 2 lamps stayed on most of the time which is definitaly to slow :D). So thank you for this fix. It made my installation usable.

kaikreuzer commented 6 years ago

So will you create a PR, @TheNetStriker?

TheNetStriker commented 6 years ago

I've just created a pull request, but for some reason it did not pass the Eclipse validation. Are additional steps required in this repository?

TheNetStriker commented 6 years ago

I've tried to re-commit my changes using git commit -s, but the errors are still showing. Do I have to submit this in a completeley new pull request? (I did a reset of my commit and commited again with git "commit -am -s" and then pushed with "git push -f")

tfelix commented 6 years ago

Looks like their CI build failed because of some timeout? I am not sure if its the vault of your PR but I guess someone from the team has to look into it. But anyways thank you for work @TheNetStriker. Hope it gets merged soon so my lamps do work with the official binding :dancer:

TheNetStriker commented 6 years ago

Good news, my changes were just merged in the pull request.

lolodomo commented 6 years ago

So this issue can now be closed ?

TheNetStriker commented 6 years ago

Yes this can be closed.