eclipse-archived / smarthome

Eclipse SmartHome™ project
https://www.eclipse.org/smarthome/
Eclipse Public License 2.0

[Tradfri] Connection gets lost on reboot of OpenHAB server #6748

Open Tobster77 opened 5 years ago

Tobster77 commented 5 years ago

The issue described here https://github.com/eclipse/smarthome/pull/6193 was discussed and a solution proposed. Still, it does not work reliably.

There is a discussion in the OpenHAB forums here: https://community.openhab.org/t/tradfri-binding-loses-connection-after-power-failure-gateway-reboot/47833/61

Expected Behavior

When the OpenHAB server boots, it should establish a connection to the Tradfri gateway. This connection should be stable.

Current Behavior

The connection is not reliably established after boot. One has to re-enter the credentials to get the binding online again.

Possible Solution

Try to re-establish the connection if it gets lost (and keep the credentials)

Steps to Reproduce (for bugs)

  1. Install the Tradfri binding
  2. Enter credentials. Gateway comes online.
  3. Wait. Reboot. Gateway does not come online in about 2 out of 10 reboots.
  4. Enter credentials again. Gateway comes online

Context

I run a Tradfri gateway to switch some bulbs.

Your Environment

olemr commented 5 years ago

I have been experiencing the same trouble for many months. I'm on OH 2.5.0 SNAPSHOTS running on x64 Ubuntu Server 16.04.0 LTS. More info here.

cweitkamp commented 5 years ago

Thanks for reporting this issue. It is already known.

Duplicate of #6065.

olemr commented 5 years ago

Is it an exact duplicate? I only have connection problems after an OH2 restart or PC reboot. When I get it Online after a GW reboot or bundle restart, it stays Online forever. I did have dropout issues in the beginning, but they went away after replacing the USB power supply and Ethernet patch cable that came with the GW.

Tobster77 commented 5 years ago

@cweitkamp: Thanks; indeed the issue is discussed here and in the community forums at several points, and several times it was assumed that the issue was resolved (e.g. https://github.com/eclipse/smarthome/pull/6193). My last understanding was that an update of ESH to version 0.11 should solve the issue, but I tested this version as part of the OH 2.4 final release without success.

Thus, I opened this topic. In case you can open https://github.com/eclipse/smarthome/pull/6193 again, I can close here.

What is the relation between https://github.com/eclipse/smarthome/pull/6193 and https://github.com/eclipse/smarthome/issues/6065? I didn't come across the latter, but are they also duplicates of the same problem?

However, if there is something I can do or test, please let me know.

cweitkamp commented 5 years ago

@olemr Yes, I am pretty sure that both issues have the same cause.

@Tobster77 #6193 was a pull request. It cannot be reopened.

Unfortunately nobody has a clue what causes this problem. Currently it is still present. I observe it in my environment too. Every time - no, not every time, but most of the time - I reboot my system. In such a case I have to reboot the TRADFRI gateway afterwards.

chiefymuc commented 5 years ago

@cweitkamp What I usually do after an openHAB reboot is to go into karaf console and restart the tradfri binding, which fixes the problem (actually, sometimes up to 3 restarts of the binding are necessary). So it is possible to do it without the tradfri hub restart. Maybe the tradfri binding can be changed to do multiple connection attempts?
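The idea of multiple connection attempts could look roughly like the sketch below. This is only an illustration, not the binding's actual code: the class `ConnectRetry`, the method `connectWithRetry`, and the `attempt` callback are hypothetical names, and the real connection logic of the Tradfri binding would have to be plugged in where the callback is invoked.

```java
import java.util.concurrent.Callable;

public class ConnectRetry {

    /**
     * Runs the given connection attempt up to maxAttempts times.
     * A failed attempt is either a "false" result or a thrown exception.
     * The pause between attempts starts at initialDelayMs and doubles each time.
     *
     * All names here are hypothetical; "attempt" stands in for whatever
     * call actually establishes the connection to the gateway.
     */
    public static boolean connectWithRetry(Callable<Boolean> attempt, int maxAttempts, long initialDelayMs)
            throws InterruptedException {
        long delay = initialDelayMs;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                if (Boolean.TRUE.equals(attempt.call())) {
                    return true; // connected, stop retrying
                }
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                // treat any other exception as a failed attempt and retry
            }
            if (i < maxAttempts) {
                Thread.sleep(delay);
                delay *= 2; // back off before the next attempt
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Simulated gateway that only answers on the third attempt.
        int[] calls = {0};
        boolean ok = connectWithRetry(() -> ++calls[0] >= 3, 5, 10);
        System.out.println("connected=" + ok + " after " + calls[0] + " attempts");
        // prints "connected=true after 3 attempts"
    }
}
```

Capping the number of attempts and backing off between them avoids flooding the gateway with handshakes while it is still starting up.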

Tobster77 commented 5 years ago

Here https://github.com/eclipse/smarthome/issues/6065#issuecomment-453748486 a fix is proposed and the author asks for testing. As I am inexperienced in compiling ESH: Maybe an advanced user can do a test and confirm, so the new code can be merged? Thanks :-)

hreichert commented 5 years ago

IMHO #6065 is a different case (gateway reboot, bridge and things go offline and never come back) than this issue (server reboot, bridge sometimes never comes online)

boaks commented 5 years ago

The term "reboot of OpenHAB server" could be explained more specifically.

Is it a "cold reboot", which really reboots the complete software stack (maybe even the hardware)? Or a "warm reboot", which just restarts some software modules?

For the "cold reboot", Scandium's connection store will be empty, so the full DTLS handshake should be executed, as expected. That issue is not related to the fix in https://github.com/eclipse/smarthome/issues/6065#issuecomment-453748486 .

For the "warm reboot", Scandium's behavior depends on the specific functions called to do that. If that's your issue, maybe you can describe which functions in Scandium are called to do such a "warm reboot".

Tobster77 commented 5 years ago

@boaks, @hreichert: I am not sure we have a clear status of the issue. Recently, both of you contributed to issue https://github.com/eclipse/smarthome/issues/6065 - I cannot estimate whether this would help here as well:

Indeed, in the issue reported here, the server reboot, or OH service restart causes problems: Both might lead to the situation of the gateway not coming online, and only a server reboot resolves the situation.

So if the Scandium/Californium and binding updates from #6065 are done, I propose to check if this also resolves the issue here.

As @kaikreuzer asked for a PR for the binding modifications:

Thanks! @hreichert Please create the PR against https://github.com/openhab/openhab2-addons/tree/master/addons/binding/org.openhab.binding.tradfri (see here for the reasons).

@hreichert, maybe you can reference both #6065 and this one, #6748?

hreichert commented 5 years ago

@Tobster77 From my tests, the fixes from #6065 do NOT fix this problem.

I can observe that sometimes after a server reboot ("cold reboot") or an OH service restart (technically also a "cold reboot") no connection is established.

@boaks To further clarify the terms:

boaks commented 5 years ago

Hm, that smells like a race condition during start-up ... But I currently have no idea what fails ...

Any logs? Any captures?

boaks commented 5 years ago

From the last comment in the OpenHAB discussion:

https://community.openhab.org/t/tradfri-binding-loses-connection-after-power-failure-gateway-reboot/47833/65

Tradfri gateway doesn't have an IP address

That’s exactly what I am also experiencing…

And

2018-12-30 15:31:44.086 [WARN ] [iscovery.TradfriDiscoveryParticipant] - Discovered Tradfri gateway doesn't have an IP address: [ServiceInfoImpl@1501157832 name: 'gw-b072bfb31baf._coap._udp.local.' address: '(null):5684' status: 'NO DNS state: probing 1 task: null', has NO data, empty]

Without a destination address, Californium/Scandium will not be able to connect to the gateway. My feeling is that this is outside of Californium/Scandium.

Tobster77 commented 5 years ago

Dear all,

I must admit I lost the overview in the last week. I am on OH 2.5 M1, and the issue still remains. Do you see any chance to continue debugging the binding? Or are we lost?

J-N-K commented 5 years ago

I never experienced this issue. Has https://github.com/eclipse/smarthome/issues/6748#issuecomment-451247830 been tried?

Tobster77 commented 5 years ago

I never experienced this issue. Has #6748 (comment) been tried?

Basically, I reboot the OH server a couple of times which is equivalent and usually fixes the issue for a couple of days. But this is not really a solution.

I am not a programmer - but wouldn't frequent checks of the Tradfri gateway and automatic binding restarts be a workaround?
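A periodic check of this kind could be sketched as follows. This is an illustration under assumptions, not existing binding code: `GatewayWatchdog`, `isOnline`, and `reconnect` are hypothetical names, and how the binding would actually determine the gateway status or trigger a restart is not covered here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class GatewayWatchdog {

    private final BooleanSupplier isOnline; // hypothetical status check
    private final Runnable reconnect;       // hypothetical restart/reconnect action
    private int reconnects = 0;

    public GatewayWatchdog(BooleanSupplier isOnline, Runnable reconnect) {
        this.isOnline = isOnline;
        this.reconnect = reconnect;
    }

    /** Performs a single check and triggers a reconnect if the gateway is offline. */
    public synchronized void checkOnce() {
        if (!isOnline.getAsBoolean()) {
            reconnects++;
            reconnect.run();
        }
    }

    public synchronized int getReconnects() {
        return reconnects;
    }

    /** Schedules checkOnce() every periodSeconds; the caller shuts the scheduler down. */
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // swallow, otherwise the scheduler cancels all further runs
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        // Simulated gateway that is offline until the first reconnect.
        boolean[] online = {false};
        GatewayWatchdog watchdog = new GatewayWatchdog(() -> online[0], () -> online[0] = true);
        watchdog.checkOnce(); // offline, triggers reconnect
        watchdog.checkOnce(); // online now, nothing to do
        System.out.println("reconnects=" + watchdog.getReconnects());
        // prints "reconnects=1"
    }
}
```

Such a watchdog would only mask the underlying start-up problem, so it is a workaround rather than a fix.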

Tobster77 commented 4 years ago

The issue remains. Can I do something?