mherwege closed this issue 7 years ago
I just tried to understand where your problem is but I do not get it. You are saying that your things get only half initialized and then the discovery somehow kicks in and kills the initializing threads?
Also your log file is very long, what is the relevant part? As far as I see in the end all things are initialized...
Could you please point me towards the specific problem?
@mherwege I see what puzzles you - and I agree, something must have gone terribly wrong. No matter if it's an issue in the binding or not, this must not happen!
Would you mind getting a thread dump the next time you see this? Maybe that gives us a clue what the 5 thing manager threads keep themselves entertained with...
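For reference, a thread dump can also be captured from inside the JVM with the standard `java.lang.management` API. The following is only a minimal sketch: the class name `ThreadDumpSketch` and its methods are made up for illustration and are not part of the framework. It lists every live thread with its state and the lock it is waiting on, and asks the JVM whether it has detected any deadlocked threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the framework.
final class ThreadDumpSketch {

    /** Builds a textual dump of all live threads: name, state, awaited lock and stack. */
    static String dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Only request monitor/synchronizer details if the JVM supports them.
        ThreadInfo[] infos = bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(),
                bean.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue;
            }
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append(System.lineSeparator());
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append(System.lineSeparator());
            }
        }
        return sb.toString();
    }

    /** Returns the ids of threads the JVM reports as deadlocked (empty if none). */
    static long[] deadlockedThreadIds() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.isSynchronizerUsageSupported()
                ? bean.findDeadlockedThreads()
                : bean.findMonitorDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
        System.out.println("Deadlocked threads: " + deadlockedThreadIds().length);
    }
}
```

If the worker threads are really stuck on each other, their names should show up in the "waiting on ... held by ..." lines of such a dump.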
@sjka I will when I see it happen again. I tried again today in Eclipse, with the latest updates included, and I couldn't reproduce it anymore. Could #3423 have anything to do with it working now? Another time when it happened, I noticed a number of messages about the XML files not having been read completely yet. They didn't appear in the log I provided, but one logger did not start properly, which could explain why these messages are missing. From the log:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
With my limited understanding of the code, I thought there was a lock somewhere that had methods waiting for this XML file loading to finish.
No, I'm pretty sure that is unrelated. Waiting for XML processing does delay handler creation, but it would affect the whole binding rather than only some of its things.
I rather suspect a deadlock in the ThingManager. It uses locks internally to ensure that at most one call goes to a given ThingHandler at any time. So I suspect that the Niko Home Control binding somehow found a way to deadlock the worker threads of the ThingManager. I just want that confirmed before I rework it completely...
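To illustrate the kind of locking meant here, below is a minimal sketch of serializing calls per handler. It is not the actual ThingManager code: the class `PerHandlerCallSketch`, the method `call` and the id `thing-1` are all hypothetical. Each thing id gets its own lock so that only one call runs against a handler at a time, and the wait is bounded so that a stuck handler produces an error instead of blocking a worker thread forever.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical illustration, not the framework's implementation.
final class PerHandlerCallSketch {

    // One lock per thing id, created lazily on first use.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the action while holding the lock for thingId, waiting at most timeoutMillis for it. */
    <T> T call(String thingId, long timeoutMillis, Supplier<T> action) {
        ReentrantLock lock = locks.computeIfAbsent(thingId, id -> new ReentrantLock());
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for " + thingId, e);
        }
        if (!acquired) {
            // Fail loudly instead of parking the worker thread indefinitely.
            throw new IllegalStateException(
                    "Handler " + thingId + " still busy after " + timeoutMillis + " ms");
        }
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        PerHandlerCallSketch calls = new PerHandlerCallSketch();
        System.out.println(calls.call("thing-1", 100, () -> "initialized"));
    }
}
```

With an unbounded `lock()` instead of `tryLock`, two handlers that call into each other from different threads could each wait forever on the other's lock, which is the sort of situation a thread dump would reveal.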
@mherwege As you are using a Karaf-based solution (openHAB), you might be interested in the command dev:dump-create (https://karaf.apache.org/manual/latest/#_dump).
Just a little update. I was on vacation, so not able to test any further. On my return I upgraded openHAB to the latest unstable version (#1008). I have done a few restarts and not seen the problem since. If I run into it again, I will get you a dump.
Thanks for the feedback - so let's close the issue. Feel free to re-open it, if you come across it again.
I have this issue with a binding I developed myself: nikohomecontrol. I am not sure where to post it, here or in the addons2 repository. This could be an issue in my binding, but I do think it might actually be in the framework code. Please have a look at it, and if the problem is in the nikohomecontrol code, could you please give an indication of what the issue could be? I will close it here in that case.
I noticed a few times when starting openHAB on my RPi that not all my nikohomecontrol child things got initialized. This only appeared in the 2.1 release version. All child things are defined the same way, and which ones got initialized varied from start to start. I expect this points to a multi-threading timing issue. I have now been able to reproduce it in the Eclipse development environment on my PC.
The full startup log when running in Eclipse with just the nikohomecontrol binding is this:
A few observations:
Again, this may very well be an issue in the way I programmed my binding. In that case, please give me a hint and close this issue. I do think that it might be an issue in the framework itself though, as I suspect the framework should not give up on initializing child things when the discovery service looking for bridges is called.
As this is a timing issue, I typically get my live system to operate without issues after a few restarts. Still, I want to solve the issue and avoid the problem in the first place.