calimero-project / calimero-core

Core library for KNX network access and management
Other
128 stars 65 forks source link

Hanging EventNotifier Threads #65

Closed Maw-Jung closed 6 years ago

Maw-Jung commented 6 years ago

The EventNotifier class introduces hanging threads on connection shutdown, when the ConnectionBase method fireConnectionClosed calls the listeners. Due the EventNotifier implementing one, it's connectionClosed method gets called, which should quit the EventNotifier Thread. The quit method therefore calls interrupt on the Thread and joins on it to wait for it's shutdown. If the EventNotifier, which runs in an endless loop, however isn't in the wait method due to no events in need of processing, the interrupt will only set a interrupt flag instead of throwing an InterruptedException (see reference). This will prevent breaking the loop, procession of the queued events and inenvitiabilly the shutdown of the EventNotifier Thread. Due to used references the garbage collector won't release ressources in the following.

Possible Fix:

Before entering the wait method on the events ArrayDeque an check for the interrupted flag in addition to the empty check could reduce the probability to fail.

Please let me know if further informations are needed or how i can help.

Greetings, Marcel

calimero-project commented 6 years ago

Thank you for your report! Why would it prevent breaking the loop? Having entered wait is not a prerequisite for detecting the interrupt.

Do you maybe have a concrete example for when that happened?

Maw-Jung commented 6 years ago

From my understandings entering wait is a prerequisite to detect an interrupt in the loop trough an InterruptedException, since it's a method designed to throw one if interrupt is invoked. Sure, entering wait is in general not a prerequisite to detect an interrupt, but in this loop of the calimero project it is, because it will only stop on catching an InterruptedException. I got a concrete case of this problem in a large scale application, maybe i can find the time to give a reproducing example. Thanks for the quick reply, please let me know if further information to my explanation are needed.

calimero-project commented 6 years ago

As I see it, your problem can be described by the following sequence (producing an execution equivalent to the event notifier):

Thread.currentThread().interrupt();
System.out.println("thread interrupted = " + Thread.currentThread().isInterrupted());
final Object lock = new Object();
try {
    while (true) synchronized (lock) { lock.wait(); }
}
catch (final InterruptedException e) {
    System.out.println(e);
}
System.out.println("thread exit");

Which will exit the thread, and is in accordance with the expected behavior of wait.

So yes, please provide a stack trace, snapshot, or example where this fails, so I can see the root cause.

Maw-Jung commented 6 years ago

You were right on the behavior of the interrupt flag, my bad. I had the time to dig deeper and found the root cause of the issue i saw. It's connected to the openhab issue https://github.com/openhab/openhab1-addons/issues/1068. We use a similar way of connecting to calimero: When the connection is lost, calimero fires a CloseEvent. Any attached NetworkLinkListener will be fired, in the current EventNotifier Thread. If the linkClosed method of such handles the disconnection, there is no problem. In this case the connection was also handled through that method call, which will keep the old EventNotifier Thread open and drive the application into the undefined state mentioned above. For reference, the issue can be solved by spawning the reconnect task in a new Thread, through a scheduled TimerTask for example. Thanks for your help, i'll close this issue since it was a wrong use of the library.

Greetings, Marcel

calimero-project commented 6 years ago

Good to know it worked out, thanks for the feedback. (The problem you describe is basically a form of thread hijacking, which should in general be avoided.)