edvin / tornadofx

Lightweight JavaFX Framework for Kotlin
Apache License 2.0
3.68k stars 272 forks source link

ConcurrentModificationException from events that undock an event subscriber #997

Open ppslim opened 5 years ago

ppslim commented 5 years ago

Currently running 1.7.17 (for reasons...).

Trying to boil down my application to the simplest components for this.

I have a central FXEvent called GlobalEvent, which is per the name, very central and in this case, intended to be fired for a lot of reasons and in a lot of places. One example from my real app is it gets fired when some pretty complex Authorization rules may have changed, so all docked / alive subscribers need to know and react accordingly.

For my simplified sample, I have a view containing a BorderPane, the center of which may dock one of two fragments. FragmentA is the default fragment, and FragmentB is an alternative.

The view subscribed to the GlobalEvent and based on a boolean property in a controller, will replace the center with either A or B (this boolean is more complex in my real app).

FragmentA does not subscribe to any events, but FragmentB does subscribe to GlobalEvent. In the example, the subscription is an empty lambda, but that is all it takes.

There are no issues when FragmentA is docked in the view.

However, if FragmentB is docked, GlobalEvent is fired and the subscription is view swaps the docked fragment to A (due to the boolean changing), this will result in ConcurrentModificationException.

Per the EventBus docs, View / Fragment components only active when the component is docked. Naturally, if FragmentB is docked, its subscription is active. If the subscription is fired and this results in the view changing the docked fragment from B to A, B will be rendered inactive during the event execution cycle.

I believe this unsubscription is not contained properly leading to modification of the subscription set and thus the exception

See the following Gist for example code. https://gist.github.com/ppslim/c428bcbf6b38dbe8e2c321257a793306

Amusingly, there are some variations I have attempted that do not trigger the problem and the possible cause above suddenly make absolutely no sense.

e.g. if I add a button to FragmentB, thus replacing the init as follows.

init {
    with (root) {
        label(title)
        button("Toggle Domain from FragmentB") {
            action {
                ctrl.domainRequired = false
                fire(GlobalEvent)
            }
        }
    }
    subscribe<GlobalEvent> { }
}

This does not trigger the problem, ever.

edvin commented 5 years ago

This code is not throwing an exception with the current master branch, so it seems the issue is already fixed.

ppslim commented 5 years ago

Having looked deeper, I am not sure how it would be resolved.

On review of how the fire call works and the threading/async applied, the intermittent element looks to be linked to timing issues.

I need to get my IDE up and running to test further (including against a snapshot) to explore it further, but I suspect threading is causing some slight latency .

Components callOnUndock calling detachLocalEventBusListeners is not thread safe in the, as it could well call unsubscribe during or after the forEach loop in EventBus.fire.

i.e. it is always calling the subscription on ApplicationView as this replaces the view, but then depending on if callOnUndock is called before the FragmentB subscription, or the other way controls if the exception will occur or not.

ppslim commented 5 years ago

So on further review, my theory is most certainly the likely root cause and remains present even in 1.7.20-SNAPSHOT

However, it also means the EventBus subscriptions mechanism, or detachLocalEventBusListeners, is likely to require some major tlc to solve it.

I need to trace the specifics in a debugger still, but the following is most certainly occurring.

If the foreach loop in fire for the first GlobalEvent call completes at "Checkpoint A", then no exception occurs. Additionally, this also mean the subscription in FragmentB will have also been invoked and completed.

However, if the foreach loop in fire for GlobalEvent completes at "Checkpoint B", then modification of the subscriptions HashSet for GlobalEvent will trigger the exception, as the for loop is still in progress during this modification.

The timing issue is likely a tied up thread pool allowing, or some other latency inducing activity, delaying the execution of detachLocalEventBusListeners.

This means this is a larger issue and also exposes that EventBus is incapable of nested firing. Nested firing was one possible thing I was going to do, as the GlobalEvent could well have both directly and indirectly tried to trigger a nested GlobalEvent.

In my case, a nested global event would need to be made safe so not to overflow the stack due to unhandled recursion, but generally, wider invocation of the event bus from other listeners such as un/dock could also trigger it.

This also makes it difficult to solve unless some decisions are made about EventBus rules.

fire could take a copy of the HashSet and iterate over the clone, discarding the clone on completion. Unsubscription (including single fire subscriptions) could then be performed against the initial hashset. This poses a problem for nested event calls, as it could not be idempotent on the order things are invoked. In my case, it the undock & FragmentB event calls would race and depending on which fired first, would control if a nested invocation of the FragmentB event would fire once or twice.

However, this method would assure that the FragmentB event is called at least once.

Second could be to support additional listeners and possibly locks, that move unsubscription to a listener that occurs after fire is complete. Unsubscription would register an unsubscribe intent, rather than a modification to the hashset right away.

However, this is likely to prohibit any form of nesting and more likely to result in deadlocks.

The safest initial method is to provide a warning that EventBus may not be safe to nested calls, possibly event dropping a nested call if one is already in progress for that event and declaring that events being fired are not idempotent in order (which is certainly true of HashSet anyway), or due to external events (undock) idempotent to assurance that it will / will not be called when technically eligible.

andrey-vasilyev commented 5 years ago

Just FYI Edvin just merged a fix for the similar issue - https://github.com/edvin/tornadofx/issues/1045

ppslim commented 5 years ago

@andrey-vasilyev many thanks for the info.

Looking over the detail, that would certainly cleanup many avenues. I need to look into it more though, but there initially seems risk concurrency problems could occur still.

Some elements of it are confusing, but fire would remain unsafe in a re-entrant call.

There may also be an issue in which unsubscribeAll could well see the event subscription list have a lot of it.valid=false states that are never cleaned up if nothing ever fires the event it was subscribed to.

Right now, clearing all subscription, and any request to add new ones via UIComponent, will not tend to the list. The list could grow forever (thus also applying GC problems) until the event is fired.

Class EventBus should ideally atomically count entry and exit to fire and unsubscribeAll. If and only if either method is being exited with the counter at 0, should cleanupInvalidRegistrations be called.

ppslim commented 5 years ago

Oh, and I note EventBus.clear may be unwise as-is.

andrey-vasilyev commented 5 years ago

I've actually made some improvements in follow up PR https://github.com/edvin/tornadofx/pull/1048

For example now fireEventswould iterate over effectively immutable set. And, thanks to your insight, invalid registrations would be filtered out on subscribe

EventBus.clear is definitely strange - all scopes are affected by single app stop. Will post a comment to https://github.com/edvin/tornadofx/issues/1009