HeinrichApfelmus / threepenny-gui

GUI framework that uses the web browser as a display.
https://heinrichapfelmus.github.io/threepenny-gui/
Other
439 stars 77 forks source link

"on disconnect" event unreliable #133

Open blitzcode opened 8 years ago

blitzcode commented 8 years ago

If you simply spam the refresh button during loading of the page, or any other exception / error occurs, the disconnect event will never trigger. This means i.e. that programs which spawn a worker thread and kill it during the disconnect event (like the Chat example) will leak resources. It also means it's not possible for a program to reliably detect if no clients are connected and cease certain time consuming operations.

blitzcode commented 8 years ago

Looking at this some more, I think in addition to the O(1) space leak for an orphaned message processing thread there's also a potential O(n) one if a TChan based architecture is used (like in the Chat example). The messages can't be garbage collected if one or more orphaned threads can't read them anymore, leaving them to pile up indefinitely.

HeinrichApfelmus commented 7 years ago

I have addressed the problems with "on disconnect" in the latest commit referencing this issue. It should work much more reliably now. Does this help, or is there still an edge case that is not covered?

blitzcode commented 7 years ago

I was unfortunately still able to disconnect without my on disconnect code running. I also saw this debug output:

Foreign.JavaScript: Browser window disconnected.
thread blocked indefinitely in an MVar operation
hue-dashboard: ConnectionClosed

I'm not sure which thread or MVar operation is the culprit, but the only place in my code where I use MVars is for tracing, and I don't see how there could be a block there.

For testing I created a TVar which I increment on setup and decrement on disconnect. The only other operation in the disconnect handler is the cancellation of the worker thread, like your chat example. I managed to get a count of 2 with no clients connected anymore.

Apart from any potential implementation bugs, the whole concept of cleanup in a disconnect handler seems problematic. What I mean is, I don't see how this could ever be completely reliable. Normally you'd use something like bracket or withAsync to make absolutely sure a background worker thread is terminated or that a 'connected user' counter is always correctly maintained. But you can't use bracket as the setup function has to return. For instance, I can increment the user count and immediately register the decrementing 'on disconnect' handler (or swap the order of these actions), but there always could be an async exception interrupting me. So I think there's always the chance of leaving dangling resources or wrong counts. Not sure what the correct solution is here.

blitzcode commented 7 years ago

btw, those three quoted messages happened when I brought up the threepenny application on my phone, switched away from the browser and waited for the connection to be closed. Simply spamming refresh in the browser seemed to work fine, probably a more graceful disconnect.

HeinrichApfelmus commented 7 years ago

I was unfortunately still able to disconnect without my on disconnect code running. I also saw this debug output:

    Foreign.JavaScript: Browser window disconnected.
    thread blocked indefinitely in an MVar operation
    hue-dashboard: ConnectionClosed

Huh, that is quite weird, because the code block that prints the "Browser window disconnected" message (line 144) will also execute the disconnect event handler (line 148-149). The only thing that could happen is that the intermediate call to commClose raises an exception. Hm, it is implemented in terms of killThread, which is interruptible, so that may be related to the "thread blocked indefinitely in an MVar operation" message...

Just to be sure: Does the problem disappear if you replace line 146 (the line containing commClose) with the following code?

let all :: E.SomeException -> Maybe ()
    all _ = Just ()
E.tryJust all $ commClose comm

Apart from any potential implementation bugs, the whole concept of cleanup in a disconnect handler seems problematic. What I mean is, I don't see how this could ever be completely reliable. Normally you'd use something like bracket or withAsync. [...] But you can't use bracket as the setup function has to return.

That is true, but I think I can do two things:

In other words, I could turn the startGUI function into a variant of bracket. It may be a bit inconvenient, but probably better than less reliable?

blitzcode commented 7 years ago

It's a bit puzzling, I agree. I checked my code again, the only place where I directly use MVars is through withMVar, which should be safe and never leave an empty one that would then end up blocking a thread.

I tried with your catching fix and could not reproduce the issue. To be sure, I then tried again without and could not get the timing right to reproduce it either. The timing must be quite specific.

The problem I see with specifying the cleanup function up-front is per-connection resources. For instance, how would the chat example terminate its worker thread that is spawned inside the setup function?

HeinrichApfelmus commented 7 years ago

Apart from any potential implementation bugs, the whole concept of cleanup in a disconnect handler seems problematic. What I mean is, I don't see how this could ever be completely reliable. Normally you'd use something like bracket or withAsync.

Actually, this is not necessary if we implement the disconnect event as an ordinary event as opposed to an asynchronous exception. Commit 121eb17e8386d0da58e96186d7dd1896d8a17e7e implements this. On the other hand, this makes it impossible to stop a long running computation by closing the browser window... Thoughts?

blitzcode commented 7 years ago

It's probably reasonable to expect developer to not do long running computations or lengthy I/O in response to a UI event. It would be nice in general if you could completely kill a wedged session by refreshing the browser, though.

Your change would certainly stop the setup code from being interrupted by a disconnect. A remaining issue would still be how to structure the connection setup code. The sequence of steps might be something along the lines of this:

Even if threepenny would never throw an async exception, any of the above might cause an exception to be thrown. So I wonder if it's possible to make this completely bulletproof. Normally, you'd use the bracket pattern for something like spawning a thread or incrementing/decrementing the connection count, but how would that look like if the cleanup has to happen inside an event handler? I think one could use something like bracketOnError with each step to ensure cleanup in case an exception is raised before the final handler is registered.

blitzcode commented 7 years ago

I suspect the latest commit for this might have broken something (everything? ;-)), see #172

HeinrichApfelmus commented 7 years ago

I intend that the current semantics (synchronous event, commit 121eb17e8386d0da58e96186d7dd1896d8a17e7e) go into the next Hackage release (0.8). Does it work for you?

blitzcode commented 7 years ago

I've been running with the current code for a while and haven't noticed any actual issues in 'real world' usage. I'm still somewhat concerned that this might cause resource leaks for long-running applications just in case there actually is some hanging computation. Also I don't think it's possible to make cleanup completely bulletproof, see my comment from Dec 14th.

HeinrichApfelmus commented 7 years ago

Ok, thanks for the heads-up! I think the current version qualifies as "good enough", so I'll release it for now, but keep the issue open. :smile:

blitzcode commented 7 years ago

I'd say so, certainly doesn't seem to be a regression. My type of application is a bit more impacted by potential resource leaks as it'll have many client connections and often runs for many months without restarts, but I hope that I actually don't have any hanging event handlers ;-)