Open willrogers opened 3 years ago
The same happens in phoebus, https://github.com/ControlSystemStudio/phoebus/issues/705
With phoebus, I've occasionally seen it on startup. Googling around led me to believe that it can be caused by calling UI code off the UI thread, but unclear where that would be taking place.
We had the same error in ITER. Every time we'd open the databrowser CS-Studio would crash.
Debugging I found that these lines aren't executed in a UI thread. We solved it by wrapping them in a Display.syncExec
.
The whole point of the data browser is that the image is created in a background thread, so moving updateImageBuffer()
to the UI thread means the UI is slowing down as you plot more and more data.
In the phoebus/JavaFX version, there were occasional issues with obtaining an in-memory image buffer, so the code to allocate an in-memory image buffer moved to the UI thread, but the code which then keeps re-using that buffer to draw into the buffer runs in background threads.
@GuillemCastro lucky you saw this ticket, thanks. That gives us a lot more information.
We have tried putting updateBufferImage
in syncExec
and it certainly appears to fix the problems we are seeing. I can't yet assess the performance impact.
I can't find any way to make a smaller change that fixes the problems. The code as it is will fail in a number of ways and even if I change updateBufferImage
just to draw a small red square, CS-Studio will often deadlock if I play around for a while.
Can you tell where it's locking/failing?
Is it the initial new Image
call?
In the JavaFX version, the comparable call to get an in-memory image was problematic and had to happen on the UI thread, so some https://github.com/ControlSystemStudio/phoebus/blob/master/core/ui/src/main/java/org/phoebus/ui/javafx/BufferUtil.java is used to handle that, and the rest of the actual drawing code can then run in the background.
I made a minimal copy of your Phoebus code and then caught a deadlock in the debugger, you can see quite a lot in this image. It appears to have locked at the display.syncExec()
, oddly enough, before it tries to create the image. I didn't set a breakpoint, I waited until it locked up and then suspended the thread to see the stack trace.
I notice that there are two "RTPlotUpdateThrottle" threads, I don't know whether that is correct.
For what it's worth, in the current phoebus incarnation the "RTPlotUpdateThrottle" is different. There's just one global "UpdateThrottle" thread (since there's also just one UI, after all), and an "RTPlot" pool sized to match the CPU core count. The images are also created in a double-buffer scheme, i.e. 2 per plot which are then re-used. So it's all quite different and doesn't help at all to explain what you're seeing in the SWT code.
I can't tell what's blocking here, i.e. who's holding the lock, but I think the approach could be changed. 'sync' exec blocks, so with that you wouldn't even need a CompletableFuture. I would use asyncExec, and then actually utilize the CompletableFuture to await the result:
// On background thread
// TODO Add some check to assert that we _are_ on a background thread
CompletableFuture<BufferUtil> result = new CompletableFuture<>();
// On display thread, at some time in the near future, create the buffer
Display.asyncExec( () -> result.complete(new BufferUtil(... );
// Back off the display thread, await the result
BufferUtil buffer = result.get(500, TimeUnit.MILLISECONDS);
The way you use syncExec, you might as well put the result in an AtomicReference:
AtomicReference <BufferUtil> result = new AtomicReference <>();
// On display thread, at some time in the near future, create the buffer
Display.syncExec( () -> result.set(new BufferUtil(... );
// As we return from the blocking syncExec, result has been set.
// ... or we block for a while, because blocking is what syncExec is meant to do
BufferUtil buffer = result.get();
Aha yes AtomicReference
will be what I want.
Actually, I'd try asyncExec to decouple the threads.
The 'syncExec' can block forever (or until the next power outage). With asyncExec, the CompletableFuture.get(..) will time out after 500ms, so you skip one update, but you're not stuck.
I didn't actually understand that asyncExec()
was also set to run on the UI thread, I was thinking that it meant run a task in the background. I guess that's a fairly fundamental misunderstanding. Anyway, now I have moved it to an asyncExec
as you suggest and it tells me a bit more - once the UI locks up it is indeed timing out after 500ms every time, presumably because the UI thread is blocked. I assume that Thread [main]
is the UI thread.
It seems that any use of the GC
in a background thread is causing the problems.
Bummer. It's certainly possible that the latest version of SWT limits all access to the UI thread, i.e. can no longer use SWT to prepare images in the background. So you'll have to run updateImageBuffer() in the UI thread. Or use databrowser3, or phoebus, which actually draw in the background with AWT.
I asked on an Eclipse mailing list but it hasn't got through moderation, and I asked on the Eclipse forums but I haven't had an answer.
I think we are going to have to move more processing to the UI thread, and then revisit this either if we see performance problems or if I find out any more information about SWT.
Not sure if there is an officially blessed SWT example for preparing images in background threads, then loading them on the SWT thread.
There are various examples, like https://www.eclipse.org/forums/index.php/t/168168/ first running into Invalid thread access
, then followed by an example that basically does offScreenImage = new Image(display, 400, 300); ... draw ...
in a background thread, ending in display.asyncExec(... show offScreenImage ..)
, which is what we've been doing.
But there is no such example in the official https://www.eclipse.org/swt/snippets/
I have opened an Eclipse bug here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=568859. Here is a relevant reply from the mailing list.
Hi Will,
In general most of SWT code expects to be executed on UI (main) thread only, but you seem to work on non UI thread? If you can provide a standalone reproducer example, please create bug for that, we should discuss on bug. I guess the new behavior is related to removing locks around native GTK calls. See https://bugs.eclipse.org/bugs/show_bug.cgi?id=546743#c17 https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/155907 https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/155989 .
Kind regards, Andrey Loskutov
Try Snippet141 from https://www.eclipse.org/forums/index.php/t/168168/ :
new Thread("Animation")
{
public void run()
{
/* Create an off-screen image to draw on, and fill it with the shell background. */
Image offScreenImage = new Image(display, 400, 300);
GC offScreenImageGC = new GC(offScreenImage);
....
combined with the fix suggested further down in that thread to then use getDisplay().asyncExec(..
to place that image created in a background thread in the UI.
It seems that there has been a bug introduced that causes problems when creating an Image
on a background thread. I expect that I will need to keep the workaround until that bug is fixed.
Having built CS-Studio against the latest Eclipse version, we have a serious problem that I have not seen before.
Simply opening the databrowser causes CS-Studio to disappear immediately, leaving only the following error:
@kasemir I realise that you won't have much interest in fixing this, but do you have any ideas as to where we might look? I expect the problem is somewhere in
org.csstudio.swt.rtplot
.