BestImageViewer / geeqie

claiming to be the best image viewer / photo collection browser
http://www.geeqie.org/
GNU General Public License v2.0
472 stars 78 forks source link

Bug: GPU acceleration #397

Closed caclark closed 4 years ago

caclark commented 8 years ago

Steps to reproduce:

Compile with --enable-clutter Start Geeqie On Preferences/Image select "Use GPU acceleration via Clutter library" Shut down Geeqie Start Geeqie

The image window is now black

tomaszg7 commented 8 years ago

I just tried it and I get different colors ;) Image window (up to the image size) is filled with a single color. That color varies from image to image and is somehow related to average color present.

I use Nvidia GPU with proprietary drivers 367.18. Here's error log:

renderer-clutter.c:175: 

renderer-clutter.c:774: rc_update_viewport  scale 0 0
renderer-clutter.c:299: scale 0 0
renderer-clutter.c:300: pos   0 0
renderer-clutter.c:522: rc_update_pixbuf
renderer-clutter.c:527:     0.337393 (+00000.337393) change time reset
renderer-clutter.c:534: pixbuf size 200 x 70 (32)
renderer-clutter.c:403:     0.337768 (+00000.000375) tex upload high prio
renderer-clutter.c:576: rc_update_zoom
renderer-clutter.c:299: scale 1 1
renderer-clutter.c:300: pos   0 0
renderer-clutter.c:576: rc_update_zoom
renderer-clutter.c:299: scale 1 1
renderer-clutter.c:300: pos   0 0
renderer-clutter.c:576: rc_update_zoom
renderer-clutter.c:299: scale 200 70
renderer-clutter.c:300: pos   84 213
renderer-clutter.c:774: rc_update_viewport  scale 200 70
renderer-clutter.c:299: scale 200 70
renderer-clutter.c:300: pos   84 213
Address already in use: /home/tomaszg/.config/geeqie/.command
renderer-clutter.c:774: rc_update_viewport  scale 200 70
renderer-clutter.c:299: scale 200 70
renderer-clutter.c:300: pos   520 467
renderer-clutter.c:427:     0.544612 (+00000.206844) upload start
renderer-clutter.c:445:     0.544775 (+00000.000163) upload end
renderer-clutter.c:218:     0.544798 (+00000.000023) clut start
renderer-clutter.c:244:     0.546191 (+00000.001393) clut upload start
renderer-clutter.c:283:     0.546374 (+00000.000183) clut end
caclark commented 6 years ago

This information is for anyone who wants to follow-up this problem:

In renderer-clutter, patch out the call to rc_set_shader(). This is used only for the background checker-board. The problem is the number of parameters for vec4.

The image is now displayed with false colors.

Change any BGR to RGB and BGRA to RGBA. The colors are now more correct, but the blue channel is missing.

In pixbuf_util.c, pixbuf_desaturate_rect(), change any *pp = grey; to *pp = 255; Run Geeqie and select Color Management/Toggle Grayscale.

The image is correct. Looks like the format of the data in the pixbuf isn't as expected.

(I don't think this problem is worth expending any effort on)

caclark commented 4 years ago

The attached .diff is a first step in fixing this. There is no alpha-channel and no post-processing, but images are displayed correctly. The dreadful image flicker of #602 is eliminated, and as the "draw" signal is not used there is a probability that #644 is fixed also (but I have no way to check this). gpu-accel-397-1.diff.gz

tomaszg7 commented 4 years ago

Nice! Works for me with AMD Fury GPU. A minor issue is a kind of "flicker" (sorry ;) ) - with old code, while switching pictures, they were pained one over another or at least it looked like this. Now it seems there is an intermittent black frame which makes it a bit annoying when quickly scrolling through the pictures.

New code seems faster, although not by much on my system. I'll keep the patch for a few days, see if nothing else breaks.

caclark commented 4 years ago

there is an intermittent black frame

The image display is not tiled, but is written in a row-by-row manner. You can easily see this effect when displaying a large image - e.g. 27000 x 6000. Probably what you see is is the first rows of one image being displayed on a black background before switching to the next image. The black frame is because at the moment, as Henry Ford said, you can have any background colour you like as long as it's black.

At some time when trying to find a solution to something or other, I had the display wait until an image was fully available before displaying it. When displaying large images this did not give a good "feel" because of the delay before anything visible happened. At the time I thought the best solution would be to cross-fade into the new image, but if that happens it will be far in the future.

tomaszg7 commented 4 years ago

Yeah, cross-fading seems like a way to go. I suspect that waiting for full image to render would make Geeqie seem unresponsive.

Since yesterday I suffered several Geeqie crashes with a suspicious dmesg entry:

traps: geeqie[21555] trap int3 ip:7f0ff3225705 sp:7ffedae4a1f0 error:0 in libglib-2.0.so.0.6200.6[7f0ff31ea000+84000]

They are quite rare. I didn't try yet running it in gdb to try to pinpoint it better and I can't be sure this patch is connected to this.

caclark commented 4 years ago

There is a commit in the repo. at geeqie.org which addresses this problem. It is just a quick-fix to provide something that works. It is slow on large files and even more so when post-processing is selected (greyscale and over-exposure).

Issues #602 and #644 might also be solved with this commit.

http://geeqie.org/cgi-bin/gitweb.cgi?p=geeqie.git;a=commit;h=7eda2f82ee66cec9988338e17da5d6fba42451ed

tomaszg7 commented 4 years ago

I didn't yet try to fully catch what's causing that crash, but I happened to see console output from one of these crashes:

(geeqie:18006): Gdk-ERROR **: 20:39:59.357: The program 'geeqie' received an X Window System error.
This probably reflects a bug in the program.
The error was 'GLXBadDrawable'.
(Details: serial 41353 error_code 158 request_code 151 (GLX) minor_code 29)
(Note to programmers: normally, X errors are reported asynchronously;
that is, you will receive the error a while after causing it.
To debug your program, run it with the GDK_SYNCHRONIZE environment
variable to change this behavior. You can then get a meaningful
backtrace from your debugger if you break on the gdk_x_error() function.)

Not sure it is really related, but it started just after I applied the patch. I'll try disabling GPU processing and see if it goes away.

caclark commented 4 years ago

There is another commit at geeqie.org. This is functioning, but needs more work regarding deprecated functions and use of the GPU. However this is all I intend to do at the moment, so I will close this issue. http://geeqie.org/cgi-bin/gitweb.cgi?p=geeqie.git;a=commit;h=666f14a4f23b64a27a5fa2900a726359aebc1baa

tomaszg7 commented 4 years ago

It still crashes as hell, even with last patch. It seems to happen mostly when there are two windows active (possibly while switching between them). Naive guess would be this clutter library doesn't play nice in multithreaded apps and needs some locks. Here's backtrace:

(geeqie:7540): Gdk-ERROR **: 19:13:34.840: The program 'geeqie' received an X Window System error.
This probably reflects a bug in the program.
The error was 'GLXBadDrawable'.
  (Details: serial 126118 error_code 158 request_code 151 (GLX) minor_code 29)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the GDK_SYNCHRONIZE environment
   variable to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)

Thread 1 "geeqie" received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7324705 in ?? () from /usr/lib64/libglib-2.0.so.0
(gdb) bt
#0  0x00007ffff7324705 in  () at /usr/lib64/libglib-2.0.so.0
#1  0x00007ffff732711c in g_log_writer_default () at /usr/lib64/libglib-2.0.so.0
#2  0x00007ffff7325377 in g_log_structured_array () at /usr/lib64/libglib-2.0.so.0
#3  0x00007ffff7325df8 in g_log_structured_standard () at /usr/lib64/libglib-2.0.so.0
#4  0x00007ffff782fc9a in  () at /usr/lib64/libgdk-3.so.0
#5  0x00007ffff783d0c3 in  () at /usr/lib64/libgdk-3.so.0
#6  0x00007ffff622bebb in _XError () at /usr/lib64/libX11.so.6
#7  0x00007fffeee1262b in  () at /usr/lib64/libGLX_mesa.so.0
#8  0x00007fffeee15460 in  () at /usr/lib64/libGLX_mesa.so.0
#9  0x00007ffff6cb65e8 in  () at /usr/lib64/libcogl.so.20
#10 0x00007ffff6d2afcc in  () at /usr/lib64/libclutter-1.0.so.0
#11 0x00007ffff6d2e4bb in  () at /usr/lib64/libclutter-1.0.so.0
#12 0x00007ffff6d95580 in  () at /usr/lib64/libclutter-1.0.so.0
#13 0x00007ffff6d2de6c in  () at /usr/lib64/libclutter-1.0.so.0
#14 0x00007ffff7408fb4 in g_closure_invoke () at /usr/lib64/libgobject-2.0.so.0
#15 0x00007ffff741c477 in  () at /usr/lib64/libgobject-2.0.so.0
#16 0x00007ffff7425466 in g_signal_emit_valist () at /usr/lib64/libgobject-2.0.so.0
#17 0x00007ffff7425b07 in g_signal_emit () at /usr/lib64/libgobject-2.0.so.0
#18 0x00007ffff780dc22 in  () at /usr/lib64/libgdk-3.so.0
#19 0x00007ffff77f83f9 in  () at /usr/lib64/libgdk-3.so.0
#20 0x00007ffff731ef14 in  () at /usr/lib64/libglib-2.0.so.0
#21 0x00007ffff731e3ad in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
#22 0x00007ffff731e788 in  () at /usr/lib64/libglib-2.0.so.0
#23 0x00007ffff731ea63 in g_main_loop_run () at /usr/lib64/libglib-2.0.so.0
#24 0x00007ffff7ad8dbd in gtk_main () at /usr/lib64/libgtk-3.so.0
#25 0x00005555555c9a65 in main ()
caclark commented 4 years ago

On one computer with an AMD APU, I have occasionally seen this error - but thought it was simply because I have somehow messed up the system files. However I cannot replicate it at will.

On another computer with a separate Nvidia card, I have never seen this error.

tomaszg7 commented 4 years ago

Let me add that I use AMD and I have ROCM-3.5.0. Just tested with another machine with AMD APU and no ROCM (which shouldn't have mattered anyhow) and got exactly the same crash.

It seems quite easy for me to reproduce: open two windows. Start switching images with scroll in one, maybe hover onto another and start switching images there. I have "focus follows mouse" enabled which makes it easier to trigger but it is not necessary.

caclark commented 4 years ago

OK, I followed your instructions. The computer with the AMD APU will crash fairly quickly. But as the stack trace shows, this is not triggered within Geeqie so it will not be so easy to find.

A laptop with an Intel HD Graphics 4000 device also crashes.

However the system with a separate Nvidia card is rock-solid. I guess this is somewhere to start - I will have to start swapping things around to find where the problem is.

tomaszg7 commented 4 years ago

Nvidia doesn't use Mesa (unless you use Nouveau), that might be relevant.

I tried looking around and found things like that: https://stackoverflow.com/questions/3831767/c-clutter-1-0-calling-function-from-thread-causes-segfault but I don't know if it is relevant.

caclark commented 4 years ago

Hello Tomasz

I tried looking around and found things like that: https://stackoverflow.com/questions/3831767/c-clutter-1-0-calling-function-from-thread-causes-segfault but I don't know if it is relevant.

Thanks, but I do not think it is the problem. In the meantime additional Geeqie windows should be opened with --new-instance. I still don't see why it affects one machine and not another...

caclark commented 4 years ago

Hello Tomasz Well, actually the link did point to part of the problem. It seems that I had not bothered to RTFM. But the manual is so big these days that it would take more than a lifetime to read it. And some of the deprecated functions are deprecated to the point that they do not always works as expected.

It will take me a couple of days to fix this.

caclark commented 4 years ago

Hello Tomasz I had thought that the absence of clutter_threads_add_idle() was the problem, but that turned out to be a mistake.

I installed a Nvidia card in the computer with an AMD APU. The driver defaults to : "Noveau display driver from xserver-xorg-video-noveau" The crashes still occur.

I then installed nvidia-driver-440. There are no more crashes.

This does not look too good at all.

caclark commented 4 years ago

Hello Tomasz I would appreciate it if you would try this: export LIBGL_ALWAYS_INDIRECT=1 geeqie

On my laptop I also had to include in main.c: #include <X11/Xlib.h ... XInitThreads();

Thanks Colin....

tomaszg7 commented 4 years ago

Without changes to the main.c I got:

[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
geeqie: /var/tmp/portage/x11-libs/libX11-1.6.9/work/libX11-1.6.9/src/xcb_io.c:260: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
Aborted

After adding these lines I got

(geeqie:23754): Clutter-CRITICAL **: 14:20:58.789: Unable to initialize Clutter: Unable to initialize the Clutter backend: no available drivers found.
Can't initialize clutter-gtk.
caclark commented 4 years ago

Hello Tomasz Attached is a .diff you might like to try sometime. This does not solve the problem, it just avoids it. I have tried it on my laptop, Nvidia cards using the Nvidia driver and Xorg Nouveau driver, an AMD APU, and I bought a cheap Radeon R9 270X to check this. It works on all my systems. 397-2.diff.gz

tomaszg7 commented 4 years ago

With this patch I get the same error as before:

(geeqie:19403): Clutter-CRITICAL **: 22:10:44.939: Unable to initialize Clutter: Unable to initialize the Clutter backend: no available drivers found.
Can't initialize clutter-gtk.
caclark commented 4 years ago

Hello Tomasz Thanks for trying this. I see from an above note that you get the same problem on two machines. Which distribution (or anything else that might be significant) are you using on these computers?

tomaszg7 commented 4 years ago

I tried the recent patches only on one machine. It runs Gentoo, not sure which other bit of information would be relevant... There's nothing really exotic in my setup. Maybe some dependency is not installed by default.

I checked versions of packages which are direct dependences of clutter-gtk and here's some that might (or maybe not...) be related: cogl-1.22.6, gtk+-3.24.20, clutter-1.26.4, clutter-gtk-1.8.4.

I also looked into this https://developer.gnome.org/clutter/unstable/clutterbackends.html and I can mention that I have EGL, Wayland and MIR bits disabled. I'll see if enabling some of them would change anything but that's shooting in the dark from my end.

tomaszg7 commented 4 years ago

I recompiled Clutter, Clutter-GTK and COGL with EGL/GLES2 support and now Geeqie seems to work fine. No crashes, no errors so far.

tomaszg7 commented 4 years ago

Another small problem: GPU rendering seems to break Pan view - all I see is black window.

caclark commented 4 years ago

Another small problem: GPU rendering seems to break Pan view - all I see is black window.

Thanks - an advantage of this type of project is that there is no QA department nagging for documentation. A disadvantage is that there is no regression test set.

There is a commit at geeqie.org that provides a work-around. I suspect that rendered-tiles is embedded in the design of pan-view. I will look at a proper solution sometime .... maybe.