Beep6581 / RawTherapee

A powerful cross-platform raw photo processing program
https://rawtherapee.com
GNU General Public License v3.0

Crashing during processing of queue #1772

Closed: Beep6581 closed this issue 8 years ago

Beep6581 commented 8 years ago

Originally reported on Google Code with ID 1788

When processing a queue containing many (70+) raw files, RawTherapee crashes to the desktop
after an apparently random number of images have been processed. In general, it seems
to be more likely to crash early on if you work in RT for a little while first.
If you simply restart RT after a crash, go to the queue, and resume processing, it seems
to last a bit longer before crashing.

Note that processing this many files in the queue is a necessary part of my workflow.
I do microphotography and focus stacking, so dealing with hundreds of images is not unusual.
Unfortunately, this bug makes it rather tedious, since I have to restart RT and
resume processing the queue 3-4 times before all the images are processed.

There are no error messages reported. This bug has existed for as long as I can remember,
all the way back to some version of RT 3. I am currently using the newest version of
RT, built today from source, but I normally use the officially released binaries
and the issue is equally bothersome in all cases.

Issue 1383 seems to describe exactly this problem, except that it has been marked as fixed.
I continue to experience the same symptoms.

Since I have the source and software necessary to build, but am not entirely sure what
I am doing (not much of a desktop app developer, I'm afraid), someone will have to
walk me through building a debug version so that I can get a stack trace or error codes
of some sort.

I am processing raw files from a Canon 7D camera (.cr2 extension). The issue also existed
with a Rebel T2i (same raw file format).

Here is a link to a sample raw file and accompanying .pp3 file: http://filebin.net/jp9lm3j8gs

To simulate my workflow, you can make 200 duplicates of these files with unique names,
add them to the queue, and attempt to process them.
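The copies can be made by hand, but a small program makes the reproduction repeatable. The sketch below is one way to do it, assuming C++17 `<filesystem>`. The file name `sample.CR2`, the `_0001` numbering scheme and the function name `makeDuplicates` are hypothetical, and it assumes the processing profile is stored as a sidecar named `<raw file>.pp3` next to the raw file.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: copies `raw` and its sidecar profile (assumed to be
// named "<raw file>.pp3", e.g. "sample.CR2.pp3") into `outDir` `count` times,
// using unique names such as "sample_0001.CR2" / "sample_0001.CR2.pp3".
// Returns the number of raw copies created.
int makeDuplicates(const fs::path& raw, const fs::path& outDir, int count) {
    assert(count >= 0 && count <= 9999);  // keeps the 4-digit suffix unique
    const fs::path profile = raw.string() + ".pp3";
    const bool hasProfile = fs::exists(profile);
    fs::create_directories(outDir);

    int created = 0;
    for (int i = 1; i <= count; ++i) {
        char suffix[16];
        std::snprintf(suffix, sizeof suffix, "_%04d", i);
        const fs::path dstRaw =
            outDir / (raw.stem().string() + suffix + raw.extension().string());
        fs::copy_file(raw, dstRaw, fs::copy_options::overwrite_existing);
        if (hasProfile) {
            // Keep the profile paired with its copy: "<copy name>.pp3".
            fs::copy_file(profile, dstRaw.string() + ".pp3",
                          fs::copy_options::overwrite_existing);
        }
        ++created;
    }
    return created;
}
```

Calling `makeDuplicates("sample.CR2", "queue_test", 200)` would leave 200 raw/profile pairs in `queue_test`, ready to be added to the queue from the file browser.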

Branch: default
Version: 4.0.10.4
Changeset: 50973592b058
Compiler: gcc 4.5.2
Processor: undefined
System: Windows
Bit depth: 64 bits
Gtkmm: V2.22.0
Build type: Release
Build flags:  -march=native -fopenmp -O3 -DNDEBUG
Link flags:   -march=native
OpenMP support: ON
MMAP support: ON

Win7-64bit SP1
8 gigs ram
Intel Core i7-2630QM

The issue also existed when I was running WinXP-32bit on a lower end machine.


Reported by rylee.isitt on 2013-03-18 04:16:07

Beep6581 commented 8 years ago
Rylee,
here's a backtrace from gdb that I got using your files as above after RT crashed:

===============================================================
[Thread 0x7fff77fff700 (LWP 32527) exited]
[Thread 0x7fff777fe700 (LWP 32531) exited]
[Thread 0x7fff99f7b700 (LWP 32529) exited]

(rawtherapee:32305): glibmm-ERROR **: 
unhandled exception (type std::exception) in signal handler:
what: invalid value (typically too big) for the size of the input (surface, pattern, etc.)

Program received signal SIGTRAP, Trace/breakpoint trap.
0x000000330c4518b7 in g_logv () from /usr/lib64/libglib-2.0.so.0
(gdb) bt
#0  0x000000330c4518b7 in g_logv () from /usr/lib64/libglib-2.0.so.0
#1  0x000000330c451a72 in g_log () from /usr/lib64/libglib-2.0.so.0
#2  0x000000331b84baf7 in Glib::exception_handlers_invoke() () from /usr/lib64/libglibmm-2.4.so.1
#3  0x00007ffff7669896 in Gtk::Widget_Class::expose_event_callback(_GtkWidget*, _GdkEventExpose*) ()
   from /usr/lib64/libgtkmm-2.4.so.1
#4  0x00007ffff7adf778 in ?? () from /usr/lib64/libgtk-x11-2.0.so.0
#5  0x0000003310411482 in g_closure_invoke () from /usr/lib64/libgobject-2.0.so.0
#6  0x0000003310422a99 in ?? () from /usr/lib64/libgobject-2.0.so.0
#7  0x000000331042a45e in g_signal_emit_valist () from /usr/lib64/libgobject-2.0.so.0
#8  0x000000331042a912 in g_signal_emit () from /usr/lib64/libgobject-2.0.so.0
#9  0x00007ffff7bf5a11 in ?? () from /usr/lib64/libgtk-x11-2.0.so.0
#10 0x00007ffff7addf60 in gtk_main_do_event () from /usr/lib64/libgtk-x11-2.0.so.0
#11 0x00000033198459ec in ?? () from /usr/lib64/libgdk-x11-2.0.so.0
#12 0x000000331984599b in ?? () from /usr/lib64/libgdk-x11-2.0.so.0
#13 0x0000003319840a23 in ?? () from /usr/lib64/libgdk-x11-2.0.so.0
#14 0x0000003319842bc1 in gdk_window_process_all_updates () from /usr/lib64/libgdk-x11-2.0.so.0
#15 0x0000003319842c29 in ?? () from /usr/lib64/libgdk-x11-2.0.so.0
#16 0x00000033198208e6 in ?? () from /usr/lib64/libgdk-x11-2.0.so.0
#17 0x000000330c44a6f3 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#18 0x000000330c44aa40 in ?? () from /usr/lib64/libglib-2.0.so.0
#19 0x000000330c44ae3a in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#20 0x00007ffff7adccc7 in gtk_main () from /usr/lib64/libgtk-x11-2.0.so.0
#21 0x00007ffff7600cd6 in Gtk::Main::run(Gtk::Window&) () from /usr/lib64/libgtkmm-2.4.so.1
#22 0x00000000004e723e in main (argc=1, argv=0x7fffffffdf48) at /home/johan/rawtherapee/rtgui/main.cc:176
(gdb)
===============================================================

Perhaps it means something to the devs... :-)

I'm using:
Branch: default
Version: 4.0.10.22
Changeset: 0ea5942ae023
Compiler: gcc 4.6.3
Processor: undefined
System: Linux
Bit depth: 64 bits
Gtkmm: V2.24.2
Build type: Relwithdebinfo
Build flags:  -march=native -fopenmp -O2 -g
Link flags:   -march=native
OpenMP support: ON
MMAP support: ON

Regards,
Johan

Reported by johan@birkagatan.com on 2013-03-18 09:19:56

Beep6581 commented 8 years ago
Johan,

That's great, and I'm glad this can be reproduced in Linux (I guess that sounds bad,
but if the bug is platform independent, it's easier to reproduce and possibly fix).

But just to be sure, I'd like to see if I get the same error.

If I set the build target to Relwithdebinfo, I should be able to get a backtrace? I'll
try that after I get some sleep :)

Reported by rylee.isitt on 2013-03-18 09:37:37

Beep6581 commented 8 years ago
@rylee: Maybe there's a workaround for your problem. I did some tests with your picture as you
described (and I suspect I'm not the only one): I made 100 copies of your file and put them
into the queue, and RT crashed after a random number of pictures had been processed.
Then I tried the following:

- Start RT and add all the pictures to the queue.
- In the RT file browser, switch to an empty directory.
- Close RT.
- Start RT again and process the queue.

This worked on my system when processing 100 copies of your file.

Ingo

Reported by heckflosse@i-weyrich.de on 2013-03-18 22:15:06

Beep6581 commented 8 years ago
I run RawTherapee on Ubuntu 12.10 and can confirm similar behaviour: after processing
a random number of the files piled up in the queue, it freezes and crashes.

Reported by maillucas on 2013-03-19 09:39:49

Beep6581 commented 8 years ago
I saw something very similar on Win7; see issue 1746.

Reported by scribble1@charter.net on 2013-03-19 14:46:54

Beep6581 commented 8 years ago
I don't see the connection between this one and issue 1746...

Reported by heckflosse@i-weyrich.de on 2013-03-19 14:57:20

Beep6581 commented 8 years ago
Sorry, typo: I meant issue 1748.

Reported by scribble1@charter.net on 2013-03-19 15:54:53

Beep6581 commented 8 years ago
Another observation: the queue is stable when the Queue tab is not in the foreground.
I just processed 200 pictures with the queue in the background and the file browser in the foreground.

Reported by heckflosse@i-weyrich.de on 2013-03-19 17:16:06

Beep6581 commented 8 years ago
I'm pretty sure it crashes on me whether the queue or the file browser is in the foreground,
but I'm not 100% sure. For the last batch run, I left the queue in the foreground.
I'll have to test this out, as well as your earlier suggestion to browse to an empty
folder with a fresh start. But I'm quite busy during the week, so it might be the
weekend before I can get around to it...

Reported by rylee.isitt on 2013-03-19 17:58:09

Beep6581 commented 8 years ago
PS, thanks a ton for looking into this. It's very appreciated, and doubly so if the
queue becomes stable in the future :)

Reported by rylee.isitt on 2013-03-19 18:03:33

Beep6581 commented 8 years ago
Another test to do: batch process lots of files, but before clicking the Start
button, browse into another non-empty directory. If it doesn't crash, the bug may reside
in the interaction between the batch panel and the file browser, when the browser has to
update the thumbnail container with the "Saved" icon (the floppy icon).

Reported by natureh.510 on 2013-03-19 18:04:07

Beep6581 commented 8 years ago
I tested that yesterday, and it crashed. But I think your guess is right. I had the same
thought, but didn't find the interaction in the source code.

Reported by heckflosse@i-weyrich.de on 2013-03-19 18:25:50

Beep6581 commented 8 years ago
Then the only components common to the file browser and the batch panel are those related
to the Thumbnails.

Reported by natureh.510 on 2013-03-19 18:45:35

Beep6581 commented 8 years ago
Misunderstanding: I browsed to another empty directory before starting the queue, and it
crashed. I'll try again now with a non-empty directory.

Reported by heckflosse@i-weyrich.de on 2013-03-19 18:47:09

Beep6581 commented 8 years ago
Tested with a non-empty directory. It crashes.

Reported by heckflosse@i-weyrich.de on 2013-03-19 18:59:02

Beep6581 commented 8 years ago
The important question is: which tab was active during processing and at the time of the crash?

Reported by natureh.510 on 2013-03-19 22:20:16

Beep6581 commented 8 years ago
The active tab when the crash occurred was the Queue tab, every time. The exception is that
no crash occurs when following the steps described in #3, even though the Queue tab was active
then too. On my system there is no crash when the Queue tab is not active, and frequent
crashes when it is active.

Reported by heckflosse@i-weyrich.de on 2013-03-19 22:37:02

Beep6581 commented 8 years ago
Rylee, thanks a lot for the information. Maybe the two workarounds mentioned above will
help you process the images you need for stacking until we have a solution. Focus
stacking is a very interesting thing; I tried it some time ago. Can you tell me which
software you use for focus stacking?

Sorry for the hijack.

Ingo

Reported by heckflosse@i-weyrich.de on 2013-03-19 23:13:44

Beep6581 commented 8 years ago
Ingo,

I might try those methods out tonight. If I do, I'll let you know the results!

I use Zerene Stacker. I've tried pretty much everything... and Zerene has given me
the best results. Plus, the founder is active in the macro/microphotography scene and
generally a very knowledgeable person.

Reported by rylee.isitt on 2013-03-19 23:59:20

Beep6581 commented 8 years ago
Ingo, in both of those two methods you'd like me to try, should I remain in the queue
tab while it's processing, or switch back to the file browser? I'm processing a batch
right now. We'll see how it goes!

Reported by rylee.isitt on 2013-03-20 05:44:11

Beep6581 commented 8 years ago
Okay, so I got another crash, here's what I did:

- added all files to queue (150 of them)
- browsed to an empty folder
- closed RT, restarted
- went to the queue tab and started the conversion
- left it on the queue tab

It crashed after processing image 118 of 150, which seemed promising, but it has managed
to get that far before.

I restarted RT, with 32 images left in the queue. The file browser was still in an empty
directory, so I went to the Queue tab and resumed. It crashed again after processing
16 more photos.

On the third run, RT finished without a crash.

It's worth noting, perhaps, that I had RT minimized during this process and was using
Firefox at the time. When I have less multitasking to do, I'll run another test with
RT running in the foreground.

Reported by rylee.isitt on 2013-03-20 06:16:44

Beep6581 commented 8 years ago
OK. Maybe I used too few pictures (100) when testing with the file browser pointed
to an empty directory.
But just now, a queue with 500 pictures finished without problems, with the file browser
in the foreground and the queue in the background.
So please try it this way: put your pictures into the queue, start the queue, switch to the
file browser and wait...

Reported by heckflosse@i-weyrich.de on 2013-03-20 11:50:38

Beep6581 commented 8 years ago
I found that the queue runs stably when line 478 in thumbbrowserbase.cc is disabled:

    queue_draw ();

queue_draw() has nothing to do with the batch queue, though its name could lead to
that assumption. It just schedules a redraw of the widget. As disabling this line is no
solution, I investigated further and found a fix that works, at least on my system. The queue
is now also stable when in the foreground (tried with 300 pictures). I would be glad for a
review of my patch, as there may be better solutions than this quick hack. Maybe we
have to add this to other occurrences of queue_draw() as well. I didn't check for side effects...

Ingo

Reported by heckflosse@i-weyrich.de on 2013-03-20 16:17:20
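The thread does not show the contents of the patch. For context, GTK 2 is not thread-safe, and the crash in the gdb trace above happened inside an expose handler (`Gtk::Widget_Class::expose_event_callback`). One common way to keep redraw requests from worker threads away from GTK is to let the workers only record that a redraw is needed and have the GUI thread perform it (in gtkmm typically via a `Glib::Dispatcher` or an idle callback). The standalone sketch below models that pattern with the standard library only; `RedrawRequests`, `drainOnGuiThread` and `simulateWorkers` are hypothetical names, not RawTherapee code, and this is not claimed to be what the patch does.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not RawTherapee code): worker threads only *record*
// that the thumbnail area needs repainting; the GUI thread does the repaint.
class RedrawRequests {
public:
    // Safe to call from any thread, e.g. a batch-queue worker that has just
    // saved an image. Requests made before the next drain are coalesced.
    void request() { pending_.store(true, std::memory_order_release); }

    // Call only from the GUI thread (in gtkmm, e.g. from an idle handler or a
    // Glib::Dispatcher slot). Runs `redraw` at most once per batch of
    // requests and reports whether it ran.
    bool drainOnGuiThread(const std::function<void()>& redraw) {
        assert(redraw && "a redraw callback is required");
        if (!pending_.exchange(false, std::memory_order_acq_rel)) {
            return false;  // nothing requested since the last drain
        }
        redraw();  // e.g. queue_draw() on the thumbnail browser
        return true;
    }

private:
    std::atomic<bool> pending_{false};
};

// Simulates several processing threads finishing images concurrently.
void simulateWorkers(RedrawRequests& reqs, int threads, int imagesPerThread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&reqs, imagesPerThread] {
            for (int i = 0; i < imagesPerThread; ++i) {
                reqs.request();  // no GUI call from this thread
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Coalescing the requests into a single flag also means that hundreds of finished images cause at most one repaint per GUI-loop iteration, rather than one redraw request per image.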


Beep6581 commented 8 years ago
In issue 1538 I reported that the GUI gets borked after some time of use. This applies
to the queue as well, and I could sometimes trigger it (or increase the likelihood
of it happening) by deleting random images from the queue (e.g. add 100 photos to the
queue and remove random ones) while RT was busy processing them, though it was far
from reproducible. In fact the closest pattern I could find was the *likely* possibility
that this is somehow related to (contiguous?) free RAM. I noticed that it happened
more often when I had my web browser running, even though I always had lots of free
RAM (8GB in my system). So despite still having free RAM, the running web browser seems
to have made the GUI failure more likely.
When triggered by removing images from the queue, usually only the GUI would hang and
RT would continue processing, though on a few occasions it stopped processing too,
but without crashing; it would just hang.
When triggered via the image editor, this usually didn't happen on the first raw I'd
edit but on the nth one.
I haven't processed large sets of photos in RT in a while, other than testing patches
and working on individual photos, so I haven't encountered that problem recently. I'm
just wondering whether these problems might be related, and putting this info out there
for you to consider while hunting for the cause.

Reported by entertheyoni on 2013-03-20 19:21:17

Beep6581 commented 8 years ago
Thanks, DrSlony. The problem is that I can't really test the free-RAM aspect, because
my machine has 32 GB. But I'll test with a large queue, removing and adding pictures
from and to the queue while it is being processed.

Reported by heckflosse@i-weyrich.de on 2013-03-20 19:52:37

Beep6581 commented 8 years ago
DrSlony, I tried a bit with a queue of about 300 pictures being processed, while I added and
removed a lot of pictures. No crashes. Sometimes it looks as if it freezes when
adding e.g. 100 pictures, but after waiting some time, the 100 pictures have been added.
I think that's because adding pictures to the queue is an O(n²) operation, but
that's a different issue.
My patch was applied during these tests.

Ingo

Reported by heckflosse@i-weyrich.de on 2013-03-20 20:51:52

Beep6581 commented 8 years ago
Concerning memory usage: here's what the task manager shows during batch processing.
As you can see, RAM usage is not heavy.

Reported by heckflosse@i-weyrich.de on 2013-03-20 21:19:04


Beep6581 commented 8 years ago
When working in the editor while the queue was being processed, I sometimes got a crash,
even with my last patch from #26. So here's the next patch, which increases stability
in this situation.

Ingo

Reported by heckflosse@i-weyrich.de on 2013-03-20 22:41:13


Beep6581 commented 8 years ago
I added 500 photos to the queue (saving as 16-bit TIFF), started it, and went to shave. The
Queue tab was open and RT maximized. When I came back, about 250 were done. I scrolled down
to the end of the photos in the Queue tab, and as soon as I got to the end the GUI
froze. No crash, nothing in the console, RT is still churning away, but the GUI is
completely frozen.
http://i.imgur.com/eXRWEEt.png

One thing that always happens when the GUI freezes in the Queue is that one of the
thumbnails disappears. I restarted RT, started the queue again, clicked around and
it wouldn't hang, so I let it run for a few minutes, then came back and clicked around
randomly again, and it hung. One thumbnail disappeared as expected.
http://i.imgur.com/2SQQcq2.png

Branch: default
Version: 4.0.10.28
Changeset: 6c1b8ce38e17
Compiler: gcc 4.6.3
Processor: undefined
System: Linux
Bit depth: 64 bits
Gtkmm: V2.24.2
Build type: release
Build flags:  -march=native -fopenmp -O3 -DNDEBUG  -mstackrealign
Link flags:   -march=native
OpenMP support: ON
MMAP support: ON

I will try the patches in this thread tomorrow and see if I can reproduce.

Reported by entertheyoni on 2013-03-20 23:41:35

Beep6581 commented 8 years ago
For information: I did all my tests saving as JPG, not TIFF.

Reported by heckflosse@i-weyrich.de on 2013-03-20 23:55:53

Beep6581 commented 8 years ago
DrSlony, did you try the patches? I'd be interested to know whether it's more stable now
on systems other than mine and Gaaned's.

Ingo

Reported by heckflosse@i-weyrich.de on 2013-03-22 21:57:45

Beep6581 commented 8 years ago
I will commit the patch on March 27 if nobody objects.

Reported by heckflosse@i-weyrich.de on 2013-03-24 00:25:23

Beep6581 commented 8 years ago
Hi!

I applied issue1788_2.patch and tried both a release and a debug build; both hang on startup.
No window appears. GDB shows 5 new threads created, then 4 exit, and after about 10
seconds the 5th exits. Nothing new appears, and I have to SIGKILL.

[New Thread 0x7fffddefe700 (LWP 19534)]
[New Thread 0x7fffdd6fd700 (LWP 19535)]
[New Thread 0x7fffdcefc700 (LWP 19536)]
[New Thread 0x7fffcffff700 (LWP 19537)]
[New Thread 0x7fffcf7fe700 (LWP 19538)]
[Thread 0x7fffddefe700 (LWP 19534) exited]
[Thread 0x7fffcf7fe700 (LWP 19538) exited]
[Thread 0x7fffdcefc700 (LWP 19536) exited]
[Thread 0x7fffdd6fd700 (LWP 19535) exited]

[Thread 0x7fffcffff700 (LWP 19537) exited]

Reported by entertheyoni on 2013-03-25 16:05:43

Beep6581 commented 8 years ago
I also tried both PROC_TARGET_1 (-mtune=generic) and PROC_TARGET_2 (-march=native),
both hang.

Branch: default
Version: 4.0.10.33
Changeset: 8169474155c2
Compiler: gcc 4.6.3
Processor: generic x86
System: Linux
Bit depth: 64 bits
Gtkmm: V2.24.2
Build type: release
Build flags:  -mtune=generic -fopenmp -O3 -DNDEBUG  -mstackrealign
Link flags:   -mtune=generic
OpenMP support: ON
MMAP support: ON

Reported by entertheyoni on 2013-03-25 16:19:15

Beep6581 commented 8 years ago
OK. Does issue1788.patch work on Linux?

Reported by heckflosse@i-weyrich.de on 2013-03-25 17:27:27

Beep6581 commented 8 years ago
Yes, starts up fine. Will test for stability improvement with it now.

Reported by entertheyoni on 2013-03-25 18:05:26

Beep6581 commented 8 years ago
Ah no, RT starts up in an empty dir, but when I click on one with images it freezes.

Reported by entertheyoni on 2013-03-25 18:07:01

Beep6581 commented 8 years ago
OK, what now? Since these patches improve stability on Windows, at least on
my and Gaaned's systems, I can #ifdef the changes...

Reported by heckflosse@i-weyrich.de on 2013-03-25 18:09:48
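Restricting a change to one platform is usually done with a preprocessor guard. A minimal sketch, assuming the compiler-predefined `_WIN32` macro (RawTherapee's own build may define and test a different symbol); the function name `queueRedrawPatchEnabled` is hypothetical, and the patched code itself is not reproduced.

```cpp
// Hypothetical guard (names are illustrative, not from RawTherapee): the
// patched code path is compiled in only for Windows builds.
// _WIN32 is predefined by Windows compilers (MSVC, MinGW) for both 32- and
// 64-bit targets.
constexpr bool queueRedrawPatchEnabled() {
#ifdef _WIN32
    return true;   // Windows: the patches improved queue stability in testing
#else
    return false;  // Linux: both patches caused hangs, keep the original path
#endif
}
```

The same `#ifdef _WIN32 ... #else ... #endif` structure can also wrap the changed lines directly, so that non-Windows builds compile exactly the original code.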

Beep6581 commented 8 years ago
Don't see why not :]

Reported by entertheyoni on 2013-03-25 22:26:09

Beep6581 commented 8 years ago
OK, I'll do that. It would be nice to get some additional feedback from other Windows users
about this change.

Reported by heckflosse@i-weyrich.de on 2013-03-25 22:35:02

Beep6581 commented 8 years ago
I can likely test this out pretty soon. Stay tuned.

Reported by rylee.isitt on 2013-03-26 00:09:21

Beep6581 commented 8 years ago
OK, but I won't be able to make any tests, changes, commits or whatever from March
30 until April 6, because I'm in Provence for a week (no Internet, no mail, just
my wife, the dog and me (and my cam) :)

Reported by heckflosse@i-weyrich.de on 2013-03-26 00:22:20

Beep6581 commented 8 years ago
Perfect time to use the cam:)
Ingo, how would you like me to test this patch?

Reported by michaelezra000 on 2013-03-26 00:28:15

Beep6581 commented 8 years ago
Put 500 pictures into the queue, start the queue, and leave the queue in the foreground :-)
Thanks a lot Michael!

Reported by heckflosse@i-weyrich.de on 2013-03-26 00:43:02

Beep6581 commented 8 years ago
I'd be willing to test this if someone would send me the latest RT with the patch.

Reported by scribble1@charter.net on 2013-03-26 01:00:57

Beep6581 commented 8 years ago
Ingo, to speed things up, is it worth trying a low-res JPG file as input, or should I use
a 12 MP raw? Neutral profile?

Reported by michaelezra000 on 2013-03-26 01:07:53

Beep6581 commented 8 years ago
@ Michael: Didn't try with jpg yet. Don't know.

@ Scribble: Can make a Win64 version for you. OK?

Reported by heckflosse@i-weyrich.de on 2013-03-26 09:44:45

Beep6581 commented 8 years ago
Hi Ingo, I tested with Canon S90 CR2 files all consecutively numbered, using release
mode compiled with -O3

Without the patch, I was able to add 480 thumbnails to the queue, then add 5 more 2 times, 500
total. I started the queue (output = JPG) with the queue in the foreground. I was at the computer,
moving the mouse, not doing much. It crashed after 223 files.

With the patch: I deleted the 223 converted JPGs, started RT with the patch, added 244
items to the queue, then a few more and a few more. At the last add, I tried to switch to the
Queue tab as soon as the counter reached the expected count of 502. It crashed with a Windows
dialog saying that RT had crashed.

I restarted RT with the patch. It took a very long time to start. The queue had 502 images.
I started the queue in the foreground. It crashed after 102 JPGs. I was not at the computer; when
I came back, the screen was on standby (powered down due to my timeout preferences, which is
normal). The RT window was gone and there was no Windows dialog (I wonder whether it was there but
timed out?). 102 JPGs were converted: 1-82 are normal, #83 was half developed.

I will try to continue in the evening when I get back. Please let me know if you would
prefer me to execute any specific step in the test.

Reported by michaelezra000 on 2013-03-26 11:56:51

Beep6581 commented 8 years ago
I wonder why it improves stability on my system...

Reported by heckflosse@i-weyrich.de on 2013-03-26 12:33:00

Beep6581 commented 8 years ago
I installed your build by copying Visual Bakery's RT 10.1 to a new folder and
pasting in your build.
The first time I ran it, I got an error message that it could not build my
profile. The queue froze when I tried to convert 2 JPGs with a WB correction.
After closing RT with the Task Manager and restarting, there was no error
message and RT worked normally. I did another batch of JPG WB conversions
and got this in the DOS box. The second message has been around since the
dark ages. I don't know what the first one means.

AVG: 238.197 224.588 195.685

(rawtherapee.exe:5672): Gtk-WARNING **: Could not find the ico
'hicolor' theme
was not found either, perhaps you need to install it.
You can get a copy from:
        http://icon-theme.freedesktop.org/releases

I loaded 512 high-quality D7000 JPGs into the queue in a couple of steps without
problems and converted them to 80% quality JPGs with the queue in the foreground. It took
about 8 seconds per conversion. When I was down to 50 conversions, I started to delete
and move thumbnails around in the queue without any problems. Bottom line: 498
successful conversions.

My first 10 JPGs averaged 4.59 MB per file and dropped to 0.939 MB per file
in the conversion. My system is a Dell running Win7 SP1 with a 4-CPU i5 and 8 GB of
memory. I was running very clean since Microsoft installed one of its
security updates this morning. Maximum memory usage during the conversions
was 3.16 GB.

The Visual Bakery 40.1 build would not white-balance JPGs, so I opened issue
1804. Build 10.32 works fine. Was this a known issue that was fixed by
build 32? Or is there something wrong with the Visual Bakery build?

Anything else I should do before you head south for a vacation?

Reported by scribble1@charter.net on 2013-03-26 15:14:07