Beep6581 / RawTherapee

A powerful cross-platform raw photo processing program
https://rawtherapee.com
GNU General Public License v3.0
2.81k stars 316 forks source link

segfault on RAF files from Fuji S5 pro #3741

Closed agriggio closed 7 years ago

agriggio commented 7 years ago

The RAF file can be taken from here: https://discuss.pixls.us/t/rawtherapee-for-raf-files-of-fuji-s5-pro/3458/8

The pp3 is below. There is a crash due to some NaN float value in dirpyr_equalizer.cc. I suspect a division by zero, but that is as far as I can go regarding that code.

GIT version 06137b02be947fc6ee068069fe6e9f4f1f515170 (head of dev). Other info:

Compiler: gcc-6 6.2.0
Processor: x86_64
System: Linux
Bit depth: 64 bits
Gtkmm: V3.18.0
Build type: Debug
Build flags: -fdiagnostics-color=always -std=c++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -g
Link flags:  -march=native
OpenMP support: ON
MMAP support: ON

_DSF3083.RAF.pp3.txt

agriggio commented 7 years ago

Actually, there are many other operations that make RT crash on this file... just playing with various tools (tried CA correction, wavelet, noise reduction) will trigger a segfault very quickly

heckflosse commented 7 years ago

I tried but can not reproduce the crashes using the raf and the attached pp3. Also tried CA correction etc.. No crash here on Win7/64

agriggio commented 7 years ago

I'll try to rebuild from scratch, with a different compiler

agriggio commented 7 years ago

Tried again with the following:

ompiler: gcc-5 5.4.1
Processor: x86_64
System: Linux
Bit depth: 64 bits
Gtkmm: V3.18.0
Build type: Release
Build flags: -fdiagnostics-color=always -std=c++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG
Link flags:  -march=native
OpenMP support: ON
MMAP support: ON

The crash is still there. To reproduce:

I get a segfault immediately. I'm attaching the pp3 again. This time it should be complete. _DSF3083.RAF.pp3.txt

heckflosse commented 7 years ago

Still not reproducible on my system :(

agriggio commented 7 years ago

I am on Linux mint 18.1 (KDE version, but not sure if it matters). Maybe you can try on a VM?

Floessie commented 7 years ago

Nor here.

Version: 5.0-r1-gtk3-111-g06137b02
Branch: dev
Commit: 06137b02
Commit date: 2017-03-07
Compiler: cc 6.3.0
Processor: x86_64
System: Linux
Bit depth: 64 bits
Gtkmm: V3.22.0
Build type: release
Build flags:  -std=c++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG
Link flags:  -march=native
OpenMP support: ON
MMAP support: ON
agriggio commented 7 years ago

I tried on my laptop at home, and I can't reproduce it either. Must have something to do with specific library version and/or compilation flags (I'm using -march=native on both, but the machines are quite different). I'll dig deeper as soon as possible.

Beep6581 commented 7 years ago

I did not manage to reproduce it.

Version: 5.0-r1-gtk3-113-gdfcab20b
Branch: dev
Commit: dfcab20b
Commit date: 2017-03-08
Compiler: cc 4.9.3
Processor: Intel(R)\ Core(TM)\ m3-6Y30\ CPU\ @\ 0.90GHz
System: Linux
Bit depth: 64 bits
Gtkmm: V3.22.0
Build type: release
Build flags: -std=c++11 -Wno-deprecated-declarations -Wno-unused-result -std=c++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG
Link flags:  -march=native
OpenMP support: ON
MMAP support: ON
agriggio commented 7 years ago

Update. Now I get the crash also at home, when processing the image (from the batch queue). Using the pp3 in attach. I have a Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz bad.pp3.txt

heckflosse commented 7 years ago

I processed the image in queue but it did not crash. Can you provide a bt full please?

agriggio commented 7 years ago

this time it did not crash in the queue, but it did crash when opening a preview window in the editor. Here I only have a release build, but still I got the following bt:

Thread 93 "rawtherapee" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff95602700 (LWP 5582)]
0x0000000000a6ae15 in rtengine::ImProcFunctions::MadRgb(float*, int) [clone .part.38] ()

(gdb) bt full
#0  0x0000000000a6ae15 in rtengine::ImProcFunctions::MadRgb(float*, int) [clone .part.38] ()
No symbol table info available.
#1  0x0000000000a7556b in rtengine::ImProcFunctions::ShrinkAll_info(float**, float**, int, int, int, int, float**, float**, float**, int, int, float, float, rtengine::LabImage*, float&, int&, float&, float&, float&, float&, float&, float&, bool, int, int, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, int&, float&, float&, float&, bool, bool) ()
No symbol table info available.
#2  0x0000000000a77f1e in rtengine::ImProcFunctions::RGB_denoise_info(rtengine::Imagefloat*, rtengine::Imagefloat*, bool, LUT<float>&, float, float, float, rtengine::procparams::DirPyrDenoiseParams const&, double, float&, int&, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, bool) ()
No symbol table info available.
#3  0x000000000091a8e9 in rtengine::Crop::update(int) [clone ._omp_fn.0] ()
No symbol table info available.
#4  0x00007ffff2f9f43e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
No symbol table info available.
---Type <return> to continue, or q <return> to quit---
#5  0x00007ffff2b646ba in start_thread (arg=0x7fff95602700)
    at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fff95602700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140735699494656, 
                2447316023277093600, 0, 140736027692767, 140735699495360, 
                1, -2447224584365309216, -2447291227138719008}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#6  0x00007ffff289a82d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

I'll try to provide more info tomorrow, from my work machine.

heckflosse commented 7 years ago

@agriggio Though this is already a hint, a bt from debug build would show more :+1:

agriggio commented 7 years ago

A more informative backtrace. Got on:

Compiler: gcc-6 6.2.0
Processor: x86_64
System: Linux
Bit depth: 64 bits
Gtkmm: V3.18.0
Build type: Debug
Build flags: -fdiagnostics-color=always -std=c++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -g
Link flags:  -march=native
OpenMP support: ON
MMAP support: ON

bt.txt

EDIT: the CPU is an Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz

Floessie commented 7 years ago

@heckflosse Seems to be pretty much the same as #3410.

heckflosse commented 7 years ago

@agriggio Can you reproduce the crash only with this Fuji file? If yes, then it's most likely a bug in the fuji specific code in rawimagesource.cc I'm looking...

agriggio commented 7 years ago

@heckflosse so far, I've seen it only on this file, yes.

agriggio commented 7 years ago

BTW, let me know if I can help in any way

heckflosse commented 7 years ago

I will do. Currently I try to understand the fuji-S specific code...

agriggio commented 7 years ago

@heckflosse I just tried converting the RAF to DNG with the Adobe DNG converter (version 9.8.0.692, used through wine). I get the crash also on the DNG.

heckflosse commented 7 years ago

Maybe the DNG runs through the same code. Can you place a print here to check?

agriggio commented 7 years ago

@heckflosse yes, you are right. I get the print twice before the segfault

Floessie commented 7 years ago

@heckflosse Wouldn't something like this fix the SEGV in that place once and for all?

diff --git a/rtengine/FTblockDN.cc b/rtengine/FTblockDN.cc
index 1ad34f2d..deaf726d 100644
--- a/rtengine/FTblockDN.cc
+++ b/rtengine/FTblockDN.cc
@@ -2149,7 +2149,7 @@ float ImProcFunctions::MadRgb(float * DataList, const int datalen)
     int i;

     for (i = 0; i < datalen; ++i) {
-        histo[min(65535, abs(static_cast<int>(DataList[i])))]++;
+        histo[LIM(65535, 0, abs(static_cast<int>(DataList[i])))]++;
     }

     //find median of histogram
agriggio commented 7 years ago

@Floessie I tried the patch, but I still get a segfault. A different BT now. bt2.txt

Floessie commented 7 years ago

@agriggio Good catch: Now we see the NaN used as index in another place. That might Ingo help to track down the source of it...

heckflosse commented 7 years ago

@agriggio thanks for checking. These super-ccd files are really strange. For example the raw data has almost square dimensions (3583 x 3584 pixels) while the final output is 3:2 (4281 x 2871 pixels). I tend to ignore the crashes atm in case they happen only with this super-ccd files. Here's another file from S5Pro. Can you test whether that crashes also?

Floessie commented 7 years ago

@heckflosse

I tend to ignore the crashes atm in case they happen only with this super-ccd files.

Sure? Not every OOB access is a security issue, but who really knows?

@agriggio Updated patch:

diff --git a/rtengine/FTblockDN.cc b/rtengine/FTblockDN.cc
index 1ad34f2d..deaf726d 100644
--- a/rtengine/FTblockDN.cc
+++ b/rtengine/FTblockDN.cc
@@ -2149,7 +2149,7 @@ float ImProcFunctions::MadRgb(float * DataList, const int datalen)
     int i;

     for (i = 0; i < datalen; ++i) {
-        histo[min(65535, abs(static_cast<int>(DataList[i])))]++;
+        histo[LIM(65535, 0, abs(static_cast<int>(DataList[i])))]++;
     }

     //find median of histogram
diff --git a/rtengine/LUT.h b/rtengine/LUT.h
index d83a431c..72c8d900 100644
--- a/rtengine/LUT.h
+++ b/rtengine/LUT.h
@@ -443,7 +443,7 @@ public:
     template<typename U = T, typename = typename std::enable_if<std::is_same<U, float>::value>::type>
     T operator[](float index) const
     {
-        int idx = (int)index;  // don't use floor! The difference in negative space is no problems here
+        int idx = rtengine::max<int>(0, index);  // ~~don't use floor! The difference in negative space is no problems here~~ It is for NaN!

         if (index < 0.f) {
             if (clip & LUT_CLIP_BELOW) {
agriggio commented 7 years ago

@heckflosse I don't get a segfault with the new RAF. I'll test your patch now

agriggio commented 7 years ago

@Floessie sorry, I noticed now the patch is from you :-)

heckflosse commented 7 years ago

@agriggio Do you also get this wrong left border in the thumb?

heckflosse commented 7 years ago

@Floessie I often thought about fixing this part of code too. But that would be fixing the symptoms, not the source of the evil.

agriggio commented 7 years ago

@Floessie no crash after your new patch :+1:

agriggio commented 7 years ago

@heckflosse yes, I see that too.

Floessie commented 7 years ago

@heckflosse

But that would be fixing the symptoms, not the source of the evil.

Absolutely! But if somebody evil uses RT as his or her attack vector, this won't make up for good PR, especially regarding automated tasks with the new rawtherapee-cli...

agriggio commented 7 years ago

@Floessie I assume it is ok for operator[] to still return NaN if index is NaN, right?

heckflosse commented 7 years ago

@agriggio @Floessie As I wrote above, I also thought about making the checks from above patch. But I did not. If I did, I wouldn't have been able to find the sources of many bugs because they get shadowed by this checks.

Floessie commented 7 years ago

@agriggio

I assume it is ok for operator[] to still return NaN if index is NaN, right?

Yep, so we can catch the next problem.

@heckflosse

But I did not. If I did, I wouldn't have been able to find the sources of many bugs because they get shadowed by this checks.

What about feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW)? Could be turned on in debug mode.

agriggio commented 7 years ago

@heckflosse could putting an assert(isfinite(x)) (or even a weaker !isnan(x)) be acceptable?

heckflosse commented 7 years ago

I will try to find the source of the bug now...

heckflosse commented 7 years ago

@agriggio Isn't assert only active in debug builds?

agriggio commented 7 years ago

@heckflosse yes, that was the point... to ease debugging

Floessie commented 7 years ago

Isn't a "signaling NaN" mode best? No need for isnan() etc. and help from the FPU.

heckflosse commented 7 years ago

@agriggio I'm absolutely not against an assert! But I will try to find the source of the bug because LUT access maybe performance critical and heavily used. At least we should check for impact on performance if we add new checks there.

heckflosse commented 7 years ago

@Floessie wouldn't signaling NaN break the parts of code (especially SSE code) where we allow NaN for intermediate calculations?

agriggio commented 7 years ago

@Floessie trapping NaNs could do, but I understood from @heckflosse that sometimes they are expected. @heckflosse assert is expanded to nothing in release mode, so no performance penalty

Floessie commented 7 years ago

@heckflosse @agriggio Ah, I forgot about that. Sorry for the noise...

heckflosse commented 7 years ago

@agriggio Yes, I know. For that reason I wrote absolutely NOT against :)

Floessie commented 7 years ago

... but we would still be wide open for security issues.

heckflosse commented 7 years ago

@agriggio Let's analyze your last bt full.

I crashes at the access to a LUT because of a NaN. That NaN came from a calculation of rgb values which all are NaN. This rgb values came from the loop above (line 3411 of improcfun.cc). In that loop the rgb values also have been used to access a LUT but we didn't get a crash there. That means the NaN must be inisde the hltonecurve LUT which is filled in line 546 ff of curves.cc

heckflosse commented 7 years ago

My conclusion from last post was incomplete. Still searching