Closed agriggio closed 7 years ago
Actually, there are many other operations that make RT crash on this file... just playing with various tools (tried CA correction, wavelet, noise reduction) will trigger a segfault very quickly
I tried but can not reproduce the crashes using the raf and the attached pp3. Also tried CA correction etc.. No crash here on Win7/64
I'll try to rebuild from scratch, with a different compiler
Tried again with the following:
ompiler: gcc-5 5.4.1
Processor: x86_64
System: Linux
Bit depth: 64 bits
Gtkmm: V3.18.0
Build type: Release
Build flags: -fdiagnostics-color=always -std=c++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG
Link flags: -march=native
OpenMP support: ON
MMAP support: ON
The crash is still there. To reproduce:
delete the pp3 sidecar of the RAF file, and make sure that the default processing profile for RAW files is "(Neutral)" (just to be safe)
open the RAF file with RT
enable "Contrast by detail levels"
close RT
reopen RT in "file browser mode"
click on the thumbnail of the RAF
I get a segfault immediately. I'm attaching the pp3 again. This time it should be complete. _DSF3083.RAF.pp3.txt
Still not reproducible on my system :(
I am on Linux mint 18.1 (KDE version, but not sure if it matters). Maybe you can try on a VM?
Nor here.
Version: 5.0-r1-gtk3-111-g06137b02
Branch: dev
Commit: 06137b02
Commit date: 2017-03-07
Compiler: cc 6.3.0
Processor: x86_64
System: Linux
Bit depth: 64 bits
Gtkmm: V3.22.0
Build type: release
Build flags: -std=c++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG
Link flags: -march=native
OpenMP support: ON
MMAP support: ON
I tried on my laptop at home, and I can't reproduce it either. Must have something to do with specific library version and/or compilation flags (I'm using -march=native on both, but the machines are quite different). I'll dig deeper as soon as possible.
I did not manage to reproduce it.
Version: 5.0-r1-gtk3-113-gdfcab20b
Branch: dev
Commit: dfcab20b
Commit date: 2017-03-08
Compiler: cc 4.9.3
Processor: Intel(R)\ Core(TM)\ m3-6Y30\ CPU\ @\ 0.90GHz
System: Linux
Bit depth: 64 bits
Gtkmm: V3.22.0
Build type: release
Build flags: -std=c++11 -Wno-deprecated-declarations -Wno-unused-result -std=c++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG
Link flags: -march=native
OpenMP support: ON
MMAP support: ON
Update. Now I get the crash also at home, when processing the image (from the batch queue). Using the pp3 in attach.
I have a Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz
bad.pp3.txt
I processed the image in queue but it did not crash. Can you provide a bt full
please?
this time it did not crash in the queue, but it did crash when opening a preview window in the editor. Here I only have a release build, but still I got the following bt:
Thread 93 "rawtherapee" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff95602700 (LWP 5582)]
0x0000000000a6ae15 in rtengine::ImProcFunctions::MadRgb(float*, int) [clone .part.38] ()
(gdb) bt full
#0 0x0000000000a6ae15 in rtengine::ImProcFunctions::MadRgb(float*, int) [clone .part.38] ()
No symbol table info available.
#1 0x0000000000a7556b in rtengine::ImProcFunctions::ShrinkAll_info(float**, float**, int, int, int, int, float**, float**, float**, int, int, float, float, rtengine::LabImage*, float&, int&, float&, float&, float&, float&, float&, float&, bool, int, int, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, int&, float&, float&, float&, bool, bool) ()
No symbol table info available.
#2 0x0000000000a77f1e in rtengine::ImProcFunctions::RGB_denoise_info(rtengine::Imagefloat*, rtengine::Imagefloat*, bool, LUT<float>&, float, float, float, rtengine::procparams::DirPyrDenoiseParams const&, double, float&, int&, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, float&, bool) ()
No symbol table info available.
#3 0x000000000091a8e9 in rtengine::Crop::update(int) [clone ._omp_fn.0] ()
No symbol table info available.
#4 0x00007ffff2f9f43e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
No symbol table info available.
---Type <return> to continue, or q <return> to quit---
#5 0x00007ffff2b646ba in start_thread (arg=0x7fff95602700)
at pthread_create.c:333
__res = <optimized out>
pd = 0x7fff95602700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140735699494656,
2447316023277093600, 0, 140736027692767, 140735699495360,
1, -2447224584365309216, -2447291227138719008},
mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0},
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#6 0x00007ffff289a82d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
I'll try to provide more info tomorrow, from my work machine.
@agriggio Though this is already a hint, a bt from debug build would show more :+1:
A more informative backtrace. Got on:
Compiler: gcc-6 6.2.0
Processor: x86_64
System: Linux
Bit depth: 64 bits
Gtkmm: V3.18.0
Build type: Debug
Build flags: -fdiagnostics-color=always -std=c++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -g
Link flags: -march=native
OpenMP support: ON
MMAP support: ON
EDIT: the CPU is an Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
@heckflosse Seems to be pretty much the same as #3410.
@agriggio Can you reproduce the crash only with this Fuji file? If yes, then it's most likely a bug in the fuji specific code in rawimagesource.cc I'm looking...
@heckflosse so far, I've seen it only on this file, yes.
BTW, let me know if I can help in any way
I will do. Currently I try to understand the fuji-S specific code...
@heckflosse I just tried converting the RAF to DNG with the Adobe DNG converter (version 9.8.0.692, used through wine). I get the crash also on the DNG.
Maybe the DNG runs through the same code. Can you place a print here to check?
@heckflosse yes, you are right. I get the print twice before the segfault
@heckflosse Wouldn't something like this fix the SEGV in that place once and for all?
diff --git a/rtengine/FTblockDN.cc b/rtengine/FTblockDN.cc
index 1ad34f2d..deaf726d 100644
--- a/rtengine/FTblockDN.cc
+++ b/rtengine/FTblockDN.cc
@@ -2149,7 +2149,7 @@ float ImProcFunctions::MadRgb(float * DataList, const int datalen)
int i;
for (i = 0; i < datalen; ++i) {
- histo[min(65535, abs(static_cast<int>(DataList[i])))]++;
+ histo[LIM(65535, 0, abs(static_cast<int>(DataList[i])))]++;
}
//find median of histogram
@Floessie I tried the patch, but I still get a segfault. A different BT now. bt2.txt
@agriggio Good catch: Now we see the NaN used as index in another place. That might Ingo help to track down the source of it...
@agriggio thanks for checking. These super-ccd files are really strange. For example the raw data has almost square dimensions (3583 x 3584 pixels) while the final output is 3:2 (4281 x 2871 pixels). I tend to ignore the crashes atm in case they happen only with this super-ccd files. Here's another file from S5Pro. Can you test whether that crashes also?
@heckflosse
I tend to ignore the crashes atm in case they happen only with this super-ccd files.
Sure? Not every OOB access is a security issue, but who really knows?
@agriggio Updated patch:
diff --git a/rtengine/FTblockDN.cc b/rtengine/FTblockDN.cc
index 1ad34f2d..deaf726d 100644
--- a/rtengine/FTblockDN.cc
+++ b/rtengine/FTblockDN.cc
@@ -2149,7 +2149,7 @@ float ImProcFunctions::MadRgb(float * DataList, const int datalen)
int i;
for (i = 0; i < datalen; ++i) {
- histo[min(65535, abs(static_cast<int>(DataList[i])))]++;
+ histo[LIM(65535, 0, abs(static_cast<int>(DataList[i])))]++;
}
//find median of histogram
diff --git a/rtengine/LUT.h b/rtengine/LUT.h
index d83a431c..72c8d900 100644
--- a/rtengine/LUT.h
+++ b/rtengine/LUT.h
@@ -443,7 +443,7 @@ public:
template<typename U = T, typename = typename std::enable_if<std::is_same<U, float>::value>::type>
T operator[](float index) const
{
- int idx = (int)index; // don't use floor! The difference in negative space is no problems here
+ int idx = rtengine::max<int>(0, index); // ~~don't use floor! The difference in negative space is no problems here~~ It is for NaN!
if (index < 0.f) {
if (clip & LUT_CLIP_BELOW) {
@heckflosse I don't get a segfault with the new RAF. I'll test your patch now
@Floessie sorry, I noticed now the patch is from you :-)
@agriggio Do you also get this wrong left border in the thumb?
@Floessie I often thought about fixing this part of code too. But that would be fixing the symptoms, not the source of the evil.
@Floessie no crash after your new patch :+1:
@heckflosse yes, I see that too.
@heckflosse
But that would be fixing the symptoms, not the source of the evil.
Absolutely! But if somebody evil uses RT as his or her attack vector, this won't make up for good PR, especially regarding automated tasks with the new rawtherapee-cli
...
@Floessie I assume it is ok for operator[]
to still return NaN if index
is NaN, right?
@agriggio @Floessie As I wrote above, I also thought about making the checks from above patch. But I did not. If I did, I wouldn't have been able to find the sources of many bugs because they get shadowed by this checks.
@agriggio
I assume it is ok for operator[] to still return NaN if index is NaN, right?
Yep, so we can catch the next problem.
@heckflosse
But I did not. If I did, I wouldn't have been able to find the sources of many bugs because they get shadowed by this checks.
What about feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW)
? Could be turned on in debug mode.
@heckflosse could putting an assert(isfinite(x))
(or even a weaker !isnan(x)
) be acceptable?
I will try to find the source of the bug now...
@agriggio Isn't assert only active in debug builds?
@heckflosse yes, that was the point... to ease debugging
Isn't a "signaling NaN" mode best? No need for isnan()
etc. and help from the FPU.
@agriggio I'm absolutely not against an assert! But I will try to find the source of the bug because LUT access maybe performance critical and heavily used. At least we should check for impact on performance if we add new checks there.
@Floessie wouldn't signaling NaN break the parts of code (especially SSE code) where we allow NaN for intermediate calculations?
@Floessie trapping NaNs could do, but I understood from @heckflosse that sometimes they are expected.
@heckflosse assert
is expanded to nothing in release mode, so no performance penalty
@heckflosse @agriggio Ah, I forgot about that. Sorry for the noise...
@agriggio Yes, I know. For that reason I wrote absolutely NOT against
:)
... but we would still be wide open for security issues.
@agriggio Let's analyze your last bt full.
I crashes at the access to a LUT because of a NaN. That NaN came from a calculation of rgb values which all are NaN. This rgb values came from the loop above (line 3411 of improcfun.cc). In that loop the rgb values also have been used to access a LUT but we didn't get a crash there. That means the NaN must be inisde the hltonecurve LUT which is filled in line 546 ff of curves.cc
My conclusion from last post was incomplete. Still searching
The RAF file can be taken from here: https://discuss.pixls.us/t/rawtherapee-for-raf-files-of-fuji-s5-pro/3458/8
The pp3 is below. There is a crash due to some NaN float value in
dirpyr_equalizer.cc
. I suspect a division by zero, but that is as far as I can go regarding that code.GIT version
06137b02be947fc6ee068069fe6e9f4f1f515170
(head ofdev
). Other info:_DSF3083.RAF.pp3.txt