Closed Llerr closed 6 years ago
This is caused by removing the __x86_64__
dependencies in 96bf06a.
@Floessie Is there an other solution than adding the dependencies again?
@heckflosse Don't we have those ALIGN
macros? Can't they be added to the vfloat
define?
@Floessie They are used in LUT.h, but I remember that on non __x86_64__
builds they didn't work in all cases. E.g. they align 16 bytes relative to the stack, which may be unaligned. For this reason we had to use -mstackrealign
for 32 bit builds in past. But in case of LUT.h vfloat members, that also didn't work on non __x86_64__
builds for whatever reason
@heckflosse Well, I don't have that insight, but why might the stack be unaligned when GCC defaults to -mpreferred-stack-boundary=4
?
@Floessie iirc
vfloat a ALIGNED16;
a = _mm_setzero_ps();
where a
is a member of LUT
generates an aligned store _mm_store_ps()
, which crashes if memory is not aligned to 16 byte and for whatever reason this never worked on non __x86_64__
for the vfloat members in LUT.h though you are right that it should work.
@Floessie Can you test this on a non __x86_64__
build?
#include "LUT.h"
LUTf test(65536);
std::cout << alignof(test) << std::endl;
On __x86_64__
the output is 16
.
@Floessie In case on i686
builds the output of the test code from above is 8
this patch probably fixes the bug:
diff --git a/rtengine/LUT.h b/rtengine/LUT.h
index abc2b0bd..5824fd01 100644
--- a/rtengine/LUT.h
+++ b/rtengine/LUT.h
@@ -88,7 +88,7 @@ using LUTd = LUT<double>;
using LUTuc = LUT<uint8_t>;
template<typename T>
-class LUT :
+class alignas(16) LUT :
public rtengine::NonCopyable
{
protected:
@Llerr I pushed a possible fix to dev
branch. Can you please test?
@heckflosse It reports 16
with GCC 7.3 on i686, I'm afraid.
At least on W10 32 bits no improvment
Thread 1 received signal SIGSEGV, Segmentation fault.
0x005ae100 in LUT<float>::LUT (this=0xecff7a8)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtengine/LUT.h:177
177 maxsv = ZEROV;
(gdb) bt full
#0 0x005ae100 in LUT<float>::LUT (this=0xecff7a8)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtengine/LUT.h:177
No locals.
#1 MyCurve::MyCurve (this=0xecff6f8,
__vtt_parm=0xbbc0a0 <VTT for MyDiagonalCurve+4>,
__in_chrg=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/mycurve.cc:24
No locals.
#2 0x005af5bc in MyDiagonalCurve::MyDiagonalCurve (this=0xecff6f8,
__in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/mydiagonalcurve.cc:24
No locals.
#3 0x0049ab14 in DiagonalCurveEditorSubGroup::DiagonalCurveEditorSubGroup (
this=0xecf15b8, prt=<optimized out>, curveDir=...)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/diagonalcurveeditorsubgroup.cc:133
evdarks = <optimized out>
evshadows = <optimized out>
paramCurveSliderBox = <optimized out>
sideEnd = Gtk::POS_BOTTOM
custombbox = <optimized out>
NURBSbbox = <optimized out>
parambbox = <optimized out>
evhighlights = <optimized out>
evlights = <optimized out>
#4 0x0048fd36 in CurveEditorGroup::addCurve (this=this@entry=0xecf1420,
cType=cType@entry=CT_Diagonal, curveLabel=...,
relatedWidget=relatedWidget@entry=0xecde378,
expandRelatedWidget=expandRelatedWidget@entry=true,
periodic=periodic@entry=true)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/curveeditorgroup.cc:82
newCE = <optimized out>
#5 0x006bb62b in ToneCurve::ToneCurve (this=0xeb4f748,
__in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/tonecurve.cc:147
m = <optimized out>
bottomMilestones = {<std::_Vector_base<GradientMilestone, std::allocator<GradientMilestone> >> = {
_M_impl = {<std::allocator<GradientMilestone>> = {<__gnu_cxx::new_allocator<GradientMilestone>> = {<No data fields>}, <No data fields>},
_M_start = 0xec420d8, _M_finish = 0xec42128,
_M_end_of_storage = 0xec42128}}, <No data fields>}
lab = <optimized out>
#6 0x006c4322 in ToolPanelCoordinator::ToolPanelCoordinator (this=0xeb49388,
batch=true)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/toolpanelcoord.cc:42
type = <optimized out>
#7 0x0041700b in BatchToolPanelCoordinator::BatchToolPanelCoordinator (
this=0xeb49388, parent=0xeae1f80)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/batchtoolpanelcoord.cc:28
No locals.
#8 0x0052b278 in FilePanel::FilePanel (this=0xeae1f80,
__in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/filepanel.cc:55
devLab = <optimized out>
exportLab = <optimized out>
dirSelected = {<sigc::signal2<void, Glib::ustring const&, Glib::ustring const&, sigc::nil>> = {<sigc::signal_base> = {<sigc::trackable> = {
callback_list_ = 0x76f4da33 <ntdll!RtlReAllocateHeap+67>},
impl_ = 0xeaa72d0}, <No data fields>}, <No data fields>}
sFilterPanel = <optimized out>
sExportPanel = <optimized out>
inspectLab = <optimized out>
filtLab = <optimized out>
#9 0x006816c1 in RTWindow::RTWindow (this=0xebe08e8,
__in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/rtwindow.cc:182
lbq = <optimized out>
actionGrid = <optimized out>
fpanelLabelGrid = <optimized out>
fpl = <optimized out>
preferences = <optimized out>
lMonitorRect = {gobject_ = {x = 0, y = 0, width = 1280,
height = 1024}}
#10 0x005a8742 in (anonymous namespace)::create_rt_window ()
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/main.cc:348
icon_path = {static npos = 4294967295, string_ = {
static npos = 4294967295,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0xea91cb8 "D:\\rawtherapee\\rtinstall\\devrelease32\\.\\images"}, _M_string_length = 46, {
_M_local_buf = ".\000\000\000c\001\000P\000\000\000\000\060\032▒\016", _M_allocated_capacity = 46}}}
defaultIconTheme = {pCppObject_ = 0xc94a2d0}
screen = {pCppObject_ = 0xeaa1b28}
rtWindow = <optimized out>
#11 0x005aa584 in main (argc=<optimized out>, argv=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/main.cc:659
m = <incomplete type>
rtWindow = <optimized out>
exname = "D:\\rawtherapee\\rtinstall\\devrelease32\\rawtherapee-debug.exe", '\000' <repeats 452 times>
exePath = {static npos = 4294967295, string_ = {
static npos = 4294967295,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0xc94fbb8 "D:\\rawtherapee\\rtinstall\\devrelease32"},
_M_string_length = 37, {
_M_local_buf = "%\000\000\000\005\000\004\001\230C▒\fh▒▒\a",
_M_allocated_capacity = 37}}}
exnameU = L"D:\\rawtherapee\\rtinstall\\devrelease32\\rawtherapee-debug.exe", '\000' <repeats 452 times>
consoleOpened = <optimized out>
ret = 0
(gdb)
@gaaned92 André, could you also test Ingo's patch with the class alignas(16) LUT
above?
@Llerr I pushed a possible fix to dev branch. Can you please test? What branch is use? 5.4-rc2 Yes, but only 12.03.2018. I install 5.3 from repositary, all work. But when i compile it`s crashed.
@Llerr after cloning, do git checkout dev
, then do git describe
to make sure you're at least on 5.3-669-g478e1c05
, then compile. In fact, paste the output of git describe
here.
More info: http://rawpedia.rawtherapee.com/Linux#Choose_a_branch
@Floessie I applied the patch but with no succes
[Thread 11368.0x2ab0 exited with code 0]
Thread 1 received signal SIGSEGV, Segmentation fault.
0x005ae100 in LUT<float>::LUT (this=0xec2d788)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtengine/LUT.h:177
177 maxsv = ZEROV;
(gdb) bt full
#0 0x005ae100 in LUT<float>::LUT (this=0xec2d788)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtengine/LUT.h:177
No locals.
#1 MyCurve::MyCurve (this=0xec2d6d8,
__vtt_parm=0xbbc0a0 <VTT for MyDiagonalCurve+4>,
__in_chrg=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/mycurve.cc:24
No locals.
#2 0x005af5bc in MyDiagonalCurve::MyDiagonalCurve (this=0xec2d6d8,
__in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/mydiagonalcurve.cc:24
No locals.
#3 0x0049ba91 in DiagonalCurveEditorSubGroup::DiagonalCurveEditorSubGroup (
this=0xec042e0, prt=<optimized out>, curveDir=...)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/diagonalcurveeditorsubgroup.cc:210
evdarks = <optimized out>
evshadows = <optimized out>
paramCurveSliderBox = <optimized out>
sideEnd = Gtk::POS_BOTTOM
custombbox = <optimized out>
NURBSbbox = <optimized out>
parambbox = <optimized out>
evhighlights = <optimized out>
evlights = <optimized out>
#4 0x0048fd36 in CurveEditorGroup::addCurve (this=this@entry=0xec04148,
cType=cType@entry=CT_Diagonal, curveLabel=...,
relatedWidget=relatedWidget@entry=0xec010c0,
expandRelatedWidget=expandRelatedWidget@entry=true,
periodic=periodic@entry=true)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/curveeditorgroup.cc:82
newCE = <optimized out>
#5 0x006bb62b in ToneCurve::ToneCurve (this=0xeaf0658,
__in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/tonecurve.cc:147
m = <optimized out>
bottomMilestones = {<std::_Vector_base<GradientMilestone, std::allocator<GradientMilestone> >> = {
_M_impl = {<std::allocator<GradientMilestone>> = {<__gnu_cxx::new_allocator<GradientMilestone>> = {<No data fields>}, <No data fields>},
_M_start = 0xeb64d60, _M_finish = 0xeb64db0,
_M_end_of_storage = 0xeb64db0}}, <No data fields>}
lab = <optimized out>
#6 0x006c4322 in ToolPanelCoordinator::ToolPanelCoordinator (this=0xeae9858,
batch=true)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/toolpanelcoord.cc:42
type = <optimized out>
#7 0x0041700b in BatchToolPanelCoordinator::BatchToolPanelCoordinator (
this=0xeae9858, parent=0xea1c130)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/batchtoolpanelcoord.cc:28
No locals.
#8 0x0052b278 in FilePanel::FilePanel (this=0xea1c130,
__in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/filepanel.cc:55
devLab = <optimized out>
exportLab = <optimized out>
dirSelected = {<sigc::signal2<void, Glib::ustring const&, Glib::ustring const&, sigc::nil>> = {<sigc::signal_base> = {<sigc::trackable> = {
callback_list_ = 0x76f4da33 <ntdll!RtlReAllocateHeap+67>},
impl_ = 0xe9d6e78}, <No data fields>}, <No data fields>}
sFilterPanel = <optimized out>
sExportPanel = <optimized out>
inspectLab = <optimized out>
filtLab = <optimized out>
#9 0x006816c1 in RTWindow::RTWindow (this=0xe9f5838,
__in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/rtwindow.cc:182
lbq = <optimized out>
actionGrid = <optimized out>
fpanelLabelGrid = <optimized out>
fpl = <optimized out>
preferences = <optimized out>
lMonitorRect = {gobject_ = {x = 0, y = 0, width = 1280,
height = 1024}}
#10 0x005a8742 in (anonymous namespace)::create_rt_window ()
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/main.cc:348
icon_path = {static npos = 4294967295, string_ = {
static npos = 4294967295,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0xe9b4400 "D:\\rawtherapee\\rtinstall\\devrelease32\\.\\images"}, _M_string_length = 46, {
_M_local_buf = ".\000\000\000c\001\000P\000\000\000\000xA▒\016", _M_allocated_capacity = 46}}}
defaultIconTheme = {pCppObject_ = 0xea46bb0}
screen = {pCppObject_ = 0xea5db00}
rtWindow = <optimized out>
#11 0x005aa584 in main (argc=<optimized out>, argv=<optimized out>)
at D:/RAWTHERAPEE/RTSOURCE/rawtherapee/rtgui/main.cc:659
m = <incomplete type>
rtWindow = <optimized out>
exname = "D:\\rawtherapee\\rtinstall\\devrelease32\\rawtherapee-debug.exe", '\000' <repeats 452 times>
exePath = {static npos = 4294967295, string_ = {
static npos = 4294967295,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0xc5af940 "D:\\rawtherapee\\rtinstall\\devrelease32"},
_M_string_length = 37, {
_M_local_buf = "%\000\000\000▒\023\004▒▒▒\\\fh▒▒\a",
_M_allocated_capacity = 37}}}
exnameU = L"D:\\rawtherapee\\rtinstall\\devrelease32\\rawtherapee-debug.exe", '\000' <repeats 452 times>
consoleOpened = <optimized out>
ret = 0
(gdb)
beginning of LUT.h
#ifndef LUT_H_
#define LUT_H_
#include <cstring>
#include <cstdint>
#include <cassert>
#ifndef NDEBUG
#include <glibmm.h>
#include <fstream>
#endif
#include "opthelper.h"
#include "rt_math.h"
#include "noncopyable.h"
// Bit representations of flags
enum {
LUT_CLIP_BELOW = 1 << 0,
LUT_CLIP_ABOVE = 1 << 1
};
template<typename T>
class LUT;
using LUTf = LUT<float>;
using LUTi = LUT<int32_t>;
using LUTu = LUT<uint32_t>;
using LUTd = LUT<double>;
using LUTuc = LUT<uint8_t>;
template<typename T>
class alignas(16) LUT :
public rtengine::NonCopyable
{
protected:
// list of variables ordered to improve cache speed
int maxs;
float maxsf;
T * data;
unsigned int clip;
unsigned int size;
unsigned int upperBound; // always equals size-1, parameter created for performance reason
private:
unsigned int owner;
#ifdef __SSE2__
alignas(16) vfloat maxsv;
alignas(16) vfloat sizev;
alignas(16) vint sizeiv;
#endif
public:
/// convenience flag! If one doesn't want to delete the buffer but want to flag it to be recomputed...
/// The user have to handle it itself, even if some method can (re)initialize it
bool dirty;
This patch for dev
should force unaligned access on vfloat/vint members of LUT when building with SSE2 support for non x86/64 targets:
diff --git a/rtengine/LUT.h b/rtengine/LUT.h
index abc2b0bd..709bf860 100644
--- a/rtengine/LUT.h
+++ b/rtengine/LUT.h
@@ -133,9 +133,15 @@ public:
maxs = size - 2;
maxsf = (float)maxs;
#ifdef __SSE2__
+#ifdef __x86_64__
maxsv = F2V( maxs );
sizeiv = _mm_set1_epi32( (int)(size - 1) );
sizev = F2V( size - 1 );
+#else
+ _mm_storeu_ps((float*)&maxsv, F2V( maxs ));
+ _mm_storeu_si128(&sizeiv, _mm_set1_epi32( (int)(size - 1) ));
+ _mm_storeu_ps((float*)&sizev, F2V( size - 1 ));
+#endif
#endif
}
void operator ()(int s, int flags = 0xfffffff)
@@ -163,9 +169,16 @@ public:
maxs = size - 2;
maxsf = (float)maxs;
#ifdef __SSE2__
+#ifdef __x86_64__
maxsv = F2V( maxs );
sizeiv = _mm_set1_epi32( (int)(size - 1) );
sizev = F2V( size - 1 );
+#else
+ _mm_storeu_ps((float*)&maxsv, F2V( maxs ));
+ _mm_storeu_si128(&sizeiv, _mm_set1_epi32( (int)(size - 1) ));
+ _mm_storeu_ps((float*)&sizev, F2V( size - 1 ));
+
+#endif
#endif
}
@@ -174,9 +187,15 @@ public:
data = nullptr;
reset();
#ifdef __SSE2__
+#ifdef __x86_64__
maxsv = ZEROV;
sizev = ZEROV;
sizeiv = _mm_setzero_si128();
+#else
+ _mm_storeu_ps((float*)&maxsv, ZEROV);
+ _mm_storeu_ps((float*)&sizev, ZEROV);
+ _mm_storeu_si128(&sizeiv, _mm_setzero_si128());
+#endif
#endif
}
@@ -238,9 +257,15 @@ public:
this->maxs = this->size - 2;
this->maxsf = (float)this->maxs;
#ifdef __SSE2__
+#ifdef __x86_64__
this->maxsv = F2V( this->size - 2);
this->sizeiv = _mm_set1_epi32( (int)(this->size - 1) );
this->sizev = F2V( this->size - 1 );
+#else
+ _mm_storeu_ps((float*)&this->maxsv, F2V( this->size - 2));
+ _mm_storeu_si128(&this->sizeiv, _mm_set1_epi32( (int)(this->size - 1) ));
+ _mm_storeu_ps((float*)&this->sizev, F2V( this->size - 1 ));
+#endif
#endif
}
@@ -311,7 +336,11 @@ public:
// Clamp and convert to integer values. Extract out of SSE register because all
// lookup operations use regular addresses.
+#ifdef __x86_64__
vfloat clampedIndexes = vmaxf(ZEROV, vminf(maxsv, indexv));
+#else
+ vfloat clampedIndexes = vmaxf(ZEROV, vminf(_mm_loadu_ps((float*)&maxsv), indexv));
+#endif
vint indexes = _mm_cvttps_epi32(clampedIndexes);
int indexArray[4];
_mm_storeu_si128(reinterpret_cast<__m128i*>(&indexArray[0]), indexes);
@@ -343,7 +372,11 @@ public:
// Clamp and convert to integer values. Extract out of SSE register because all
// lookup operations use regular addresses.
+#ifdef __x86_64__
vfloat clampedIndexes = vmaxf(ZEROV, vminf(maxsv, indexv));
+#else
+ vfloat clampedIndexes = vmaxf(ZEROV, vminf(_mm_loadu_ps((float*)&maxsv), indexv));
+#endif
vint indexes = _mm_cvttps_epi32(clampedIndexes);
int indexArray[4];
_mm_storeu_si128(reinterpret_cast<__m128i*>(&indexArray[0]), indexes);
@@ -363,7 +396,11 @@ public:
vfloat lower = _mm_castsi128_ps(_mm_unpacklo_epi64(temp0, temp1));
vfloat upper = _mm_castsi128_ps(_mm_unpackhi_epi64(temp0, temp1));
+#ifdef __x86_64__
vfloat diff = vmaxf(ZEROV, vminf(sizev, indexv)) - _mm_cvtepi32_ps(indexes);
+#else
+ vfloat diff = vmaxf(ZEROV, vminf(_mm_loadu_ps((float*)&sizev), indexv)) - _mm_cvtepi32_ps(indexes);
+#endif
return vintpf(diff, upper, lower);
}
@@ -374,7 +411,11 @@ public:
// Clamp and convert to integer values. Extract out of SSE register because all
// lookup operations use regular addresses.
+#ifdef __x86_64__
vfloat clampedIndexes = vmaxf(ZEROV, vminf(maxsv, indexv));
+#else
+ vfloat clampedIndexes = vmaxf(ZEROV, vminf(_mm_loadu_ps((float*)&maxsv), indexv));
+#endif
vint indexes = _mm_cvttps_epi32(clampedIndexes);
int indexArray[4];
_mm_storeu_si128(reinterpret_cast<__m128i*>(&indexArray[0]), indexes);
@@ -402,7 +443,11 @@ public:
vfloat operator[](vint idxv ) const
{
vfloat tempv, p1v;
+#ifdef __x86_64__
idxv = _mm_max_epi32( _mm_setzero_si128(), _mm_min_epi32(idxv, sizeiv));
+#else
+ idxv = _mm_max_epi32( _mm_setzero_si128(), _mm_min_epi32(idxv, _mm_loadu_si128((__m128i*)&sizeiv)));
+#endif
// access the LUT 4 times and shuffle the values into p1v
int idx;
@@ -443,7 +488,11 @@ public:
{
vfloat tempv, p1v;
tempv = _mm_cvtepi32_ps(idxv);
+#ifdef __x86_64__
tempv = _mm_min_ps( tempv, sizev );
+#else
+ tempv = _mm_min_ps( tempv, _mm_loadu_ps((float*)&sizev ));
+#endif
idxv = _mm_cvttps_epi32(_mm_max_ps( tempv, _mm_setzero_ps( ) ));
// access the LUT 4 times and shuffle the values into p1v
@@ -691,9 +740,15 @@ public:
maxs = size - 2;
maxsf = (float)maxs;
#ifdef __SSE2__
+#ifdef __x86_64__
maxsv = F2V( size - 2);
sizeiv = _mm_set1_epi32( (int)(size - 1) );
sizev = F2V( size - 1 );
+#else
+ _mm_storeu_ps((float*)&maxsv, F2V( size - 2));
+ _mm_storeu_si128(&sizeiv, _mm_set1_epi32( (int)(size - 1) ));
+ _mm_storeu_ps((float*)&sizev, F2V( size - 1 ));
+#endif
#endif
}
@heckflosse Isn't unaligned heap our problem here, given the fact, that this MyDiagonalCurve
is allocated via new
?
@Floessie Yes, I think that's the problem.
@heckflosse This is just a PoC (call it a dirty hack):
diff --git a/rtgui/main.cc b/rtgui/main.cc
index 2ae442e5a..34e7fe68a 100644
--- a/rtgui/main.cc
+++ b/rtgui/main.cc
@@ -51,6 +51,18 @@
#include "conio.h"
#endif
+#include <stdlib.h>
+
+void* operator new(decltype(sizeof(0)) n) noexcept(false)
+{
+ return aligned_alloc(16, n);
+}
+
+void operator delete(void* p) noexcept
+{
+ free(p);
+}
+
// Set this to 1 to make RT work when started with Eclipse and arguments, at least on Windows platform
#define ECLIPSE_ARGS 0
This requires C11.
If you want I test something, ping me.
@gaaned92 Ping! :smile: Could you test my dirty hack, so we know if it helps at all?
@Floessie I tried your patch on msys2/mingw64.
There is no aligned_alloc()
, so I used the suggested _aligned_malloc()
.
That compiles, but segfaults.
Then I also used _aligned_free()
instead of free()
Still segfaults.
I even tried _aligned_malloc(16, n + 16 - ((n%16) ? n&16 : 16) );
because of this
Still segfaults :(
Edit: I changed _aligned_malloc(16, n)
to _aligned_malloc(n,16)
That terminates rt immediately after start without a segfault :(
That's a pity. Works on Linux amd64 and i686. But I don't have real hardware that would SEGV on unaligned SSE accesses, so I can't test...
@Floessie afaik all cpus fail on aligned SSE access
on unaligned memory
@gaaned92 Ping for testing my patch
@heckflosse with your patch for build = debug ok
for build = release crash
Thread 1 received signal SIGSEGV, Segmentation fault.
0x005bb0be in ?? ()
(gdb) bt full
#0 0x005bb0be in ?? ()
No symbol table info available.
#1 0x005bcd7e in ?? ()
No symbol table info available.
#2 0x004a3c83 in ?? ()
No symbol table info available.
#3 0x0049669f in ?? ()
No symbol table info available.
#4 0x006cc8ca in ?? ()
No symbol table info available.
#5 0x006d47fd in ?? ()
No symbol table info available.
#6 0x0041c09a in ?? ()
No symbol table info available.
#7 0x0053a5aa in ?? ()
No symbol table info available.
#8 0x00689c10 in ?? ()
No symbol table info available.
#9 0x005b7b8d in ?? ()
No symbol table info available.
#10 0x00b260af in ?? ()
No symbol table info available.
#11 0x004013cb in ?? ()
No symbol table info available.
#12 0x73c48654 in KERNEL32!BaseThreadInitThunk ()
from C:\WINDOWS\System32\kernel32.dll
No symbol table info available.
#13 0x76f74a77 in ntdll!RtlGetAppContainerNamedObjectPath ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#14 0x76f74a47 in ntdll!RtlGetAppContainerNamedObjectPath ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#15 0x00000000 in ?? ()
No symbol table info available.
(gdb)
Not a lot of information here!
@gaaned92 Thanks a lot for testing. Because the debug build does not crash, I guess the bug with sse access to unaligned data in LUT.h is fixed with my patch and the crash in release build is a different one (maybe specific to Win32 builds as now all the SSEFUNCTION macros, which were needed for Win32 SSE builds, are removed). But that's just a guess.
Nevertheless it would be good to know how a release build on i686 Linux behaves with the patch @Llerr
@Beep6581
$ git describe
5.3-680-g8f9fb2d
(gdb) run
Starting program: /opt/rawtherapee/bin/rawtherapee
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
[New Thread 0xb53fdb40 (LWP 7280)]
[New Thread 0xb4bfcb40 (LWP 7281)]
[New Thread 0xb43fbb40 (LWP 7282)]
[New Thread 0xb3bfab40 (LWP 7283)]
[New Thread 0xb33f9b40 (LWP 7284)]
[New Thread 0xb2bf8b40 (LWP 7285)]
[New Thread 0xb23f7b40 (LWP 7286)]
[New Thread 0xb12ffb40 (LWP 7287)]
[New Thread 0xb0afeb40 (LWP 7288)]
Thread 1 "rawtherapee" received signal SIGSEGV, Segmentation fault.
0x08680d41 in LUT<float>::LUT (this=0x94173f8) at /home/lex/programs/code-rawtherapee/rtgui/../rtengine/LUT.h:177
177 maxsv = ZEROV;
(gdb) where
#0 0x08680d41 in LUT<float>::LUT (this=0x94173f8) at /home/lex/programs/code-rawtherapee/rtgui/../rtengine/LUT.h:177
#1 0x0867fb00 in MyCurve::MyCurve (this=0x9417348, __vtt_parm=0x8da988c <VTT for MyDiagonalCurve+4>, __in_chrg=<optimized out>)
at /home/lex/programs/code-rawtherapee/rtgui/mycurve.cc:24
#2 0x0868131e in MyDiagonalCurve::MyDiagonalCurve (this=0x9417348, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at /home/lex/programs/code-rawtherapee/rtgui/mydiagonalcurve.cc:24
#3 0x084eea97 in DiagonalCurveEditorSubGroup::DiagonalCurveEditorSubGroup (this=0x9417270, prt=0x94164b8, curveDir=...)
at /home/lex/programs/code-rawtherapee/rtgui/diagonalcurveeditorsubgroup.cc:56
#4 0x084e3468 in CurveEditorGroup::addCurve (this=0x94164b8, cType=CT_Diagonal, curveLabel=..., relatedWidget=0x940f8f0,
expandRelatedWidget=true, periodic=true) at /home/lex/programs/code-rawtherapee/rtgui/curveeditorgroup.cc:82
#5 0x087e34ce in ToneCurve::ToneCurve (this=0x9388c18, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at /home/lex/programs/code-rawtherapee/rtgui/tonecurve.cc:147
#6 0x087f222c in ToolPanelCoordinator::ToolPanelCoordinator (this=0x9393c00, batch=true)
at /home/lex/programs/code-rawtherapee/rtgui/toolpanelcoord.cc:42
#7 0x084321ab in BatchToolPanelCoordinator::BatchToolPanelCoordinator (this=0x9393c00, parent=0x9294000)
at /home/lex/programs/code-rawtherapee/rtgui/batchtoolpanelcoord.cc:28
#8 0x085c2abb in FilePanel::FilePanel (this=0x9294000, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at /home/lex/programs/code-rawtherapee/rtgui/filepanel.cc:55
#9 0x08793b42 in RTWindow::RTWindow (this=0x92da278, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
at /home/lex/programs/code-rawtherapee/rtgui/rtwindow.cc:182
#10 0x086764c3 in (anonymous namespace)::create_rt_window () at /home/lex/programs/code-rawtherapee/rtgui/main.cc:348
#11 0x08677a80 in main (argc=1, argv=0xbfffef74) at /home/lex/programs/code-rawtherapee/rtgui/main.cc:658
it`s crushed :(
$ lscpu
Архитектура:i686
CPU op-mode(s): 32-bit, 64-bit
Порядок байтов:Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Потоков на ядро:2
Ядер на сокет:4
Сокет(ы): 1
Vendor ID: GenuineIntel
Семейство CPU:6
Модель: 58
Model name: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Stepping: 9
CPU МГц: 2193.132
CPU max MHz: 3900,0000
CPU min MHz: 1600,0000
BogoMIPS: 6784.23
Виртуализация:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb retpoline tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
@heckflosse
afaik all cpus fail on aligned SSE access on unaligned memory
Tested again with latest dev
in a Debian Testing i686 QEMU VM: Doesn't crash. There seems to be a difference between hardware and emulation...
@Floessie If you want to know about your emulation handling things different, the following code should crash on all systems:
float test[8] = { 1.f };
for(int i = 0; i < 4; ++i) {
_mm_store_ps(&test[i], _mm_load_ps(&test[i+1]) + _mm_set1_ps(1.f));
}
printf("%f\n",test[3]);
@heckflosse Proof: My old QEMU 2.1.2 doesn't crash on the snippet above, but happily prints out 2.000000
. :disappointed:
@Floessie Damn. My example was so simple that the compiler most likely could predict the output. Here's a working (crashing) example:
diff --git a/rtengine/rawimagesource.cc b/rtengine/rawimagesource.cc
index 0420c5b7..8dd30b05 100644
--- a/rtengine/rawimagesource.cc
+++ b/rtengine/rawimagesource.cc
@@ -1520,6 +1520,10 @@ void RawImageSource::vflip (Imagefloat* image)
int RawImageSource::load (const Glib::ustring &fname)
{
+float test[8] = { 1.f };
+LFModifier::test(test);
+std::cout << test[3] << std::endl;
+
MyTime t1, t2;
t1.set();
fileName = fname;
diff --git a/rtengine/rtlensfun.cc b/rtengine/rtlensfun.cc
index 8216f0b8..ebd8605e 100644
--- a/rtengine/rtlensfun.cc
+++ b/rtengine/rtlensfun.cc
@@ -43,6 +43,11 @@ LFModifier::operator bool() const
return data_;
}
+void LFModifier::test(float *test) {
+ for(int i = 0; i < 4; ++i) {
+ _mm_store_ps(&test[i], _mm_load_ps(&test[i+1]) + _mm_set1_ps(1.f));
+ }
+}
void LFModifier::correctDistortion(double &x, double &y, int cx, int cy, double scale) const
{
diff --git a/rtengine/rtlensfun.h b/rtengine/rtlensfun.h
index 7690ef54..ebb7b7f2 100644
--- a/rtengine/rtlensfun.h
+++ b/rtengine/rtlensfun.h
@@ -47,9 +47,9 @@ public:
void correctCA(double &x, double &y, int cx, int cy, int channel) const override;
void processVignetteLine(int width, int y, float *line) const override;
void processVignetteLine3Channels(int width, int y, float *line) const override;
+ static void test(float *test);
Glib::ustring getDisplayString() const;
-
private:
LFModifier(lfModifier *m, bool swap_xy, int flags);
And here's the same, but with unaligned access, which does not crash:
diff --git a/rtengine/rawimagesource.cc b/rtengine/rawimagesource.cc
index 0420c5b7..8dd30b05 100644
--- a/rtengine/rawimagesource.cc
+++ b/rtengine/rawimagesource.cc
@@ -1520,6 +1520,10 @@ void RawImageSource::vflip (Imagefloat* image)
int RawImageSource::load (const Glib::ustring &fname)
{
+float test[8] = { 1.f };
+LFModifier::test(test);
+std::cout << test[3] << std::endl;
+
MyTime t1, t2;
t1.set();
fileName = fname;
diff --git a/rtengine/rtlensfun.cc b/rtengine/rtlensfun.cc
index 8216f0b8..938ebd7a 100644
--- a/rtengine/rtlensfun.cc
+++ b/rtengine/rtlensfun.cc
@@ -43,6 +43,11 @@ LFModifier::operator bool() const
return data_;
}
+void LFModifier::test(float *test) {
+ for(int i = 0; i < 4; ++i) {
+ _mm_storeu_ps(&test[i], _mm_loadu_ps(&test[i+1]) + _mm_set1_ps(1.f));
+ }
+}
void LFModifier::correctDistortion(double &x, double &y, int cx, int cy, double scale) const
{
diff --git a/rtengine/rtlensfun.h b/rtengine/rtlensfun.h
index 7690ef54..ebb7b7f2 100644
--- a/rtengine/rtlensfun.h
+++ b/rtengine/rtlensfun.h
@@ -47,9 +47,9 @@ public:
void correctCA(double &x, double &y, int cx, int cy, int channel) const override;
void processVignetteLine(int width, int y, float *line) const override;
void processVignetteLine3Channels(int width, int y, float *line) const override;
+ static void test(float *test);
Glib::ustring getDisplayString() const;
-
private:
LFModifier(lfModifier *m, bool swap_xy, int flags);
To test, apply the patch to dev and load a raw file.
@heckflosse Segfaults! :tada: Second patch not yet tested. Edit: Second patch doesn't segfault, as expected.
@Llerr It seems that in your latest test my patch from https://github.com/Beep6581/RawTherapee/issues/4432#issuecomment-371478424 was not applied.
Your output shows
0x08680d41 in LUT<float>::LUT (this=0x94173f8) at /home/lex/programs/code-rawtherapee/rtgui/../rtengine/LUT.h:177
177 maxsv = ZEROV;
while with applied patch line 177 is:
_mm_storeu_ps((float*)&maxsv, F2V( maxs ));
@heckflosse But Ingo, there are a lot of other places, where unaligned loads and stores are used (not to mention ZEROV
assignments). Aren't they all relevant and not only LUT.h
? And why is it, that it doesn't crash in my VM now that we know it should? Are my GLIBC 2.27 and GCC 7.3 aligning on 16B even on i686? And if so, is it worth the trouble fixing all unaligned accesses when this is going to be mended with an up-to-date toolchain?
:confused:, Flössie
Seek and ye shall find: https://github.com/kraj/glibc/commit/4e61a6be446026c327aa70cef221c9082bf0085d#diff-f2db25b59456b2ed40b038a3935be358 (nota bene: the hash has no 42
in it as it should have :grin:)
HTH, Flössie
@Floessie maybe it's been truncated ;)
@heckflosse In the meantime I've set up a VM with Debian 9.4 i386. dev
segfaults immediately somewhere in MyCurve
as expected. As malloc()
is a weak symbol in GLIBC, I simply overwrote it, and that made RT start, open an image, and even save it.
Here's a patch as a starting point (must be #ifdef
ed like hell, of course):
diff --git a/rtgui/main.cc b/rtgui/main.cc
index 55d27125..b969ca8b 100644
--- a/rtgui/main.cc
+++ b/rtgui/main.cc
@@ -51,6 +51,13 @@
#include "conio.h"
#endif
+#include <malloc.h>
+
+void* malloc(size_t size)
+{
+ return memalign(16, size);
+}
+
// Set this to 1 to make RT work when started with Eclipse and arguments, at least on Windows platform
#define ECLIPSE_ARGS 0
Best, Flössie
Here's a full integration, which fixes the problem for i386 Linux. Successfully tested on Debian 9.4 i386, Debian 9.4 amd64, and Debian 10 i386. The malloc()
hack is only active on the first.
diff --git a/CMakeLists.txt b/CMakeLists.txt
index d57e1827..8dcf6793 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -103,6 +103,9 @@ if(HAVE_X86_SSE_MATH)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -msse2 -mfpmath=sse")
endif()
+# On i386 Linux we can fix unaligned SSE malloc (see GitHub issue #4432)
+include(FindUnalignedMalloc)
+
if(WIN32)
# Add additional paths. Look in the MinGW path first, then in the Gtkmm path.
# If you wish to build some dependent libraries, you have to install them in MinGW to use them:
diff --git a/cmake/modules/FindUnalignedMalloc.cmake b/cmake/modules/FindUnalignedMalloc.cmake
new file mode 100644
index 00000000..4ddfb2af
--- /dev/null
+++ b/cmake/modules/FindUnalignedMalloc.cmake
@@ -0,0 +1,49 @@
+# This file is part of RawTherapee.
+#
+# Copyright (C) 2018 Flössie <floessie.mail@gmail.com>
+#
+# RawTherapee is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# RawTherapee is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with RawTherapee. If not, see <http://www.gnu.org/licenses/>.
+
+include(CheckCXXSourceCompiles)
+
+set(CMAKE_REQUIRED_QUIET_COPY "${CMAKE_REQUIRED_QUIET}")
+set(CMAKE_REQUIRED_QUIET ON)
+
+set(TEST_SOURCE
+"
+#include <cstddef>
+#include <type_traits>
+
+int main()
+{
+#if defined(__SSE2__) && (defined(__i386) || defined(_M_IX86)) && defined(__linux)
+ static_assert(std::alignment_of<std::max_align_t>::value >= 16, \"Unaligned heap objects possible\");
+#endif
+
+ return 0;
+}
+")
+
+CHECK_CXX_SOURCE_COMPILES("${TEST_SOURCE}" HAVE_ALIGNED_MALLOC)
+
+if(NOT HAVE_ALIGNED_MALLOC)
+ set(HAVE_UNALIGNED_MALLOC 1)
+else()
+ unset(HAVE_ALIGNED_MALLOC)
+endif()
+
+unset(TEST_SOURCE)
+
+set(CMAKE_REQUIRED_QUIET "${CMAKE_REQUIRED_QUIET_COPY}")
+unset(CMAKE_REQUIRED_QUIET_COPY)
diff --git a/rtgui/CMakeLists.txt b/rtgui/CMakeLists.txt
index 540b4d26..c39ed004 100644
--- a/rtgui/CMakeLists.txt
+++ b/rtgui/CMakeLists.txt
@@ -1,5 +1,6 @@
# Common source files for both CLI and non-CLI execautables
set(CLISOURCEFILES
+ alignedmalloc.cc
edit.cc
main-cli.cc
multilangmgr.cc
@@ -11,6 +12,7 @@ set(CLISOURCEFILES
set(NONCLISOURCEFILES
adjuster.cc
+ alignedmalloc.cc
batchqueue.cc
batchqueuebuttonset.cc
batchqueueentry.cc
diff --git a/rtgui/alignedmalloc.cc b/rtgui/alignedmalloc.cc
new file mode 100644
index 00000000..91fe15a8
--- /dev/null
+++ b/rtgui/alignedmalloc.cc
@@ -0,0 +1,31 @@
+/*
+ * This file is part of RawTherapee.
+ *
+ * Copyright (C) 2018 Flössie <floessie.mail@gmail.com>
+ *
+ * RawTherapee is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * RawTherapee is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with RawTherapee. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "config.h"
+
+#ifdef HAVE_UNALIGNED_MALLOC
+
+#include <malloc.h>
+
+void* malloc(size_t size)
+{
+ return memalign(16, size);
+}
+
+#endif
diff --git a/rtgui/config.h.in b/rtgui/config.h.in
index fdf64b73..2d1f41db 100644
--- a/rtgui/config.h.in
+++ b/rtgui/config.h.in
@@ -21,6 +21,7 @@
#define __CONFIG_H__
#cmakedefine BUILD_BUNDLE
+#cmakedefine HAVE_UNALIGNED_MALLOC
#define DATA_SEARCH_PATH "${DATADIR}"
#define DOC_SEARCH_PATH "${DOCDIR}"
#define CREDITS_SEARCH_PATH "${CREDITSDIR}"
@heckflosse @Beep6581 @innir Please test, thanks.
HTH, Flössie
Thanks @Floessie, builds and works fine on Win7/64
Built and ran fine in Sabayon gcc (Gentoo Hardened 6.4.0-r1 p1.3) 6.4.0 x86_64 Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz
Tested with Clang 3.9.1 on Debian 9.4 i386. :heavy_check_mark:
Tested with Clang 5.0.1 on Debian 10 i386:
With patch: :heavy_check_mark: (HAVE_UNALIGNED_MALLOC
is defined)
Without patch: :heavy_check_mark:
Clang 5 needs -latomic
on the CMAKE_EXE_LINKER_FLAGS
.
Shall I push to dev
?
@Floessie yes, please :)
Is this for 5.5?
imho for 5.4
I'm about to push...
i'm test. it's work. $git describe 5.4-rc3
system: ubuntu 16.04 LTS i686