Closed robMate closed 11 months ago
You said this started when you switched from NVidia to AMD, but it also reproduces without OpenCL, which rules out an OpenCL issue. Yet I cannot reproduce it, and I'm sure it works fine for many others, as this is quite a common action (adding a node in RGB Curve)... So the issue probably comes from something else; I think we're missing part of the context here.
Even with your RAW + xmp I cannot reproduce the crash.
Hm, it's the same for me. I tested it on my laptop and everything works. I compared the build logs of both devices but could not find a meaningful difference, such as a missing library. Both devices run Manjaro. Currently I'm debugging in gdb to see if I can provide some more information.
I found out that it does not crash if I compile a debug build like this:
./build.sh --prefix /opt/darktable --build-type Debug --install --sudo
I tried a Release build after this, but it crashed on the same line as usual:
...
[New Thread 0x7ffde7aa46c0 (LWP 56082)]
[New Thread 0x7ffde72a36c0 (LWP 56083)]
Thread 6 "worker 0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdeffd6c0 (LWP 55948)]
0x00007fffbc52f354 in _generate_curve_lut (pipe=<optimized out>, d=d@entry=0x7ffea4f3f010) at /home/robin/git/darktable/src/iop/rgbcurve.c:1565
1565 (void)dt_draw_curve_add_point(d->curve[ch], curve_nodes[ch][k].x, curve_nodes[ch][k].y);
(gdb) info locals
k = <optimized out>
ch = 0
work_profile = <optimized out>
curve_nodes = {{{x = 0, y = 0}, {x = 0.105990782, y = 0.10599079}, {x = 0.235023037, y = 0.235023037}, {x = 0.374172181, y = 0.374172181}, {x = 1, y = 1}, {x = 0, y = 0} <repeats 15 times>}, {{x = 0, y = 0}, {x = 1, y = 1}, {x = 0, y = 0} <repeats 18 times>}, {{x = 0, y = 0}, {x = 1, y = 1}, {x = 0, y = 0} <repeats 18 times>}}
(gdb)
gcc 13.2.1 20230801, which is the same version as on my laptop, where it works.
I added the build output because it contains some more versions it uses on my system.
Here is also a backtrace file of the crash: darktable_bt_8832B2.txt
I'm not sure why it works with a debug build but not with a release build. Does it have something to do with the optimization for the AMD CPU? (Wild guess, I have no idea what happens there.)
I tested it, but it works there without any issues. I also tested the base curve, and it works. In the rgb curve I also edited the channels separately, and as soon as I add two points on one of the channels, darktable crashes.
I added a screencast of the issue on discuss.pixls.us because I could not upload it here: https://discuss.pixls.us/t/rgb-curve-module-crashes-darktable-on-second-curve-controll-point/39729/14?u=fireball
I tested the flatpak 4.4.2: no crash. I built 4.4.2 locally, and it crashes. So to me it looks like one of my dependencies or shared libraries is broken?
I learned a bit more gdb and found a solution 🥳 It looks like the for loop variable gets optimized out.
If I declare the k variable in rgbcurve.c as volatile, it works and I can see the variable in gdb.
It does not happen on my laptop and, by the looks of things, to nobody but me 😅
Is this a real solution, or did I create something wild? I'm not a C developer, but it works for now.
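For readers following along, the workaround described above amounts to something like the sketch below. It is not the actual code from rgbcurve.c: node_t, MAX_NODES, add_point() and sum_nodes() are hypothetical stand-ins, and only the volatile loop counter mirrors the change mentioned in this comment.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 20 /* assumption: matches the 20 entries per channel in the gdb dump */

typedef struct { float x, y; } node_t; /* hypothetical stand-in for the curve node type */

/* Hypothetical stand-in for dt_draw_curve_add_point(): it only accumulates values. */
static float add_point(float acc, float x, float y)
{
  return acc + x + y;
}

/* Sums the first `count` nodes. Because k is volatile, the compiler must
   load and store it on every access, which in practice keeps the loop
   from being vectorized or otherwise restructured around k. */
float sum_nodes(const node_t *nodes, size_t count)
{
  assert(count <= MAX_NODES);
  float acc = 0.0f;
  for(volatile size_t k = 0; k < count; k++)
    acc = add_point(acc, nodes[k].x, nodes[k].y);
  return acc;
}
```

This hides the symptom rather than explaining it: the loop simply stops being a candidate for the optimization that misbehaves.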
OK, I narrowed it down to a flag contained in the -O3 optimization group: -fvect-cost-model=dynamic. When I compile without my fix, using -O2 plus all the optimizations from -O3 except -fvect-cost-model=dynamic, the rgb curve module is stable.
My CPU should be covered by -march=znver4, but switching to znver3 fixes the bug as well.
I'm not sure if anything can come from this information, but this is what I have found so far.
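One way to narrow this kind of problem further, without changing flags for the whole build, is to switch vectorization off for a single function. The sketch below uses GCC's optimize function attribute, which the GCC manual describes as a debugging aid rather than something for production code. scale_buffer() is a hypothetical example function, and no-tree-vectorize is a coarser switch than the -fvect-cost-model flag named above.

```c
#include <assert.h>
#include <stddef.h>

/* GCC only: ask the compiler to build just this function without loop
   vectorization. Other compilers do not get the attribute at all. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE
#endif

/* Hypothetical example: multiplies every element of buf by factor. */
NO_VECTORIZE
void scale_buffer(float *buf, size_t n, float factor)
{
  assert(buf != NULL || n == 0);
  for(size_t i = 0; i < n; i++)
    buf[i] *= factor;
}
```

If a crash disappears when only the suspect function is marked like this, that points at vectorization of that function rather than at the surrounding code.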
So it looks like a vectorization issue; @ralfbrown may have an idea?
At least the module data doesn't seem to be aligned: https://github.com/darktable-org/darktable/blob/ddc2a5ba2eadbc7b9475a3f760329df0a7604eda/src/iop/rgbcurve.c#L1460 Might be necessary to alloc it aligned? At least it would be a good first guess.
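A minimal sketch of what an aligned allocation could look like, using C11 aligned_alloc() instead of any darktable helper. The function name alloc_aligned_zeroed() and the 64-byte alignment are assumptions for illustration, and nothing in this thread confirms that alignment is related to the crash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_ALIGN 64 /* assumption: 64 bytes covers a cache line and a 512-bit vector */

/* Returns zeroed storage of at least `size` bytes, aligned to CACHE_ALIGN,
   or NULL on failure. aligned_alloc() expects the size to be a multiple of
   the alignment, so the request is rounded up first. Release with free(). */
void *alloc_aligned_zeroed(size_t size)
{
  assert(size > 0);
  const size_t rounded = (size + CACHE_ALIGN - 1) / CACHE_ALIGN * CACHE_ALIGN;
  void *p = aligned_alloc(CACHE_ALIGN, rounded);
  if(p) memset(p, 0, rounded);
  return p;
}
```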
volatile eliminates most optimizations involving the variable, since it requires that every read in the code emit exactly one read of main memory and every write generate exactly one write to memory. This doesn't (at first glance) appear to be an alignment issue, so I'd be surprised if aligning the memory allocation will fix it.
At this point, we can't yet rule out a compiler bug, since this seems to happen only when optimizing for the newest architecture.
Tried on my side with Release mode (so -O3) with GCC 13.2 and no crash. My arch is set to -march=native (not sure which one is selected, I'm on x86_64 i9-9980HK).
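With -march=native the compiler picks the target from the build machine, so two builds from the same source can use different instruction sets. A small sketch for checking this from inside a build: GCC and Clang predefine __AVX512F__ when the selected target includes AVX-512F. As far as I know, znver4 enables AVX-512 in GCC while znver3 does not, which would fit the observation above, but that link is an assumption and not something established in this thread. built_with_avx512f() is a hypothetical helper name.

```c
#include <stdbool.h>

/* Reports whether this translation unit was compiled with AVX-512F enabled.
   The answer is fixed at compile time by the -march / -m flags in effect;
   it says nothing about the CPU the program later runs on. */
bool built_with_avx512f(void)
{
#ifdef __AVX512F__
  return true;
#else
  return false;
#endif
}
```

To see which architecture -march=native resolves to on a given machine, gcc -march=native -Q --help=target lists the effective target options, including the march value.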
Confirmed fixed by #15742.
Describe the bug
I recently switched from an Nvidia GPU to an AMD one. Now the rgb curve module causes darktable to crash after I add a second control point to the curve.
The crash is reproducible on Wayland and Xorg, with and without OpenCL.
opencl package: opencl-amd 5.7 (rocm-5.7.0)
OS: Manjaro
dt build: 4.5.0+786~g1439bb0eb6
Image: (I could not upload an example file, but the issue appeared on RAF and ARW files. I think it does not depend on the file type.)
darktable debug output: darktable_debug_all.txt
inxi -Gazy
gdb darktable ( full log dartable_gdb_output.txt ):
Let me know if I can help with additional information.
Steps to reproduce
Expected behavior
should not crash
Logfile | Screenshot | Screencast
No response
Commit
No response
Where did you install darktable from?
self compiled
darktable version
4.5.0+786~g1439bb0eb6
What OS are you using?
Linux
What is the version of your OS?
Manjaro
Describe your system?
RAM: 32 GB
CPU: AMD Ryzen™ 9 7900X3D, 12 Core, 24 Threads
GPU: AMD Radeon™ RX 7900 XTX
Display-Server: Wayland
GTK: 2.24.38-1
Are you using OpenCL GPU in darktable?
Yes
If yes, what is the GPU card and driver?
AMD Radeon™ RX 7900 XTX / amdgpu
Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip
No response