darktable-org / darktable

darktable is an open source photography workflow application and raw developer
https://www.darktable.org
GNU General Public License v3.0

Tuned opencl leads to garbled exports (on AMD Polaris) #11253

Closed: piratenpanda closed this issue 2 years ago

piratenpanda commented 2 years ago

I just tried the new OpenCL tuner. Working on images looks fine, but when I export images where I used color calibration (edit, stupid me) to get black-and-white results, I get different artifacts on each export (attached images: 151A7984_02, 151A7984_03, 151A7984_05).

This does not happen when turning the tuning off and restarting darktable.

Also, after a few exports darktable refuses to export any images. I have to cancel the export job, and when I rerun the export I get my old bug back:

#0  0x00007fff7c0ad0c2 in PermutohedralLattice<5, 4>::splat(float*, float*, unsigned long, int) const
    (this=0x7fffc77d3970, position=position@entry=0x7fff4afeddc0, value=value@entry=0x7fff4afeddb0, replay_index=replay_index@entry=22195893, thread_index=thread_index@entry=3) at /home/panda/Downloads/dtcompile/darktable/src/iop/Permutohedral.h:485
        i = 0
        elevated = {4.8620141e+21, 4.8620141e+21, 4.8620141e+21, 4.8620141e+21, -1.94480609e+22, -1.17909422e-05}
        greedy = {-2147483642, -2147483642, -2147483642, -2147483642, -2147483642, 6}
        rank = {-357913935, -357913934, -357913933, -357913932, -357913930, -357913931}
        barycentric = {0, 0, 0, 0, 0, 0, 0}
        key = {hash = 0, key = {0, 0, 0, 0, 0}}
        sum = -357913943
#1  0x00007fff7c0ab98a in process._omp_fn.1(void) () at /home/panda/Downloads/dtcompile/darktable/src/iop/bilateral.cc:226
        pos = {182.609665, 128.350418, 2.36697328e-09, 4.43839229e+21, 2.63653465e-06}
        val = {4.73394546e-10, 8.87678275e+20, 1.31826727e-08, 1}
        i = 4917
        in = 0x7fff0f1cdb90
        thread = 3
        index = 22195893
        j = 1258216896
        ivoid = <optimized out>
        roi_in = 0x7fffc77d3f20
        ch = <optimized out>
        sigma = {0.0371384323, 0.0371384323, 5.00000095, 5.00000095, 200}
        lattice = 
          {nData = 27501143, nThreads = 4, scaleFactor = 0x7fffb818caa0, canonical = 0x7fffb812f910, replay = 0x7ffe7836b010, hashTables = 0x7fffb8083008}
        lattice = 
          {nData = -9841740, nThreads = -9776596, scaleFactor = 0xff6c534fff6cd216, canonical = 0xff6bd1a1ff6ad32b, replay = 0xff6d534fff6cd117, hashTables = 0xff6bd2a1ff69d3b4}
        data = <optimized out>
        ch = <optimized out>
        sigma = {-2.97513945e+38, -3.02824853e+38, -3.01498424e+38, -3.04799731e+38, -3.02819621e+38}
        rad = <optimized out>
#2  0x00007ffff7359926 in gomp_thread_start (xdata=<optimized out>) at /usr/src/debug/gcc/libgomp/team.c:125
        team = 0x7fffb807aab0
        task = 0x7fffb807b278
        data = <optimized out>
        pool = 0x7fffb80740e0
        local_fn = 0x7fff7c0ab7f0 <process._omp_fn.1(void)>
        local_data = 0x7fffc77d39a0
#3  0x00007ffff1f655c2 in start_thread () at /usr/lib/libc.so.6
#4  0x00007ffff1fea584 in clone () at /usr/lib/libc.so.6

@jenshannoschwalm are there some values which are not limited to 0 maybe?

jenshannoschwalm commented 2 years ago

It is difficult to say from here what the precise reason is at the moment.

The main difference between standard modes and "tune for cl" is the check for available memory.

  1. The current check might itself be a problem for your driver.
  2. The CL memory reported as free is estimated slightly too high. On NVIDIA this would lead to an error condition and a fallback to the CPU, and AMD drivers should do the same.

Please post a log file generated with -d memory -d opencl and give me a short reminder of your system (I can't find it right now).

You're reporting the "old bug" again, which is really interesting. Is there a buffer memory leak we are not aware of so far?

piratenpanda commented 2 years ago

My system is an i5 7600K with 32 GB RAM and an RX 580 with 8 GB of memory.

This log is a single export with garbled output. darktable_log.txt

One thing I noticed in the log is that this exceeds the memory of my graphics card: 24,128368 [opencl memory] device 0: 9409267200 bytes (8973,4 MB) in use

And this is exporting a color image afterwards (starting in line 1484): darktable_log2.txt

> You're reporting the "old bug" again, which is really interesting. Is there a buffer memory leak we are not aware of so far?

I never had a single issue after your recent RCD bug fixes. It seems this just reappeared with the tuner.

piratenpanda commented 2 years ago

Ok now I am confused. It now says tuning off in the log and has the same error. Will do more checks.

Edit: with tuning off, the black-and-white export works fine. As soon as I click on tuning, the export is garbled, so the issue seems to persist. Is the tuner instant as soon as I click it? I suppose the tuned values persist for as long as darktable runs, even if I turn tuning off?

Here is a log with tuning on from the beginning: darktable_log3.txt

27,030909 [dt_opencl_get_unused_device_mem] 13491MB available, 17592186039116MB of 8192MB on device 0 already used
27,030936 [dt_opencl_get_device_available] use 13363MB (tune=ON) as available on device 0

That also doesn't seem right to me.

Now the darkroom also seems to be affected (attached screenshot: Bildschirmfoto von 2022-03-02 21-47-10).

jenshannoschwalm commented 2 years ago

> Is the tuner instant as soon as I click it?

In principle yes, but the calculation is only done the first time dt checks for available memory in the pipeline.

I read through your logs and noticed:

  1. device 1 `Ellesmere' allows GPU memory allocations of up to 6745MB --> I will check whether this is really correct.
  2. [dt_opencl_get_unused_device_mem] 13491MB available, 17592186039116MB of 8192MB on device 0 already used --> this is definitely wrong. The "available" value being about double the value mentioned in 1. may hint at a stupid code bug.
  3. It might also be that the drivers backfire because of non-initialized memory when testing, as we had in the bugs that were already fixed.
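As an aside on item 2: the figure 17592186039116MB is what unsigned 64-bit arithmetic yields when a larger "available" value is subtracted from a smaller "total" value, since 2^44 - 5300 = 17592186039116 and 13491 - 8192 is roughly 5300. The darktable code is not quoted in this thread, so the sketch below is only an illustration of that wraparound. The helper names are hypothetical, and the exact byte count for "available" (13491 MB plus half a megabyte) is an assumption chosen to match the rounded log value.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MB ((uint64_t)1 << 20)

/* Hypothetical helper, not darktable code: computes "used" memory as
 * total - available in unsigned 64-bit arithmetic. If available is larger
 * than total, the subtraction wraps around modulo 2^64 instead of going
 * negative, so the result is a huge positive number. */
uint64_t used_mb_unsigned(uint64_t total_bytes, uint64_t available_bytes)
{
  return (total_bytes - available_bytes) / MB;
}

/* Prints a line shaped like the log message, for comparison only. */
void print_used(uint64_t total_bytes, uint64_t available_bytes)
{
  printf("%" PRIu64 "MB of %" PRIu64 "MB already used\n",
         used_mb_unsigned(total_bytes, available_bytes), total_bytes / MB);
}
```

Calling `print_used(8192 * MB, 13491 * MB + 512 * 1024)` prints `17592186039116MB of 8192MB already used`, which matches the suspicious log line. This only shows that the logged value is consistent with an unsigned underflow; it does not establish where in the real code the oversized "available" value came from.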

jenshannoschwalm commented 2 years ago

@piratenpanda I think I may have found the culprit. Could you try again after replacing these two functions in common/opencl.c, please?

patch.c.txt

piratenpanda commented 2 years ago

Now it reads:

24,462150 [dt_opencl_get_unused_device_mem] 8153MB available, 38MB of 8192MB on device 0 already used
24,462175 [dt_opencl_get_device_available] use 8025MB (tune=ON) as available on device 0

and exported images look fine. Used graphics memory now also never exceeds the maximum physically available size.
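Both pairs of log lines are consistent with a fixed 128 MB headroom being subtracted from the "available" figure: 13491 - 13363 = 128 before the patch and 8153 - 8025 = 128 after it. The patch itself is not reproduced in this thread, so the following is only a sketch of the kind of guard that keeps such a calculation within the physical device size. All function names are hypothetical, and treating the headroom as a plain parameter is an assumption.

```c
#include <stdint.h>

#define MB ((uint64_t)1 << 20)

/* Hypothetical helper: subtraction that stops at zero instead of
 * wrapping around when b is larger than a. */
uint64_t saturating_sub(uint64_t a, uint64_t b)
{
  return a > b ? a - b : 0;
}

/* Hypothetical helper, not the actual fix: never trust a free-memory
 * estimate that is larger than the device itself, then keep some
 * headroom in reserve. */
uint64_t usable_bytes(uint64_t total, uint64_t reported_free, uint64_t headroom)
{
  const uint64_t free_clamped = reported_free < total ? reported_free : total;
  return saturating_sub(free_clamped, headroom);
}
```

With these guards, an implausible estimate such as 13491 MB free on an 8192 MB device is capped at 8192 MB before the headroom is removed, and a plausible one such as 8153 MB gives 8025 MB, the value shown in the log above.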

jenshannoschwalm commented 2 years ago

@piratenpanda thanks a lot for reporting and for quickly confirming the fix :-)