darktable-org / darktable

darktable is an open source photography workflow application and raw developer
https://www.darktable.org
GNU General Public License v3.0

Glitched images in darkroom due to OpenCL with AMD GPU #16378

Open tomaz-suller opened 8 months ago

tomaz-suller commented 8 months ago

Describe the bug

Glitches show up in the bottom right-hand corner when images are opened in darkroom with OpenCL on an AMD GPU, just as reported in #15589 (which is why I'm skipping some details, including a screencast).

As requested in the original issue, I'm attaching the output of running darktable -d pipe -d opencl . The steps I followed after opening the app to generate the log were:

  1. Select an import job in lighttable
  2. Double-click on an image from the import job to open it in darkroom
  3. View the image in darkroom

Immediately after opening the image, the glitch appears in the bottom right-hand corner, as visible in the following screenshot:

[Screenshot: image in darkroom with the glitch in the bottom right-hand corner]

It is important to note that before I installed OpenCL, I didn't have any similar issue. Also, opening the images in other viewers showed no glitch, which led me to rule out the possibility that the images themselves are corrupted.

Steps to reproduce

  1. Open lighttable
  2. Select any NEF image
  3. Open the NEF image in darkroom

The glitch immediately shows up in the bottom right corner of the image.

Expected behavior

Show the picture with no glitches

Logfile | Screenshot | Screencast

Log after following the steps to reproduce, run with:

darktable -d pipe -d opencl .
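To attach such a log to an issue, it helps to save the debug output to a .txt file while darktable runs. A minimal sketch, assuming a bash-compatible shell; capture_log is a hypothetical helper, and both stdout and stderr are captured because the debug messages may go to either stream.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, show its output live, and save a copy.
# Usage: capture_log <logfile> <command> [args...]
capture_log() {
  local logfile=$1
  shift
  # Merge stderr into stdout so every message also lands in the file.
  # Note: without `set -o pipefail`, the pipeline's exit status is tee's.
  "$@" 2>&1 | tee "$logfile"
}
```

For example, capture_log darktable-log.txt darktable -d pipe -d opencl produces a file that can be attached to the issue without renaming.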

Some information about my installation from pacman:

❯ paru -Qi darktable
Name            : darktable
Version         : 2:4.6.1-1
Description     : Utility to organize and develop raw images
Architecture    : x86_64
URL             : https://darktable.org
Licenses        : GPL3
Groups          : None
Provides        : None
Depends On      : pugixml  libjpeg-turbo  colord-gtk  libgphoto2  openexr  lensfun  iso-codes  zlib
                  exiv2  flickcurl  openjpeg2  graphicsmagick  lua  osm-gps-map  libsecret  openmp  gmic
                  libavif  jasper  libjxl
Optional Deps   : dcraw: base curve script
                  perl-image-exiftool: base curve script
                  imagemagick: base curve and noise profile scripts [installed]
                  ghostscript: noise profile script [installed]
                  portmidi: game and midi controller input devices [installed]
                  gnuplot: noise profile script
Required By     : None
Optional For    : None
Conflicts With  : None
Replaces        : None
Installed Size  : 30,92 MiB
Packager        : Caleb Maclennan <alerque@archlinux.org>
Build Date      : Sat 17 Feb 2024, 16:19:51
Install Date    : Sun 25 Feb 2024, 09:00:49
Install Reason  : Explicitly installed
Install Script  : No
Validated By    : Signature

Commit

No response

Where did you obtain darktable from?

distro packaging

darktable version

4.6.1

What OS are you using?

Linux

What is the version of your OS?

EndeavourOS Linux x86_64, Kernel 6.7.6-arch1-1 (all packages up to date)

Describe your system?

No response

Are you using OpenCL GPU in darktable?

Yes

If yes, what is the GPU card and driver?

Radeon 780M 1 GiB (integrated graphics with AMD Ryzen 7 PRO 7840U), mesa 24.0.1, rocm-opencl-runtime 6.0.0

Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip

tomaz-suller commented 8 months ago

I forgot to add that the log covers not only opening the picture but also zooming in until the glitch disappeared, at 62% zoom.

tomaz-suller commented 8 months ago

To be entirely honest, I'm not sure, and I don't know how to check either; if you give me instructions, I can follow them.

What I do know, as the logs and my GPU usage according to nvtop show, is that the GPU is detected and that darktable is using it.

tomaz-suller commented 8 months ago

That seems to be precisely what is going on here. The drivers load and the device is detected by both darktable and clinfo, which is not the case with Rusticl (apparently it doesn't support APUs at all).

jenshannoschwalm commented 8 months ago

@tomaz-suller would you be able to check with current master? I think the issue should be gone now; if not, please provide a fresh log with -d opencl -d pipe so we can investigate further.

tomaz-suller commented 8 months ago

Just tested; the behaviour is still the same. Just to be sure I didn't mess up the installation, here's what I installed:

darktable d496f37
Copyright (C) 2012-2024 Johannes Hanika and other contributors.

Compile options:
  Bit depth              -> 64 bit
  Debug                  -> DISABLED
  SSE2 optimizations     -> ENABLED
  OpenMP                 -> ENABLED
  OpenCL                 -> ENABLED
  Lua                    -> ENABLED  - API version 9.3.0
  Colord                 -> ENABLED
  gPhoto2                -> ENABLED
  GMIC                   -> ENABLED  - Compressed LUTs are supported
  GraphicsMagick         -> ENABLED
  ImageMagick            -> DISABLED
  libavif                -> ENABLED
  libheif                -> ENABLED
  libjxl                 -> ENABLED
  OpenJPEG               -> ENABLED
  OpenEXR                -> ENABLED
  WebP                   -> ENABLED

And here are the logs.

tomaz-suller commented 8 months ago

The version output above comes from /opt/darktable-test/bin/darktable --version.

To produce the logs I ran darktable with /opt/darktable-test/bin/darktable --configdir "~/.config/darktable-test" -d pipe -d opencl, imported 3 NEF files, opened two of them and zoomed in and out.

jenshannoschwalm commented 8 months ago

The device offers only 1 GB of RAM, so there is a huge amount of tiling. That is difficult to track down from here.

  1. Can you somehow control the size of the dedicated RAM, maybe via BIOS settings?

  2. Could you check with the resources=small setting, again with logs as above?

  3. Are there any modules you can switch off to make the issue go away?

  4. Can you please confirm that the issue goes away when zooming in?
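A rough back-of-the-envelope estimate shows why 1 GB leads to heavy tiling. This is a sketch under stated assumptions, not darktable's actual tiling logic: it assumes a 6000 × 4000 (24 MP) sensor, which is typical for Nikon but not confirmed for these NEF files, and that each full-resolution buffer holds 4 channels of 32-bit floats. The helper buffer_mib is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: size of one full-resolution buffer in MiB, assuming
# 4 channels x 4 bytes (32-bit float) per pixel. Integer arithmetic rounds down.
buffer_mib() {
  local width=$1 height=$2
  echo $(( width * height * 4 * 4 / 1024 / 1024 ))
}

# Assumed 24 MP image: one buffer, then an input plus an output buffer.
one=$(buffer_mib 6000 4000)
echo "one buffer: ${one} MiB"
echo "in + out:   $(( 2 * one )) MiB"
```

Under these assumptions one buffer is about 366 MiB, and an input plus an output buffer for a single module is about 732 MiB, most of the 1 GiB the device reports. If darktable also keeps memory in reserve (a later log in this thread, from a different machine, shows USE HEADROOM: 400Mb), two such buffers no longer fit at once, and some modules need more than two, so the image has to be processed in tiles.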

tomaz-suller commented 8 months ago
  1. I didn't understand what you mean. I've never tried controlling it, but if you're asking whether I can increase it, I certainly can't, since I'm running darktable on my laptop, which is the only computer I have. Frankly, I don't know whether it's possible to reduce it.

  2. Still the same problem. Logs are here. Just to be 100% sure, this is what you mean by resources=small, right? [Screenshot_20240320_114300]

  3. I'm a beginner in darktable, and I have the default install from master, so I'm a bit clueless about what the "modules" would be. How could I go about disabling them?

  4. Yes, the issue goes away when zooming in, at roughly the same level as before (around 38%)

garrett commented 8 months ago

In previous versions of darktable, it seemed to pick the correct device with ROCm.

At some point over the past several versions, ROCm and/or darktable changed, and I started seeing these glitches too.

I worked around this issue by adding this snippet to /etc/environment (and logged out and back in) and now darktable works very well again with my 7900 XTX:

HSA_OVERRIDE_GFX_VERSION=11.0.0

(It would be different for different video cards; this is for the 7000 series. For example, I think HSA_OVERRIDE_GFX_VERSION=10.3.0 would be needed for the 6000 series.)

Previously, setting this was unnecessary and darktable picked the correct GPU. In other words, I'm not suggesting this as a solution, only as a temporary workaround and perhaps as a hint as to what the problem might be.
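If a system-wide setting in /etc/environment is undesirable, the variable can also be set for a single darktable launch only. A minimal sketch: hsa_override_for is a hypothetical helper, and its mapping only encodes what this thread reports (11.0.0 for the 7000 series and, per a later comment, for the 780M's gfx1103; 10.3.0 for the 6000 series, which is unverified here). The gfx target of a card can be read from rocminfo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a ROCm gfx target to an HSA_OVERRIDE_GFX_VERSION
# value, based only on the values mentioned in this thread.
hsa_override_for() {
  case "$1" in
    gfx110*) echo "11.0.0" ;;  # RDNA3, e.g. gfx1100 (RX 7900 XTX), gfx1103 (Radeon 780M)
    gfx103*) echo "10.3.0" ;;  # RDNA2 (RX 6000 series); unverified in this thread
    *) return 1 ;;             # unknown target: do not guess an override
  esac
}

# Example: apply the override to this one launch instead of globally.
# HSA_OVERRIDE_GFX_VERSION="$(hsa_override_for gfx1103)" darktable -d opencl -d pipe
```

Setting the variable on the command line keeps the workaround scoped to darktable, so other ROCm applications see the real gfx target.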

I did disable the iGPU on my AMD Ryzen 9 7950X3D, and rocminfo still shows the CPU as well as my discrete GPU. So my guess is that darktable either isn't picking the correct GPU or is trying to use all of them.

After disabling the iGPU, I see this:

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 7950X3D 16-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 7950X3D 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5759                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    65467276(0x3e6f38c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65467276(0x3e6f38c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    65467276(0x3e6f38c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-16f2a3584821508f               
  Marketing Name:          AMD Radeon RX 7900 XTX             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      6144(0x1800) KB                    
    L3:                      98304(0x18000) KB                  
  Chip ID:                 29772(0x744c)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2371                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            96                                 
  SIMDs per CU:            2                                  
  Shader Engines:          6                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 550                                
  SDMA engine uCode::      19                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

Note the 2 "agents" for ROCm; first is CPU, second is GPU. If I re-enable my iGPU, I'd probably have 3 listed.
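To see at a glance which agents ROCm exposes, and the gfx target to use when choosing an override, the relevant lines can be filtered out of rocminfo's output. A minimal sketch, assuming the two-space-indented Name: and Device Type: layout shown above; list_hsa_agents is a hypothetical helper, and the layout may differ between ROCm versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read rocminfo output on stdin and print one
# "<Device Type> <Name>" line per agent. Only lines with exactly two spaces of
# indentation match, so the nested ISA "Name:" entries are skipped.
list_hsa_agents() {
  awk -F': *' '
    /^  Name:/        { name = $2; sub(/ +$/, "", name) }
    /^  Device Type:/ { type = $2; sub(/ +$/, "", type); print type " " name }
  '
}
```

For the output shown above, rocminfo | list_hsa_agents should print two lines: CPU AMD Ryzen 9 7950X3D 16-Core Processor and GPU gfx1100.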

jenshannoschwalm commented 8 months ago

@tomaz-suller what you describe points to a memory allocation problem. I'm not sure yet whether it's a rusticl bug or dt doing something wrong, like overallocating memory. The fact that it still happens with the small preference makes a rusticl problem more likely.

tomaz-suller commented 8 months ago

Got it. Just to clarify, I'm using ROCm since rusticl doesn't support my iGPU, but of course your point still applies.

jenshannoschwalm commented 8 months ago

@tomaz-suller the question about modules was: which dt modules are you using? They are listed on the right side in darkroom. Could you set demosaic to LMMSE and check for the issue? Could you disable highlight reconstruction and check? That would be the test on your side to find out where the issue might be coming from. The log shows that OpenCL is running fine and didn't report an issue...

jenshannoschwalm commented 8 months ago

OK, the AMD driver is notorious for problems :-) Maybe you can identify the bad module and we can fix it in dt. :-) It might be worth testing whether OpenCL on such a small device helps performance at all...

pomoke commented 7 months ago

I have this problem as well with an integrated Vega 7 and lossy ARWs. If the interpolator is changed, the stripes show a different pattern.

darktable version: 4.6.1. Log from darktable -d pipe -d opencl:

darktable 4.6.1
Copyright (C) 2012-2024 Johannes Hanika and other contributors.

Compile options:
  Bit depth              -> 64 bit
  Debug                  -> DISABLED
  SSE2 optimizations     -> ENABLED
  OpenMP                 -> ENABLED
  OpenCL                 -> ENABLED
  Lua                    -> ENABLED  - API version 9.2.0
  Colord                 -> ENABLED
  gPhoto2                -> ENABLED
  GMIC                   -> ENABLED  - Compressed LUTs are supported
  GraphicsMagick         -> ENABLED
  ImageMagick            -> DISABLED
  libavif                -> ENABLED
  libheif                -> ENABLED
  libjxl                 -> ENABLED
  OpenJPEG               -> ENABLED
  OpenEXR                -> ENABLED
  WebP                   -> ENABLED

See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.

     0.2291 [dt_get_sysresource_level] switched to 2 as `large'
     0.2291   total mem:       29803MB
     0.2291   mipmap cache:    3725MB
     0.2292   available mem:   20373MB
     0.2292   singlebuff:      465MB
     0.2573 [opencl_init] opencl library 'libOpenCL' found on your system and loaded, preference 'default path'
     0.4118 [opencl_init] found 3 platforms
     0.4119 [check platform] platform 'rusticl' with key 'clplatform_rusticl' is NOT active
     0.4257 [check platform] platform 'Portable Computing Language' with key 'clplatform_portablecomputinglanguage' is NOT active
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'gfx900:xnack-'
   PLATFORM, VENDOR & ID:    AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
   CANONICAL NAME:           amdacceleratedparallelprocessinggfx900xnack
   DRIVER VERSION:           3590.0 (HSA1.1,LC)
   DEVICE VERSION:           OpenCL 2.0 
   DEVICE_TYPE:              GPU, dedicated mem
   GLOBAL MEM SIZE:          2048 MB
   MAX MEM ALLOC:            1741 MB
   MAX IMAGE SIZE:           16384 x 16384
   MAX WORK GROUP SIZE:      256
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   USE HEADROOM:             400Mb
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH & HEIGHT    16x16
   CHECK EVENT HANDLES:      128
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /usr/share/darktable/kernels
   KERNEL DIRECTORY:         /home/<redacted>/.cache/darktable/cached_v3_kernels_for_AMDAcceleratedParallelProcessinggfx900xnack_35900HSA11LC
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   CL COMPILER COMMAND:      -w -cl-fast-relaxed-math  -DAMD=1 -I"/usr/share/darktable/kernels"
   KERNEL LOADING TIME:       0.0634 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init]       0   'AMD Accelerated Parallel Processing gfx900:xnack-'
     0.9185 [opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]       image   preview export  thumbs  preview2
[dt_opencl_update_priorities]       0   -1  0   0   -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]       image   preview export  thumbs  preview2
[dt_opencl_update_priorities]       0   0   0   0   0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]       image   preview export  thumbs  preview2
[dt_opencl_update_priorities]       0   -1  0   0   -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]       image   preview export  thumbs  preview2
[dt_opencl_update_priorities]       0   0   0   0   0
[opencl_synchronization_timeout] synchronization timeout set to 200

This is how it looks: [screenshot]

With LMMSE demosaic, the fit view works.

[screenshot]

da-phil commented 6 months ago

Quoting @tomaz-suller: "Got it. Just to clarify, I'm using ROCm since rusticl doesn't support my iGPU, but the message is the same of course."

@tomaz-suller Did you try out what @garrett proposed in order to make ROCm recognize your iGPU as a supported GPU, setting the HSA_OVERRIDE_GFX_VERSION environment variable?

Another guess: does the "classic" OpenCL driver work for iGPUs? E.g. by installing it with sudo amdgpu-install --usecase=graphics,opencl --opencl=rocr

I'm interested in this issue as I'm currently planning to buy a laptop (TUXEDO Pulse 14 Gen 4) featuring a Radeon 780M iGPU.

In this forum they claimed that ROCm should work for this iGPU:

Note that we use HSA_OVERRIDE_GFX_VERSION=11.0.0 because the 780m iGPU is gfx1103 (version 11.0.3) which ROCm does not support, but in my experience using the override to tell ROCm to pretend it is gfx1100 seems to work without issue.

github-actions[bot] commented 4 months ago

This issue has been marked as stale due to inactivity for the last 60 days. It will be automatically closed in 300 days if no update occurs. Please check if the master branch has fixed it and report again or close the issue.

da-phil commented 3 months ago

@tomaz-suller did you figure out the issue in the meantime? Maybe a new driver release?

pomoke commented 3 months ago

The rusticl driver seems to work properly. Try your luck if you are on Linux.

denis-martin commented 2 months ago

I found one way to get rid of the stripes when using AMD ROCm (at least it worked for me): https://github.com/darktable-org/darktable/issues/17210#issuecomment-2343126616 (summary: enforce pinned memory).

Anybody with the same issue, can you test if this works for you as well?

Germano0 commented 1 month ago

Cannot reproduce on