ROCm / clr

MIT License
104 stars 50 forks source link

Fix gfx8 opencl #97

Open Mystro256 opened 1 month ago

Mystro256 commented 1 month ago

This condition was added when we supported PAL openCL on gfx8, but when ROC_ENABLE_PRE_VEGA was dropped and PAL OpenCL on Linux was deprecated, this logic should have been dropped completely.

Germano0 commented 1 month ago

Good day, I am a gfx803 owner, I am already testing this patch on rocm 6.2.1 on Fedora 40 and darktable 4.8.1.

# rocm-clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (3625.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback 

  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    AMD Radeon RX 480 Graphics
  Device Topology:                               PCI[ B#9, D#0, F#0 ]
  Max compute units:                             36
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1288Mhz
  Address bits:                                  64
  Max memory allocation:                         7301444400
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            16384
  Max image 3D height:                           16384
  Max image 3D depth:                            8192
  Max samplers within kernel:                    16
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            8589934592
  Constant buffer size:                          7301444400
  Max number of constant args:                   8
  Local memory type:                             Local
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          3006477104
  Max global variable size:                      7301444400
  Max global variable preferred total size:      8589934592
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x7fbbd311c7e8
  Name:                                          gfx803
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                3625.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 
$ darktable-cltest
darktable 4.8.1
Copyright (C) 2012-2024 Johannes Hanika and other contributors.

Compile options:
  Bit depth              -> 64 bit
  Debug                  -> DISABLED
  SSE2 optimizations     -> ENABLED
  OpenMP                 -> ENABLED
  OpenCL                 -> ENABLED
  Lua                    -> ENABLED  - API version 9.3.0
  Colord                 -> ENABLED
  gPhoto2                -> ENABLED
  GMIC                   -> ENABLED  - Compressed LUTs are supported
  GraphicsMagick         -> ENABLED
  ImageMagick            -> DISABLED
  libavif                -> ENABLED
  libheif                -> ENABLED
  libjxl                 -> ENABLED
  OpenJPEG               -> ENABLED
  OpenEXR                -> ENABLED
  WebP                   -> ENABLED

See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.

     0.0235 [dt_get_sysresource_level] switched to 2 as `large'
     0.0235   total mem:       128726MB
     0.0235   mipmap cache:    16090MB
     0.0235   available mem:   87996MB
     0.0235   singlebuff:      2011MB
     0.0367 [opencl_init] opencl library 'libOpenCL' found on your system and loaded, preference 'default path'
     0.1006 [opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'gfx803'
   CONF KEY:                 cldevice_v5_amdacceleratedparallelprocessinggfx803
   PLATFORM, VENDOR & ID:    AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
   CANONICAL NAME:           amdacceleratedparallelprocessinggfx803
   DRIVER VERSION:           3625.0 (HSA1.1,LC)
   DEVICE VERSION:           OpenCL 1.2 
   DEVICE_TYPE:              GPU, dedicated mem
   GLOBAL MEM SIZE:          8192 MB
   MAX MEM ALLOC:            6963 MB
   MAX IMAGE SIZE:           16384 x 16384
   MAX WORK GROUP SIZE:      256
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH & HEIGHT    16x16
   CHECK EVENT HANDLES:      128
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /usr/share/darktable/kernels
   KERNEL DIRECTORY:         /home/user/.cache/darktable/cached_v3_kernels_for_AMDAcceleratedParallelProcessinggfx803_36250HSA11LC
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   CL COMPILER COMMAND:      -w -cl-fast-relaxed-math  -DAMD=1 -I"/usr/share/darktable/kernels"
   KERNEL LOADING TIME:       0.0180 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init]           0       'AMD Accelerated Parallel Processing gfx803'
     0.4072 [opencl_init] FINALLY: opencl PREFERENCE=ON is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       -1      0       0       -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       -1      0       0       -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[opencl_synchronization_timeout] synchronization timeout set to 200

I would like to help you with the tests, is there anything I can do? Best regards

Mystro256 commented 1 month ago

@Germano0 gfx8 isn't really supported by the ROCm team anymore, but there's no harm in enabling it in my opinion. This patch just fixes the logic to enabling it as I don't see why not. This way if someone with a gfx8 wants to fix issues with it, we can at least remove that barrier.