Open Mystro256 opened 1 month ago
Good day, I am a gfx803
owner, I am already testing this patch on rocm 6.2.1 on Fedora 40 and darktable 4.8.1.
# rocm-clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3625.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon RX 480 Graphics
Device Topology: PCI[ B#9, D#0, F#0 ]
Max compute units: 36
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1288Mhz
Address bits: 64
Max memory allocation: 7301444400
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 16384
Max image 3D height: 16384
Max image 3D depth: 8192
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8589934592
Constant buffer size: 7301444400
Max number of constant args: 8
Local memory type: Local
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 3006477104
Max global variable size: 7301444400
Max global variable preferred total size: 8589934592
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7fbbd311c7e8
Name: gfx803
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3625.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 1.2
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
$ darktable-cltest
darktable 4.8.1
Copyright (C) 2012-2024 Johannes Hanika and other contributors.
Compile options:
Bit depth -> 64 bit
Debug -> DISABLED
SSE2 optimizations -> ENABLED
OpenMP -> ENABLED
OpenCL -> ENABLED
Lua -> ENABLED - API version 9.3.0
Colord -> ENABLED
gPhoto2 -> ENABLED
GMIC -> ENABLED - Compressed LUTs are supported
GraphicsMagick -> ENABLED
ImageMagick -> DISABLED
libavif -> ENABLED
libheif -> ENABLED
libjxl -> ENABLED
OpenJPEG -> ENABLED
OpenEXR -> ENABLED
WebP -> ENABLED
See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.
0.0235 [dt_get_sysresource_level] switched to 2 as `large'
0.0235 total mem: 128726MB
0.0235 mipmap cache: 16090MB
0.0235 available mem: 87996MB
0.0235 singlebuff: 2011MB
0.0367 [opencl_init] opencl library 'libOpenCL' found on your system and loaded, preference 'default path'
0.1006 [opencl_init] found 1 platform
[opencl_init] found 1 device
[dt_opencl_device_init]
DEVICE: 0: 'gfx803'
CONF KEY: cldevice_v5_amdacceleratedparallelprocessinggfx803
PLATFORM, VENDOR & ID: AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
CANONICAL NAME: amdacceleratedparallelprocessinggfx803
DRIVER VERSION: 3625.0 (HSA1.1,LC)
DEVICE VERSION: OpenCL 1.2
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 8192 MB
MAX MEM ALLOC: 6963 MB
MAX IMAGE SIZE: 16384 x 16384
MAX WORK GROUP SIZE: 256
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 1024 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/user/.cache/darktable/cached_v3_kernels_for_AMDAcceleratedParallelProcessinggfx803_36250HSA11LC
CL COMPILER OPTION: -cl-fast-relaxed-math
CL COMPILER COMMAND: -w -cl-fast-relaxed-math -DAMD=1 -I"/usr/share/darktable/kernels"
KERNEL LOADING TIME: 0.0180 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init] 0 'AMD Accelerated Parallel Processing gfx803'
0.4072 [opencl_init] FINALLY: opencl PREFERENCE=ON is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 -1 0 0 -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 -1 0 0 -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 200
I would like to help you with the tests, is there anything I can do? Best regards
@Germano0 gfx8 isn't really supported by the ROCm team anymore, but there's no harm in enabling it in my opinion. This patch just fixes the logic to enabling it as I don't see why not. This way if someone with a gfx8 wants to fix issues with it, we can at least remove that barrier.
This condition was added when we supported PAL openCL on gfx8, but when ROC_ENABLE_PRE_VEGA was dropped and PAL OpenCL on Linux was deprecated, this logic should have been dropped completely.