darktable-org / darktable

darktable is an open source photography workflow application and raw developer
https://www.darktable.org
GNU General Public License v3.0

OpenCL on ARM (not Apple Silicon) #15067

Closed: sunarowicz closed 7 months ago

sunarowicz commented 11 months ago

darktable already runs on ARM, and not only on Apple Silicon. That's really great; many thanks to everyone who contributed to this. But when I run it on the Radxa Rock Pi 5B SBC with the RK3588 chipset, it runs only without OpenCL support, even though OpenCL is available on the system (see the clinfo output below). Since darktable runs with OpenCL support on Apple Silicon, I hope it is not mission impossible to make it work on other ARM devices too, especially as it fails "only" because of invalid CL build options (see the darktable-cltest output below).

Please make OpenCL available on non-Apple-Silicon ARM devices too. I'll be happy to help with any testing that is needed.

Thank you for considering this.

clinfo output:

tux@rock5b ~> clinfo
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
Number of platforms                               1
  Platform Name                                   ARM Platform
  Platform Vendor                                 ARM
  Platform Version                                OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_khr_subgroup_rotate cl_khr_il_program cl_khr_priority_hints cl_khr_create_command_queue cl_khr_spirv_no_integer_wrap_decoration cl_khr_extended_versioning cl_khr_device_uuid cl_khr_suggested_local_work_size cl_khr_extended_bit_ops cl_khr_integer_dot_product cl_khr_semaphore cl_khr_external_semaphore cl_khr_external_semaphore_sync_fd cl_khr_command_buffer cl_arm_core_id cl_arm_printf cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_import_memory_dma_buf cl_arm_import_memory_host cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_arm_scheduling_controls cl_arm_controlled_kernel_termination cl_ext_cxx_for_opencl cl_ext_image_tiling_control cl_ext_image_requirements_info cl_ext_image_from_buffer
  Platform Extensions with Version                cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_egl_image                                                 0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_rotate                                           0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_khr_extended_versioning                                       0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_khr_extended_bit_ops                                          0x400000 (1.0.0)
                                                  cl_khr_integer_dot_product                                       0x800000 (2.0.0)
                                                  cl_khr_semaphore                                                   0x9000 (0.9.0)
                                                  cl_khr_external_semaphore                                          0x9000 (0.9.0)
                                                  cl_khr_external_semaphore_sync_fd                                  0x9000 (0.9.0)
                                                  cl_khr_command_buffer                                              0x9000 (0.9.0)
                                                  cl_arm_core_id                                                   0x400000 (1.0.0)
                                                  cl_arm_printf                                                    0x400000 (1.0.0)
                                                  cl_arm_non_uniform_work_group_size                               0x400000 (1.0.0)
                                                  cl_arm_import_memory                                             0x400000 (1.0.0)
                                                  cl_arm_import_memory_dma_buf                                     0x400000 (1.0.0)
                                                  cl_arm_import_memory_host                                        0x400000 (1.0.0)
                                                  cl_arm_integer_dot_product_int8                                  0x400000 (1.0.0)
                                                  cl_arm_integer_dot_product_accumulate_int8                       0x400000 (1.0.0)
                                                  cl_arm_integer_dot_product_accumulate_saturate_int8              0x400000 (1.0.0)
                                                  cl_arm_scheduling_controls                                         0x4000 (0.4.0)
                                                  cl_arm_controlled_kernel_termination                             0x400000 (1.0.0)
                                                  cl_ext_cxx_for_opencl                                            0x400000 (1.0.0)
                                                  cl_ext_image_tiling_control                                        0x1000 (0.1.0)
                                                  cl_ext_image_requirements_info                                     0x5000 (0.5.0)
                                                  cl_ext_image_from_buffer                                         0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             ARM
  Platform Host timer resolution                  1ns

  Platform Name                                   ARM Platform
Number of devices                                 1
  Device Name                                     Mali-G610 r0p0
  Device Vendor                                   ARM
  Device Vendor ID                                0xa8670000
  Device Version                                  OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
  Device UUID                                     000067a8-0100-0000-0000-000000000000
  Driver UUID                                     13833dc2-ecef-4e5b-0159-38fdaf75bfde
  Valid Device LUID                               No
  Device LUID                                     0000-000000000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  3.0
  Device OpenCL C Version                         OpenCL C 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0x800000 (2.0.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_images                                                0x400000 (1.0.0)
                                                  __opencl_c_int64                                                 0x400000 (1.0.0)
                                                  __opencl_c_3d_image_writes                                       0x402000 (1.2.0)
                                                  __opencl_c_atomic_order_acq_rel                                  0x800000 (2.0.0)
                                                  __opencl_c_atomic_order_seq_cst                                  0x800000 (2.0.0)
                                                  __opencl_c_atomic_scope_device                                   0x800000 (2.0.0)
                                                  __opencl_c_atomic_scope_all_devices                              0x800000 (2.0.0)
                                                  __opencl_c_device_enqueue                                        0x800000 (2.0.0)
                                                  __opencl_c_generic_address_space                                 0x800000 (2.0.0)
                                                  __opencl_c_pipes                                                 0x800000 (2.0.0)
                                                  __opencl_c_program_scope_global_variables                        0x800000 (2.0.0)
                                                  __opencl_c_read_write_images                                     0x800000 (2.0.0)
                                                  __opencl_c_subgroups                                             0x800000 (2.0.0)
                                                  __opencl_c_work_group_collective_functions                       0x800000 (2.0.0)
  Device C++ for OpenCL Numeric Version           0x400000 (1.0.0)
  Latest comfornace test passed                   v2021-03-05-00
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               4
  Available core IDs                              0, 2, 16, 18
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple (device)     16
  Preferred work group size multiple (kernel)     16
  Max sub-groups per work group                   64
  Preferred / native vector sizes                 
    char                                                16 / 4       
    short                                                8 / 2       
    int                                                  4 / 1       
    long                                                 2 / 1       
    half                                                 8 / 2        (cl_khr_fp16)
    float                                                4 / 1       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              16735506432 (15.59GiB)
  Error Correction support                        No
  Max memory allocation                           16735506432 (15.59GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
  Atomic fence capabilities                       relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        1048576 (1024KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   32 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             65536x65536 pixels
    Max 3D image size                             65536x65536x65536 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Pipe support                                    Yes
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     128
  Max constant buffer size                        16735506432 (15.59GiB)
  Generic address space support                   Yes
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     supported, replaceable default queue
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                2097152 (2MiB)
    Max size                                      16777216 (16MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       Yes
    Work-group collective functions               Yes
    Sub-group independent forward progress        Yes
    IL version                                    SPIR-V_1.0
    ILs with version                              SPIR-V                                                           0x400000 (1.0.0)
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   (n/a)
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_khr_subgroup_rotate cl_khr_il_program cl_khr_priority_hints cl_khr_create_command_queue cl_khr_spirv_no_integer_wrap_decoration cl_khr_extended_versioning cl_khr_device_uuid cl_khr_suggested_local_work_size cl_khr_extended_bit_ops cl_khr_integer_dot_product cl_khr_semaphore cl_khr_external_semaphore cl_khr_external_semaphore_sync_fd cl_khr_command_buffer cl_arm_core_id cl_arm_printf cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_import_memory_dma_buf cl_arm_import_memory_host cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_arm_scheduling_controls cl_arm_controlled_kernel_termination cl_ext_cxx_for_opencl cl_ext_image_tiling_control cl_ext_image_requirements_info cl_ext_image_from_buffer
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_fp16                                                      0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_egl_image                                                 0x400000 (1.0.0)
                                                  cl_khr_image2d_from_buffer                                       0x400000 (1.0.0)
                                                  cl_khr_depth_images                                              0x400000 (1.0.0)
                                                  cl_khr_subgroups                                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_extended_types                                   0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_vote                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_ballot                                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_non_uniform_arithmetic                           0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle                                          0x400000 (1.0.0)
                                                  cl_khr_subgroup_shuffle_relative                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_clustered_reduce                                 0x400000 (1.0.0)
                                                  cl_khr_subgroup_rotate                                           0x400000 (1.0.0)
                                                  cl_khr_il_program                                                0x400000 (1.0.0)
                                                  cl_khr_priority_hints                                            0x400000 (1.0.0)
                                                  cl_khr_create_command_queue                                      0x400000 (1.0.0)
                                                  cl_khr_spirv_no_integer_wrap_decoration                          0x400000 (1.0.0)
                                                  cl_khr_extended_versioning                                       0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_suggested_local_work_size                                 0x400000 (1.0.0)
                                                  cl_khr_extended_bit_ops                                          0x400000 (1.0.0)
                                                  cl_khr_integer_dot_product                                       0x800000 (2.0.0)
                                                  cl_khr_semaphore                                                   0x9000 (0.9.0)
                                                  cl_khr_external_semaphore                                          0x9000 (0.9.0)
                                                  cl_khr_external_semaphore_sync_fd                                  0x9000 (0.9.0)
                                                  cl_khr_command_buffer                                              0x9000 (0.9.0)
                                                  cl_arm_core_id                                                   0x400000 (1.0.0)
                                                  cl_arm_printf                                                    0x400000 (1.0.0)
                                                  cl_arm_non_uniform_work_group_size                               0x400000 (1.0.0)
                                                  cl_arm_import_memory                                             0x400000 (1.0.0)
                                                  cl_arm_import_memory_dma_buf                                     0x400000 (1.0.0)
                                                  cl_arm_import_memory_host                                        0x400000 (1.0.0)
                                                  cl_arm_integer_dot_product_int8                                  0x400000 (1.0.0)
                                                  cl_arm_integer_dot_product_accumulate_int8                       0x400000 (1.0.0)
                                                  cl_arm_integer_dot_product_accumulate_saturate_int8              0x400000 (1.0.0)
                                                  cl_arm_scheduling_controls                                         0x4000 (0.4.0)
                                                  cl_arm_controlled_kernel_termination                             0x400000 (1.0.0)
                                                  cl_ext_cxx_for_opencl                                            0x400000 (1.0.0)
                                                  cl_ext_image_tiling_control                                        0x1000 (0.1.0)
                                                  cl_ext_image_requirements_info                                     0x5000 (0.5.0)
                                                  cl_ext_image_from_buffer                                         0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  ARM Platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [ARM]
  clCreateContext(NULL, ...) [default]            Success [ARM]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G610 r0p0
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G610 r0p0
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 ARM Platform
    Device Name                                   Mali-G610 r0p0

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.14
  ICD loader Profile                              OpenCL 3.0

darktable-cltest output:

tux@rock5b ~> darktable-cltest
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
     0.0001 [dt_init] SSE2 is unavailable, some functions will be noticeably slower.
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
     0,0335 [dt_get_sysresource_level] switched to 3 as `unrestricted'
     0,0335   total mem:       15967MB
     0,0335   mipmap cache:    1995MB
     0,0335   available mem:   255476MB
     0,0335   singlebuff:      15967MB
     0,0335   OpenCL tune mem: OFF
     0,0335   OpenCL pinned:   OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
     0.0351 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL'
     0.0351 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL.so'
[opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
[opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'Mali-G610 r0p0'
   PLATFORM NAME & VENDOR:   ARM Platform, ARM
   CANONICAL NAME:           armplatformmalig610r0p0
   DRIVER VERSION:           3.0
   DEVICE VERSION:           OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          15960 MB
   MAX MEM ALLOC:            15960 MB
   MAX IMAGE SIZE:           65536 x 65536
   MAX WORK GROUP SIZE:      1024
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   MEMORY TUNING:            NO
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      128
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /usr/share/darktable/kernels
   KERNEL DIRECTORY:         /home/tux/.cache/darktable/cached_v1_kernels_for_ARMPlatformMaliG610r0p0_30
   CL COMPILER OPTION:       
     0.0558 [opencl_build_program] could not build program: CL_INVALID_BUILD_OPTIONS
     0.0558 [dt_opencl_device_init] failed to compile program `demosaic_ppg.cl'!
[opencl_init] no suitable devices found.
[opencl_init] FINALLY: opencl is NOT AVAILABLE and NOT ENABLED.
jenshannoschwalm commented 11 months ago

Are you sure you have the CL compiler installed?

sunarowicz commented 11 months ago

No idea, to be honest. I followed these instructions to get OpenCL working on the RK3588 chipset: Setting up OpenCL on RK3588 using libmali. clpeak works.

Any idea how to check whether the OpenCL compiler is installed?

jenshannoschwalm commented 11 months ago

You will be pretty much on your own investigating this. You could use -d opencl -d verbose to get some more debugging output, and you will have to inspect the logs. I would suggest getting more info about the Mali CL performance before investing too much time...

sunarowicz commented 11 months ago

Thank you for the hint!

Yeah, I'm aware that this device is definitely not an OpenCL power champion. But I switched to it from my ancient 6-core Phenom II system as a home desktop computer, and I'm really thrilled how nicely it runs for this purpose. Better than the old one, and for a fraction of the power consumption of the old system or even of today's desktop computers. Because the ARM device cannot make use of darktable's great SSE2 optimizations, OpenCL seems to me to be the only hope for this device to be usable for at least occasional RAW processing. That's the use case I have in mind.

Looking at the darktable messages, I have a feeling that everything needed for OpenCL support is in place, see [opencl_init] opencl: ON. But for some reason I cannot figure out, it doesn't compile the CL kernel. build-essential and clang are installed. I also don't understand why it failed to open the /usr/share/darktable/kernels directory, because it does exist, contains .cl files and is user-readable.

Looking at darktable's promising OpenCL messages, and knowing that OpenCL support works on Apple's ARM silicon, I still hope success is not too far off here either. :-)

I'd be happy for any other ideas on what to check or test. Thx!

tux@rock5b ~> darktable -d opencl -d verbose
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
     0.0001 [dt_init] SSE2 is unavailable, some functions will be noticeably slower.
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
     0,0624 [dt_get_sysresource_level] switched to 3 as `unrestricted'
     0,0624   total mem:       15967MB
     0,0624   mipmap cache:    1995MB
     0,0624   available mem:   255474MB
     0,0624   singlebuff:      15967MB
     0,0624   OpenCL tune mem: OFF
     0,0624   OpenCL pinned:   OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
     0.0636 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL'
     0.0637 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL.so'
     0.0639 [dt_dlopencl_init] found default opencl runtime library 'libOpenCL.so.1'
[opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
[opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
     0.0704 [dt_opencl_write_device_config] writing data '0 250 0 16 16 128 0 0 0.000000 0.000' for 'cldevice_v5_armplatformmalig610r0p0'
     0.0704 [dt_opencl_write_device_config] writing data '400' for 'cldevice_v5_armplatformmalig610r0p0_id0'
   DEVICE:                   0: 'Mali-G610 r0p0'
   PLATFORM NAME & VENDOR:   ARM Platform, ARM
   CANONICAL NAME:           armplatformmalig610r0p0
   DRIVER VERSION:           3.0
   DEVICE VERSION:           OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          15960 MB
   MAX MEM ALLOC:            15960 MB
   MAX IMAGE SIZE:           65536 x 65536
   MAX WORK GROUP SIZE:      1024
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   MEMORY TUNING:            NO
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      128
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /usr/share/darktable/kernels
   KERNEL DIRECTORY:         /home/tux/.cache/darktable/cached_v1_kernels_for_ARMPlatformMaliG610r0p0_30
   CL COMPILER OPTION:       
     0.0711 [dt_opencl_device_init] testing program `demosaic_ppg.cl' ..
     0.0712 [opencl_fopen_stat] could not open file `/home/tux/.cache/darktable/cached_v1_kernels_for_ARMPlatformMaliG610r0p0_30/demosaic_ppg.cl.bin'!
     0.0712 [opencl_load_program] could not load cached binary program, trying to compile source
     0.0712 [opencl_load_program] successfully loaded program from '/usr/share/darktable/kernels/demosaic_ppg.cl' MD5: 'c88f8c06c59c25137328a7d722d87fc3'
     0.0849 [opencl_build_program] could not build program: CL_INVALID_BUILD_OPTIONS
     0.0849 [opencl_build_program] BUILD STATUS: -2
     0.0849 BUILD LOG:
     0.0849 error: Failed to open directory '"/usr/share/darktable/kernels"'
error: Failed to handle include build options
error: encountered invalid build options

     0.0849 [dt_opencl_device_init] failed to compile program `demosaic_ppg.cl'!
     0.0849 [dt_opencl_write_device_config] writing data '0 250 0 16 16 128 0 0 0.000000 0.000' for 'cldevice_v5_armplatformmalig610r0p0'
     0.0849 [dt_opencl_write_device_config] writing data '400' for 'cldevice_v5_armplatformmalig610r0p0_id0'
[opencl_init] no suitable devices found.
[opencl_init] FINALLY: opencl is NOT AVAILABLE and NOT ENABLED.

(darktable:5662): IBUS-WARNING **: 23:30:10.163: Unable to connect to ibus: Could not connect: Connection refused
tux@rock5b ~>
jenshannoschwalm commented 11 months ago

Look at the CL build logs to see why the compilation fails.

sarunasb commented 11 months ago

Armbian on OrangePi with RK3588S and the same Mali G610, darktable 4.2 from the Armbian repositories. OpenCL seems to work with the GPU (I can compile and run some sample code). darktable is unable to build its kernels, with the same “invalid build options” error. Where would one look for build logs? Nothing relevant in /tmp.

jenshannoschwalm commented 11 months ago

Sorry about my misleading earlier comments... I checked the code again, and the logs for CL kernel compilation are reported directly on the console if you start with -d opencl -d verbose, so

     0.0849 [opencl_build_program] could not build program: CL_INVALID_BUILD_OPTIONS
     0.0849 [opencl_build_program] BUILD STATUS: -2
     0.0849 BUILD LOG:
     0.0849 error: Failed to open directory '"/usr/share/darktable/kernels"'
error: Failed to handle include build options
error: encountered invalid build options

actually is your log. Your dt seems to have no /usr/share/darktable/kernels, or it is not accessible.

sunarowicz commented 11 months ago

The point is that there is no problem with the kernels dir. The kernel files inside it are user accessible for sure, see:

tux@rock5b ~> head /usr/share/darktable/kernels/demosaic_ppg.cl
/*
    This file is part of darktable,
    copyright (c) 2009--2010 johannes hanika.

    darktable is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    darktable is distributed in the hope that it will be useful,
tux@rock5b ~>

Couldn't it be an issue with how the kernel dir is escaped for the Mali OpenCL runtime, maybe? Like here: opencl: fix include path on Mac broken with?

sunarowicz commented 10 months ago

Looking at that again, it really seems to me like a "stupid" escaping issue. Look at the dir path in the error message:

0.0849 error: Failed to open directory '"/usr/share/darktable/kernels"'

Why is the dir path enclosed in both apostrophe and double quote characters? It doesn't make sense to me. Moreover, the double quote character is not used anywhere else in the darktable console output attached here. Could it be the culprit?
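
To make the suspicion concrete, here is a minimal sketch in plain C of the two ways a kernel directory could be prepared before it is handed to the OpenCL compiler as an include path: wrapped in double quotes, or with only the spaces backslash-escaped. The helper names quote_dir and escape_spaces are made up for illustration and are not darktable functions. If the Mali runtime takes the quotes literally, as the error message suggests, the first form would point at a directory that does not exist.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: wrap the directory in double quotes.
   Returns a newly allocated string the caller must free, or NULL. */
static char *quote_dir(const char *dir)
{
  assert(dir != NULL);
  const size_t len = strlen(dir);
  char *out = malloc(len + 3); /* two quote characters plus the terminating NUL */
  if(!out) return NULL;
  snprintf(out, len + 3, "\"%s\"", dir);
  return out;
}

/* Hypothetical helper: put a backslash in front of every space and
   leave all other characters untouched. Caller frees the result. */
static char *escape_spaces(const char *dir)
{
  assert(dir != NULL);
  const size_t len = strlen(dir);
  char *out = malloc(2 * len + 1); /* worst case: every character is a space */
  if(!out) return NULL;
  char *w = out;
  for(const char *r = dir; *r; r++)
  {
    if(*r == ' ') *w++ = '\\';
    *w++ = *r;
  }
  *w = '\0';
  return out;
}
```

For /usr/share/darktable/kernels, quote_dir() returns the path with the double quotes around it, while escape_spaces() returns it unchanged because it contains no spaces.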

parafin commented 10 months ago

IMHO it's the same issue as I've fixed on Mac back in the day: https://github.com/darktable-org/darktable/commit/c7fd5d7d0ce3f025c1db1271996166a9e2498be5

sarunasb commented 10 months ago

Compiling darktable on aarch64 with just escapedkerneldir = dt_util_str_replace(kerneldir, " ", "\\ "); fixes the OpenCL non-compile issue.

$ /opt/darktable/bin/darktable-cltest
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
     0.0002 [dt_init] SSE2 is unavailable, some functions will be noticeably slower.
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
     0.0956 [dt_get_sysresource_level] switched to 1 as `default'
     0.0957   total mem:       15970MB
     0.0957   mipmap cache:    1996MB
     0.0957   available mem:   7985MB
     0.0957   singlebuff:      124MB
     0.0957   OpenCL tune mem: WANTED
     0.0957   OpenCL pinned:   OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'very fast GPU'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
[opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'Mali-G610 r0p0'
   PLATFORM NAME & VENDOR:   ARM Platform, ARM
   CANONICAL NAME:           armplatformmalig610r0p0
   DRIVER VERSION:           3.0
   DEVICE VERSION:           OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          15963 MB
   MAX MEM ALLOC:            15963 MB
   MAX IMAGE SIZE:           65536 x 65536
   MAX WORK GROUP SIZE:      1024
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   MEMORY TUNING:            WANTED
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      128
   PERFORMANCE:              3.799
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /opt/darktable/share/darktable/kernels
   KERNEL DIRECTORY:         /home/mdadmin/.cache/darktable/cached_v2_kernels_for_ARMPlatformMaliG610r0p0_30
   CL COMPILER OPTION:       
   KERNEL LOADING TIME:       0.0142 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init]           0       'ARM Platform Mali-G610 r0p0'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
sunarowicz commented 10 months ago

Bingo! Thank you for this great info!

Could someone responsible please pass this info to the aarch64 build maintainers to take this into account?

Thank you in advance.

sunarowicz commented 10 months ago

Compiling darktable on aarch64 with just escapedkerneldir = dt_util_str_replace(kerneldir, " ", "\\ "); fixes the OpenCL non-compile issue.

Could you please share how you compile darktable on aarch64? I followed the official instructions on the darktable website, but my compilation of version 4.4.2 fails with the following error:

GCC does not currently support mixed size types for ‘simd’ functions

I have GCC 12 installed, and it is used for the compilation.

BTW: I suppose you changed the src/common/opencl.c file like this before starting the compilation:

  char *escapedkerneldir = NULL;
// #ifndef __APPLE__
//  escapedkerneldir = g_strdup_printf("\"%s\"", kerneldir);
// #else
  escapedkerneldir = dt_util_str_replace(kerneldir, " ", "\\ ");
// #endif

Correct?

sarunasb commented 10 months ago

I'm running Armbian (boot from eMMC), upgraded to Lunar. I tried GCC 13, but that resulted in the same "simd" error. After switching to LLVM clang, darktable builds. I had to install the following, which may not be in the instructions:

apt install clang llvm llvm-dev openmpi libopenmpi-dev libomp-dev

Then switch to clang temporarily (you can probably use /etc/alternatives for a permanent setting):

export CC=/usr/bin/clang
export CXX=/usr/bin/clang++
export LDFLAGS="-L/usr/lib/llvm-15"
export CPPFLAGS="-I/usr/include/llvm-15"

Yes, it's the change in src/common/opencl.c, as you show.

I guess a compiler directive could be added for aarch64, but I'd like to understand what's going on with the double quotes, i.e. why things work, for example, on Debian/Ubuntu x86_64, but not on Armbian aarch64…
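
One way such a directive could look, as a sketch only: the macro KERNELDIR_ESCAPE_SPACES and the function kerneldir_uses_escaping are hypothetical names, not existing darktable code. The guard simply extends the Apple-only branch to aarch64 builds. Since the build options are parsed by the OpenCL runtime and not by the CPU, a guard keyed on the architecture is a coarse approximation at best.

```c
#include <assert.h>

/* Hypothetical switch, not part of darktable:
   1 = backslash-escape spaces in the kernel dir,
   0 = wrap the kernel dir in double quotes. */
#if defined(__APPLE__) || defined(__aarch64__)
#define KERNELDIR_ESCAPE_SPACES 1
#else
#define KERNELDIR_ESCAPE_SPACES 0
#endif

static_assert(KERNELDIR_ESCAPE_SPACES == 0 || KERNELDIR_ESCAPE_SPACES == 1,
              "KERNELDIR_ESCAPE_SPACES must be 0 or 1");

/* Reports which strategy was selected at compile time. */
static int kerneldir_uses_escaping(void)
{
  return KERNELDIR_ESCAPE_SPACES;
}
```

On an aarch64 or Apple build, kerneldir_uses_escaping() returns 1; on other targets such as x86_64 Linux it returns 0, which keeps the quoting behaviour that works there today.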

sunarowicz commented 10 months ago

Thank you so much. Again! You pointed me in the right direction. I had to install these packages, which were not installed before: clang libomp-dev libopenmpi-dev llvm-dev openmpi-bin. I also had to switch to clang (version 14, though, as that is what was available for my OS). Building with GCC 12 still failed with the same error.

I have darktable with OpenCL working now.

Thx!

jenshannoschwalm commented 10 months ago

Would one of you be able to do a PR ?

sunarowicz commented 10 months ago

I will do anything necessary to help resolve this issue. But, please excuse my ignorance, what does PR stand for? Problem report? In any case, how do I do a PR? Thx.

jenshannoschwalm commented 10 months ago

PR = pull request :-) maybe you @sarunasb ?

sarunasb commented 10 months ago

Would one of you be able to do a PR ?

I might be able to...

sarunasb commented 10 months ago

P.S. Setting clplatform_armplatform=TRUE or enabling ‘settings → processing → other platforms’ is now necessary for ARM Platform to be enabled.

jenshannoschwalm commented 10 months ago

Yes, I guess we want to specify that platform if your PR gets merged.

jenshannoschwalm commented 10 months ago

Being curious, how "workable" is dt on the boards?

sarunasb commented 10 months ago

Moderate editing is OK, though one has to be prepared for some “working…” time. I used my own setubal.orf test file and it was workable. When trying darktable's test mire1.cr2, where it looks like every possible module is used, I never saw the end of processing. In my case OrangePi is a server, which I access via X Window.

jenshannoschwalm commented 10 months ago

Thanks for the feedback, I was considering getting one... Is it a ram-size problem or the processing power?

sunarowicz commented 10 months ago

Being curious, how "workable" is dt on the boards?

Depends on who is asking. :-) Someone who is aware that it runs on cell-phone-grade HW will be surprised. Someone who knows how little energy it consumes for RAW processing in dt will be amazed. A professional or an Apple silicon user will be disappointed.

For me it takes about 2 seconds to open and show the next (full) picture in the darkroom, with some 20 steps in the compressed history, and another 2 seconds to redraw the preview. I work with 16 Mpix Sony Alpha RAWs. I haven't optimized the dt settings and config for the board yet. It has been a very long time since I last did these things, so I need to refresh my memory first.

The processing power is definitely the problem, not the RAM size. Although my board has 16 GB of RAM, it hardly uses more than 3 gigs when working with dt running on Wayland in KDE Plasma. Unfortunately, the board's NPU does not seem to be used by dt at all. I wonder how much it could help. But I'm sure it won't be an easy task to add NPU support to dt.

If you are interested in some particular tests, just let me know.

sarunasb commented 10 months ago

Is it a ram-size problem or the processing power?

I don't see darktable going over 3GiB RAM. All 8 cores do get fully loaded (and scale accordingly), but only momentarily. Don't know how to monitor Mali GPU.

sunarowicz commented 10 months ago

I just found CL_OUT_OF_RESOURCES errors in the dt log. I tried various resource levels, but it didn't help. I hope it doesn't hurt anything, as it falls back to the CPU and continues. It just slows the process down, I guess.

tux@rock5b ~> darktable -d perf -d opencl -d tiling --conf resourcelevel="large"
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
     0.0001 [dt_init] SSE2 is unavailable, some functions will be noticeably slower.
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
     0,0635 [dt_get_sysresource_level] switched to 2 as `large'
     0,0635   total mem:       15967MB
     0,0635   mipmap cache:    1995MB
     0,0635   available mem:   10915MB
     0,0635   singlebuff:      249MB
     0,0635   OpenCL tune mem: WANTED
     0,0635   OpenCL pinned:   OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
     0.0663 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL'
     0.0663 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL.so'
[opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
[opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'Mali-G610 r0p0'
   PLATFORM NAME & VENDOR:   ARM Platform, ARM
   CANONICAL NAME:           armplatformmalig610r0p0
   DRIVER VERSION:           3.0
   DEVICE VERSION:           OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          15960 MB
   MAX MEM ALLOC:            15960 MB
   MAX IMAGE SIZE:           65536 x 65536
   MAX WORK GROUP SIZE:      1024
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          YES
   PINNED MEMORY TRANSFER:   NO
   MEMORY TUNING:            WANTED
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      1024
   PERFORMANCE:              4.130
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /opt/darktable/share/darktable/kernels
   KERNEL DIRECTORY:         /home/tux/.cache/darktable/cached_v1_kernels_for_ARMPlatformMaliG610r0p0_30
   CL COMPILER OPTION:       
   KERNEL LOADING TIME:       0.0064 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init]           0       'ARM Platform Mali-G610 r0p0'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       -1      0       0       -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       -1      0       0       -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[opencl_synchronization_timeout] synchronization timeout set to 200
     1.9674 [dt_dev_load_raw] loading the image. took 0.121 secs (0.289 CPU)
     2.2783 [export] creating pixelpipe took 0.205 secs (0.413 CPU)
     2.2784 [dt_opencl_check_tuning] use 15559MB (tunemem=ON, pinning=OFF) on device `ARM Platform Mali-G610 r0p0' id=0
     2.2807 [dev_pixelpipe] took 0.001 secs (0.000 CPU) initing base buffer [thumbnail]
     2.2843 [dev_pixelpipe] took 0.004 secs (0.000 CPU) [thumbnail] processed `rawprepare' on GPU, blended on GPU
     2.2872 [dev_pixelpipe] took 0.003 secs (0.000 CPU) [thumbnail] processed `temperature' on GPU, blended on GPU
     2.2898 [dev_pixelpipe] took 0.003 secs (0.004 CPU) [thumbnail] processed `highlights' on GPU, blended on GPU
     2.2913 [dev_pixelpipe] took 0.002 secs (0.000 CPU) [thumbnail] processed `demosaic' on GPU, blended on GPU
     2.3157 [dev_pixelpipe] took 0.024 secs (0.017 CPU) [thumbnail] processed `lens' on GPU, blended on GPU
     2.3161 [dev_pixelpipe] took 0.000 secs (0.001 CPU) [thumbnail] processed `flip' on GPU, blended on GPU
     2.3165 [dev_pixelpipe] took 0.000 secs (0.001 CPU) [thumbnail] processed `clipping' on GPU, blended on GPU
     2.3169 [dev_pixelpipe] took 0.000 secs (0.002 CPU) [thumbnail] processed `exposure' on GPU, blended on GPU
     2.3293 [dev_pixelpipe] took 0.012 secs (0.020 CPU) [thumbnail] processed `toneequal' on CPU, blended on CPU
     2.3306 [dev_pixelpipe] took 0.001 secs (0.000 CPU) [thumbnail] processed `graduatednd' on GPU, blended on GPU
     2.3317 [dev_pixelpipe] took 0.001 secs (0.000 CPU) [thumbnail] processed `colorin' on GPU, blended on GPU
     2.3332 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.001 secs (0.004 GPU) [channelmixerrgb]
     2.3335 [dev_pixelpipe] took 0.002 secs (0.007 CPU) [thumbnail] processed `channelmixerrgb' on GPU, blended on GPU
     2.3362 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.002 secs (0.000 GPU) [atrous]
     2.3494 [dev_pixelpipe] took 0.016 secs (0.002 CPU) [thumbnail] processed `atrous' on GPU, blended on GPU
     2.3516 [dev_pixelpipe] took 0.002 secs (0.000 CPU) [thumbnail] processed `sharpen' on GPU, blended on GPU
     2.3527 [dev_pixelpipe] took 0.001 secs (0.003 CPU) [thumbnail] processed `colorbalance' on GPU, blended on GPU
     2.3559 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.002 secs (0.000 GPU) [filmicrgb]
     2.3711 [dt_opencl_enqueue_kernel_2d] kernel 210 on device 0: CL_OUT_OF_RESOURCES
     2.3714 [opencl_filmicrgb] couldn't enqueue kernel! CL_OUT_OF_RESOURCES
     2.3714 [pixelpipe_process_CL]       [thumbnail]    filmicrgb              (   0/   0)  299x 450 scale=0.3775 --> (   0/   0)  299x 450 scale=0.3775 couldn't run module on GPU, falling back to CPU
     2.3777 [dev_pixelpipe] took 0.025 secs (0.037 CPU) [thumbnail] processed `filmicrgb' on CPU, blended on CPU
     2.3800 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.001 secs (0.004 GPU) [bilat]
     2.3812 [dt_opencl_enqueue_kernel_2d] kernel 32 on device 0: CL_OUT_OF_RESOURCES
     2.3812 [local laplacian cl] couldn't enqueue kernel! CL_OUT_OF_RESOURCES
     2.3856 [pixelpipe_process_CL]       [thumbnail]    bilat                  (   0/   0)  299x 450 scale=0.3775 --> (   0/   0)  299x 450 scale=0.3775 couldn't run module on GPU, falling back to CPU
     2.4110 [dev_pixelpipe] took 0.033 secs (0.106 CPU) [thumbnail] processed `bilat' on CPU, blended on CPU
     2.4122 [dev_pixelpipe] took 0.001 secs (0.004 CPU) [thumbnail] processed `colorzones' on GPU, blended on GPU
     2.4134 [dev_pixelpipe] took 0.001 secs (0.004 CPU) [thumbnail] processed `colorout' on GPU, blended on GPU
     2.4154 [dev_pixelpipe] took 0.002 secs (0.003 CPU) [thumbnail] processed `gamma' on CPU, blended on CPU
     2.4155 [opencl_profiling] profiling device 0 ('ARM Platform Mali-G610 r0p0'):
     2.4155 [opencl_profiling] spent  0.0020 seconds in [Write Image (from host to device)]
     2.4155 [opencl_profiling] spent  0.0004 seconds in rawprepare_1f
     2.4155 [opencl_profiling] spent  0.0004 seconds in whitebalance_1f
     2.4155 [opencl_profiling] spent  0.0004 seconds in highlights_1f_clip
     2.4155 [opencl_profiling] spent  0.0003 seconds in clip_and_zoom_demosaic_half_size
     2.4155 [opencl_profiling] spent  0.0005 seconds in [Write Buffer (from host to device)]
     2.4155 [opencl_profiling] spent  0.0003 seconds in lens_vignette
     2.4155 [opencl_profiling] spent  0.0008 seconds in lens_distort_bicubic
     2.4155 [opencl_profiling] spent  0.0002 seconds in flip
     2.4155 [opencl_profiling] spent  0.0032 seconds in [Copy Image (on device)]
     2.4155 [opencl_profiling] spent  0.0002 seconds in exposure
     2.4155 [opencl_profiling] spent  0.0014 seconds in [Read Image (from device to host)]
     2.4155 [opencl_profiling] spent  0.0002 seconds in graduatedndp
     2.4155 [opencl_profiling] spent  0.0002 seconds in colorin_unbound
     2.4155 [opencl_profiling] spent  0.0004 seconds in colorspaces_transform_lab_to_rgb_matrix
     2.4155 [opencl_profiling] spent  0.0002 seconds in channelmixerrgb_bradford_linear
     2.4155 [opencl_profiling] spent  0.0004 seconds in colorspaces_transform_rgb_matrix_to_lab
     2.4155 [opencl_profiling] spent  0.0034 seconds in eaw_decompose
     2.4155 [opencl_profiling] spent  0.0013 seconds in eaw_synthesize
     2.4155 [opencl_profiling] spent  0.0003 seconds in sharpen_hblur
     2.4155 [opencl_profiling] spent  0.0003 seconds in sharpen_vblur
     2.4155 [opencl_profiling] spent  0.0003 seconds in sharpen_mix
     2.4155 [opencl_profiling] spent  0.0002 seconds in colorbalance_cdl
     2.4155 [opencl_profiling] spent  0.0001 seconds in filmic_mask_clipped_pixels
     2.4155 [opencl_profiling] spent  0.0002 seconds in pad_input
     2.4155 [opencl_profiling] spent  0.0016 seconds in gauss_reduce
     2.4155 [opencl_profiling] spent  0.0009 seconds in process_curve
     2.4155 [opencl_profiling] spent  0.0003 seconds in colorzones_v3
     2.4155 [opencl_profiling] spent  0.0003 seconds in colorout
     2.4156 [opencl_profiling] spent  0.0205 seconds totally in command queue (with 2 events missing)
     2.4156 [dev_process_thumbnail] pixel pipeline processing took 0.137 secs (0.220 CPU)
     5.0892 [dt_dev_load_raw] loading the image. took 0.104 secs (0.169 CPU)
     5.5636 [dt_dev_process_image_job] loading image. took 0.000 secs (0.000 CPU)
     5.9625 [dev_pixelpipe] took 0.005 secs (0.010 CPU) initing base buffer [full]
     5.9835 [dev_pixelpipe] took 0.021 secs (0.018 CPU) [full] processed `rawprepare' on GPU, blended on GPU
     5.9963 [dev_pixelpipe] took 0.013 secs (0.002 CPU) [full] processed `temperature' on GPU, blended on GPU
     6.0662 [pixelpipe_process_CL]       [full]         highlights             ( 158/ 498) 3658x2483 scale=1.0000 --> ( 158/ 498) 3658x2483 scale=1.0000 cl input data to host
     6.0664 [dev_pixelpipe] took 0.070 secs (0.065 CPU) [full] processed `highlights' on GPU, blended on GPU
     6.1650 [resample_cl] plan 0.002 secs (0.004 CPU) resample 0.001 secs (0.001 CPU)
     6.1650 [dev_pixelpipe] took 0.099 secs (0.071 CPU) [full] processed `demosaic' on GPU, blended on GPU
     6.4032 [dev_pixelpipe] took 0.238 secs (0.155 CPU) [full] processed `denoiseprofile' on GPU, blended on GPU
     6.5027 [dev_pixelpipe] took 0.099 secs (0.187 CPU) [full] processed `lens' on GPU, blended on GPU
     6.5033 [dev_pixelpipe] took 0.001 secs (0.001 CPU) [full] processed `clipping' on GPU, blended on GPU
     6.5526 [dev_pixelpipe] took 0.049 secs (0.059 CPU) [full] processed `spots' on CPU, blended on CPU
     6.5595 [dev_pixelpipe] took 0.007 secs (0.011 CPU) [full] processed `exposure' on GPU, blended on GPU
     6.6405 [dev_pixelpipe] took 0.081 secs (0.150 CPU) [full] processed `toneequal' on CPU, blended on CPU
     6.6542 [dev_pixelpipe] took 0.014 secs (0.020 CPU) [full] processed `graduatednd' on GPU, blended on GPU
     6.6740 [dev_pixelpipe] took 0.020 secs (0.025 CPU) [full] processed `colorin' on GPU, blended on GPU
     6.6770 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.003 secs (0.001 GPU) [channelmixerrgb]
     6.6778 [dev_pixelpipe] took 0.004 secs (0.004 CPU) [full] processed `channelmixerrgb' on GPU, blended on GPU
     6.6817 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.003 secs (0.005 GPU) [atrous]
     6.7112 [dev_pixelpipe] took 0.033 secs (0.037 CPU) [full] processed `atrous' on GPU, blended on GPU
     6.7250 [dev_pixelpipe] took 0.014 secs (0.007 CPU) [full] processed `sharpen' on GPU, blended on GPU
     6.7324 [dev_pixelpipe] took 0.007 secs (0.004 CPU) [full] processed `colorbalance' on GPU, blended on GPU
     6.7483 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.009 secs (0.004 GPU) [filmicrgb]
     6.8984 [dt_opencl_enqueue_kernel_2d] kernel 210 on device 0: CL_OUT_OF_RESOURCES
     6.8987 [opencl_filmicrgb] couldn't enqueue kernel! CL_OUT_OF_RESOURCES
     6.8987 [pixelpipe_process_CL]       [full]         filmicrgb              (   0/   0) 1348x 899 scale=0.3660 --> (   0/   0) 1348x 899 scale=0.3660 couldn't run module on GPU, falling back to CPU
     6.9656 [dev_pixelpipe] took 0.233 secs (0.352 CPU) [full] processed `filmicrgb' on CPU, blended on CPU
     6.9835 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.005 secs (0.006 GPU) [bilat]
     7.0161 [dt_opencl_enqueue_kernel_2d] kernel 32 on device 0: CL_OUT_OF_RESOURCES
     7.0161 [local laplacian cl] couldn't enqueue kernel! CL_OUT_OF_RESOURCES
     7.0416 [pixelpipe_process_CL]       [full]         bilat                  (   0/   0) 1348x 899 scale=0.3660 --> (   0/   0) 1348x 899 scale=0.3660 couldn't run module on GPU, falling back to CPU
     7.3190 [dev_pixelpipe] took 0.353 secs (0.879 CPU) [full] processed `bilat' on CPU, blended on CPU
     7.3434 [dev_pixelpipe] took 0.024 secs (0.032 CPU) [full] processed `colorzones' on GPU, blended on GPU
     7.4143 [pixelpipe_process_CL]       [full]         colorout               (   0/   0) 1348x 899 scale=0.3660 --> (   0/   0) 1348x 899 scale=0.3660 cl input data to host
     7.4150 [dev_pixelpipe] took 0.071 secs (0.067 CPU) [full] processed `colorout' on GPU, blended on GPU
     7.4791 [dev_pixelpipe] took 0.064 secs (0.089 CPU) [full] processed `gamma' on CPU, blended on CPU
     7.4797 [opencl_profiling] profiling device 0 ('ARM Platform Mali-G610 r0p0'):
     7.4800 [opencl_profiling] spent  0.0433 seconds in [Write Image (from host to device)]
     7.4802 [opencl_profiling] spent  0.0036 seconds in rawprepare_1f
...
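
The GPU-to-CPU fallbacks are easy to miss in a log of this length. As a rough sketch, a line filter like the following could pull the module name and the device out of each [dev_pixelpipe] line. It assumes the line format shown in the log above; parse_processed_line is a hypothetical helper, not a darktable function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: extract the module name and the device ("GPU" or "CPU")
   from a line such as
     ... [full] processed `filmicrgb' on CPU, blended on CPU
   Returns 1 if the line matches that pattern, 0 otherwise. */
static int parse_processed_line(const char *line,
                                char *module, size_t module_size,
                                char *device, size_t device_size)
{
  assert(line && module && device);
  assert(module_size > 0 && device_size >= 4);

  static const char marker[] = "processed `";
  const char *p = strstr(line, marker);
  if(!p) return 0;
  p += strlen(marker);

  /* The module name ends at the closing apostrophe. */
  const char *end = strchr(p, '\'');
  if(!end) return 0;
  const size_t n = (size_t)(end - p);
  if(n == 0 || n >= module_size) return 0;
  memcpy(module, p, n);
  module[n] = '\0';

  /* The device follows directly after the module name. */
  if(strncmp(end, "' on GPU", 8) == 0)
    strcpy(device, "GPU");
  else if(strncmp(end, "' on CPU", 8) == 0)
    strcpy(device, "CPU");
  else
    return 0;
  return 1;
}
```

Feeding it the filmicrgb line from the [full] pipe above gives the module filmicrgb and the device CPU; lines without a processed `...' part, such as the CL_OUT_OF_RESOURCES lines themselves, return 0.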
jenshannoschwalm commented 10 months ago

At least the new cl error reporting interface has helped here :-)

No idea, though, about the "exact" problem. (And the penalty for falling back to the CPU is really severe.)

sunarowicz commented 10 months ago

BTW: Is it possible to entirely disable the preview pixelpipe? I don't find the preview in the darkroom very useful, at least for my workflow. I have already hidden the preview in the UI, but the pixelpipe is still being processed. Why? After lowering the preview quality and resolution to the lowest possible values, it still takes approx. 1 second to process. Completely unnecessarily in my case.

jenshannoschwalm commented 10 months ago

BTW: Is it possible to entirely disable the preview pixelpipe?

In short: NO. The generated data are used in other places, like the histogram...

sunarowicz commented 10 months ago

Hmm, I see. Thx for explaining.

jenshannoschwalm commented 10 months ago

Just a comment: using -d perf makes the CL code slower because it checks some internal timers. I don't know how efficient this is on ARM.

From my experience -d pipe gives sufficient data about performance, adding -d gives more details in case of errors.
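To compare where the time goes, the `[dev_pixelpipe] took ...` lines can be tallied per device. This is a minimal sketch that assumes the line format seen in the log earlier in this thread; the regular expression and the function name `summarize_pixelpipe` are illustrative and not part of darktable.

```python
import re
from collections import defaultdict

# Pattern is an assumption derived from the log lines quoted in this thread.
# Module names are printed as `name' (backtick, then apostrophe).
PIPE_LINE = re.compile(
    r"\[dev_pixelpipe\] took (?P<wall>[\d.]+) secs \((?P<cpu>[\d.]+) CPU\) "
    r"\[(?P<pipe>\w+)\] processed `(?P<module>[^']+)' on (?P<device>CPU|GPU)"
)


def summarize_pixelpipe(log_text):
    """Return ({device: total wall seconds}, [(module, device, wall seconds)])."""
    totals = defaultdict(float)
    modules = []
    for line in log_text.splitlines():
        m = PIPE_LINE.search(line)
        if m is None:
            continue  # not a per-module timing line
        wall = float(m.group("wall"))
        totals[m.group("device")] += wall
        modules.append((m.group("module"), m.group("device"), wall))
    return dict(totals), modules


# Four lines copied from the log earlier in this thread, used as sample input.
SAMPLE = """\
     7.3190 [dev_pixelpipe] took 0.353 secs (0.879 CPU) [full] processed `bilat' on CPU, blended on CPU
     7.3434 [dev_pixelpipe] took 0.024 secs (0.032 CPU) [full] processed `colorzones' on GPU, blended on GPU
     7.4150 [dev_pixelpipe] took 0.071 secs (0.067 CPU) [full] processed `colorout' on GPU, blended on GPU
     7.4791 [dev_pixelpipe] took 0.064 secs (0.089 CPU) [full] processed `gamma' on CPU, blended on CPU
"""

if __name__ == "__main__":
    totals, modules = summarize_pixelpipe(SAMPLE)
    for device, seconds in sorted(totals.items()):
        print(f"{device}: {seconds:.3f} s")
```

In the sample, `bilat` accounts for most of the CPU time, which fits the fallback reported above.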

nteodosio commented 9 months ago

I think my problem is contained in this issue, but let me know if I should open another one.

On Ubuntu, darktable fails to build on AArch64:

/<<PKGBUILDDIR>>/src/common/bilateral.c:468:6: warning: GCC does not currently support mixed size types for ‘simd’ functions
  468 | void dt_bilateral_slice_to_output(const dt_bilateral_t *const b,
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/<<PKGBUILDDIR>>/src/common/bilateral.c:420:6: warning: GCC does not currently support mixed size types for ‘simd’ functions
  420 | void dt_bilateral_slice(const dt_bilateral_t *const b,
      |      ^~~~~~~~~~~~~~~~~~
/<<PKGBUILDDIR>>/src/common/bilateral.c: In function ‘_ZGVnN1vva64_dt_bilateral_splat’:
/<<PKGBUILDDIR>>/src/common/bilateral.c:297:1: error: unrecognizable insn:
  297 | }
      | ^
(insn 563 562 564 3 (set (mem/c:TF (plus:DI (reg/f:DI 31 sp)
                (const_int 512 [0x200])) [65  S16 A8])
        (reg:TF 54 v22)) -1
     (expr_list:REG_DEAD (reg:TF 54 v22)
        (nil)))
during RTL pass: sched_fusion
/<<PKGBUILDDIR>>/src/common/bilateral.c:297:1: internal compiler error: in get_attr_type, at config/aarch64/aarch64.md:24655
0xb4849b internal_error(char const*, ...)
        ???:0
0xb48587 fancy_abort(char const*, int, char const*)
        ???:0
0x775c9f _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        ???:0
0x775cd3 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        ???:0
0x114e1b7 get_attr_type(rtx_insn*)
        ???:0
0x10fb563 schedule_block(basic_block_def**, void*)
        ???:0
0x10cbb43 schedule_insns()
        ???:0

Full log.

sunarowicz commented 9 months ago

Compiling with GCC fails on aarch64; you have to compile with LLVM clang instead. See the messages starting from here: https://github.com/darktable-org/darktable/issues/15067#issuecomment-1697066708.

github-actions[bot] commented 7 months ago

This issue has been marked as stale due to inactivity for the last 60 days. It will be automatically closed in 300 days if no update occurs. Please check if the master branch has fixed it and report again or close the issue.

jenshannoschwalm commented 7 months ago

Build issue fixed on master via #15207.