Closed sunarowicz closed 7 months ago
Are you sure you have the CL compiler installed?
No idea, to be honest. I followed these instructions to get OpenCL working on the RK3588 chipset: Setting up OpenCL on RK3588 using libmali. clpeak works.
Any idea how to check if OpenCL compiler is installed?
You will be quite alone in investigating this. You might use -d opencl -d verbose to get some more debugging output, and you will have to inspect the logs. I would suggest getting more info about the Mali CL performance before investing too much time...
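One way to approach the "is a CL compiler installed?" question is to look at what the OpenCL runtime itself reports. The sketch below assumes the clinfo utility is installed and that it prints its usual "Compiler Available" line per device; the helper name has_cl_compiler is made up for illustration.

```sh
# Reads clinfo-style text on stdin and succeeds if at least one device
# reports that an OpenCL compiler is available.
# Assumption: the text contains lines such as "  Compiler Available   Yes".
has_cl_compiler() {
  grep -Eq '^[[:space:]]*Compiler Available[[:space:]]+Yes[[:space:]]*$'
}
```

It could be used as `clinfo | has_cl_compiler && echo "compiler reported"`. A positive answer only says the driver claims to have a compiler; it does not show that darktable's kernels will build.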
Thank you for the hint!
Yeah, I'm aware that this device is definitely not an OpenCL powerhouse. But I switched to it from my ancient 6-core Phenom II system as a home desktop computer, and I'm really thrilled by how nicely it runs for this purpose. Better than the old one, and at a fraction of its power consumption, or of that of today's desktop computers. Because the ARM device cannot make use of darktable's great SSE2 optimizations, OpenCL seems to me to be the only hope of making this device usable for at least occasional RAW processing. That's the use case I have in mind.
Looking at the darktable messages, I have a feeling that everything needed for OpenCL support is in place, see [opencl_init] opencl: ON. But for some reason I cannot figure out, it doesn't compile the CL kernel. Build-essential and clang are installed. I also don't understand why it failed to open the /usr/share/darktable/kernels directory, because it does exist, contains .cl files and is user readable.
Looking at darktable's promising OpenCL messages, and knowing that OpenCL support works on Apple's ARM silicon, I still hope success is not too far away here either. :-)
Would be happy for any other idea what to check or test. Thx!
tux@rock5b ~> darktable -d opencl -d verbose
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
0.0001 [dt_init] SSE2 is unavailable, some functions will be noticeably slower.
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
0,0624 [dt_get_sysresource_level] switched to 3 as `unrestricted'
0,0624 total mem: 15967MB
0,0624 mipmap cache: 1995MB
0,0624 available mem: 255474MB
0,0624 singlebuff: 15967MB
0,0624 OpenCL tune mem: OFF
0,0624 OpenCL pinned: OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
0.0636 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL'
0.0637 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL.so'
0.0639 [dt_dlopencl_init] found default opencl runtime library 'libOpenCL.so.1'
[opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
[opencl_init] found 1 platform
[opencl_init] found 1 device
[dt_opencl_device_init]
0.0704 [dt_opencl_write_device_config] writing data '0 250 0 16 16 128 0 0 0.000000 0.000' for 'cldevice_v5_armplatformmalig610r0p0'
0.0704 [dt_opencl_write_device_config] writing data '400' for 'cldevice_v5_armplatformmalig610r0p0_id0'
DEVICE: 0: 'Mali-G610 r0p0'
PLATFORM NAME & VENDOR: ARM Platform, ARM
CANONICAL NAME: armplatformmalig610r0p0
DRIVER VERSION: 3.0
DEVICE VERSION: OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
DEVICE_TYPE: GPU
GLOBAL MEM SIZE: 15960 MB
MAX MEM ALLOC: 15960 MB
MAX IMAGE SIZE: 65536 x 65536
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 1024 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
MEMORY TUNING: NO
FORCED HEADROOM: 400
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH: 16
ROUNDUP HEIGHT: 16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/tux/.cache/darktable/cached_v1_kernels_for_ARMPlatformMaliG610r0p0_30
CL COMPILER OPTION:
0.0711 [dt_opencl_device_init] testing program `demosaic_ppg.cl' ..
0.0712 [opencl_fopen_stat] could not open file `/home/tux/.cache/darktable/cached_v1_kernels_for_ARMPlatformMaliG610r0p0_30/demosaic_ppg.cl.bin'!
0.0712 [opencl_load_program] could not load cached binary program, trying to compile source
0.0712 [opencl_load_program] successfully loaded program from '/usr/share/darktable/kernels/demosaic_ppg.cl' MD5: 'c88f8c06c59c25137328a7d722d87fc3'
0.0849 [opencl_build_program] could not build program: CL_INVALID_BUILD_OPTIONS
0.0849 [opencl_build_program] BUILD STATUS: -2
0.0849 BUILD LOG:
0.0849 error: Failed to open directory '"/usr/share/darktable/kernels"'
error: Failed to handle include build options
error: encountered invalid build options
0.0849 [dt_opencl_device_init] failed to compile program `demosaic_ppg.cl'!
0.0849 [dt_opencl_write_device_config] writing data '0 250 0 16 16 128 0 0 0.000000 0.000' for 'cldevice_v5_armplatformmalig610r0p0'
0.0849 [dt_opencl_write_device_config] writing data '400' for 'cldevice_v5_armplatformmalig610r0p0_id0'
[opencl_init] no suitable devices found.
[opencl_init] FINALLY: opencl is NOT AVAILABLE and NOT ENABLED.
(darktable:5662): IBUS-WARNING **: 23:30:10.163: Unable to connect to ibus: Could not connect: Connection refused
tux@rock5b ~>
Look at the build logs for CL why the compilation fails.
Armbian on OrangePi with RK3588S and the same Mali G610, darktable 4.2 from the Armbian repositories. OpenCL seems to work with the GPU (I can compile and run some sample code). darktable is unable to build its kernels, with the same “invalid build options” error. Where would one look for build logs? Nothing relevant in /tmp.
Sorry about my misleading earlier comments... I checked the code again, and the logs for CL kernel compilation are reported directly on the console if you started with -d opencl -d verbose, so
0.0849 [opencl_build_program] could not build program: CL_INVALID_BUILD_OPTIONS
0.0849 [opencl_build_program] BUILD STATUS: -2
0.0849 BUILD LOG:
0.0849 error: Failed to open directory '"/usr/share/darktable/kernels"'
error: Failed to handle include build options
error: encountered invalid build options
actually is your log. Your dt seems to have no /usr/share/darktable/kernels, or it is not accessible.
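Since the build log ends up in the middle of a long console dump, a small filter can pull out just the failing block. This is a minimal sketch that relies only on the two message texts visible in the log quoted in this thread; the function name extract_cl_build_log is hypothetical, and other darktable versions may word these messages differently.

```sh
# Prints the lines from "could not build program" up to and including
# "failed to compile program" from a darktable debug log read on stdin.
extract_cl_build_log() {
  awk '
    /\[opencl_build_program\] could not build program/ { inblock = 1 }
    inblock { print }
    /\[dt_opencl_device_init\] failed to compile program/ { inblock = 0 }
  '
}
```

For example: `darktable -d opencl -d verbose 2>&1 | extract_cl_build_log`, where `2>&1` is used so that the filter sees the messages regardless of which stream they are written to.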
The point is that there is no problem with the kernels dir. The kernel files inside it are user accessible for sure, see:
tux@rock5b ~> head /usr/share/darktable/kernels/demosaic_ppg.cl
/*
This file is part of darktable,
copyright (c) 2009--2010 johannes hanika.
darktable is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
darktable is distributed in the hope that it will be useful,
tux@rock5b ~>
Couldn't it be an issue with how the kernel dir path is escaped for the Mali OpenCL runtime, maybe? Like here: opencl: fix include path on Mac broken with?
Looking at that again, it really seems to me like a "stupid" escaping issue. Look at the dir path in the error message:
0.0849 error: Failed to open directory '"/usr/share/darktable/kernels"'
Why is the dir path enclosed in both apostrophe and double quote characters? It doesn't make sense to me. Moreover, the double quote character is not used anywhere else in darktable's console output attached here. Could it be the culprit?
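The suspected mechanism can be illustrated outside darktable. The sketch below is not the Mali driver's code; it only assumes a consumer that takes the include directory literally, without stripping quotes, and shows why a path wrapped in double quotes would then fail to open while a path with backslash-escaped spaces would not. All three helper names are invented for this example.

```sh
# Two ways of preparing a directory path for an include option.
quote_dir()  { printf '"%s"\n' "$1"; }                  # wrap in literal double quotes
escape_dir() { printf '%s\n' "$1" | sed 's/ /\\ /g'; }  # backslash-escape spaces only

# Stand-in for a consumer that does NOT strip quotes: the string it
# receives is used as the directory name exactly as given.
dir_opens() { [ -d "$1" ]; }
```

With an existing directory such as /usr/share/darktable/kernels, `dir_opens "$(quote_dir /usr/share/darktable/kernels)"` fails because no directory has quote characters in its name, whereas the escaped form is identical to the original path when it contains no spaces.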
IMHO it's the same issue as I've fixed on Mac back in the day: https://github.com/darktable-org/darktable/commit/c7fd5d7d0ce3f025c1db1271996166a9e2498be5
Compiling darktable on aarch64 with just
escapedkerneldir = dt_util_str_replace(kerneldir, " ", "\\ ");
fixes the OpenCL non-compile issue.
$ /opt/darktable/bin/darktable-cltest
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
0.0002 [dt_init] SSE2 is unavailable, some functions will be noticeably slower.
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
0.0956 [dt_get_sysresource_level] switched to 1 as `default'
0.0957 total mem: 15970MB
0.0957 mipmap cache: 1996MB
0.0957 available mem: 7985MB
0.0957 singlebuff: 124MB
0.0957 OpenCL tune mem: WANTED
0.0957 OpenCL pinned: OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'very fast GPU'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
[opencl_init] found 1 platform
[opencl_init] found 1 device
[dt_opencl_device_init]
DEVICE: 0: 'Mali-G610 r0p0'
PLATFORM NAME & VENDOR: ARM Platform, ARM
CANONICAL NAME: armplatformmalig610r0p0
DRIVER VERSION: 3.0
DEVICE VERSION: OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
DEVICE_TYPE: GPU
GLOBAL MEM SIZE: 15963 MB
MAX MEM ALLOC: 15963 MB
MAX IMAGE SIZE: 65536 x 65536
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 1024 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
MEMORY TUNING: WANTED
FORCED HEADROOM: 400
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH: 16
ROUNDUP HEIGHT: 16
CHECK EVENT HANDLES: 128
PERFORMANCE: 3.799
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /opt/darktable/share/darktable/kernels
KERNEL DIRECTORY: /home/mdadmin/.cache/darktable/cached_v2_kernels_for_ARMPlatformMaliG610r0p0_30
CL COMPILER OPTION:
KERNEL LOADING TIME: 0.0142 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init] 0 'ARM Platform Mali-G610 r0p0'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
Bingo! Thank you for this great info!
Could someone responsible please pass this info to the aarch64 build maintainers to take this into account?
Thank you in advance.
Compiling darktable on aarch64 with just
escapedkerneldir = dt_util_str_replace(kerneldir, " ", "\\ ");
fixes the OpenCL non-compile issue.
Could you please share how you compile darktable on aarch64? I followed the official instructions on the darktable website, but my compilation of version 4.4.2 fails with the following error:
GCC does not currently support mixed size types for ‘simd’ functions
I have GCC-12 installed and it is used for compilation.
BTW: I suppose you change the src/common/opencl.c file like this before starting the compilation:
char *escapedkerneldir = NULL;
// #ifndef __APPLE__
// escapedkerneldir = g_strdup_printf("\"%s\"", kerneldir);
// #else
escapedkerneldir = dt_util_str_replace(kerneldir, " ", "\\ ");
// #endif
Correct?
I'm running Armbian (boot from emmc), upgraded to Lunar. Tried GCC 13, but that resulted in the same "simd" error. After switching to LLVM clang, darktable builds. Had to install the following, which may not be in instructions:
apt install clang llvm llvm-dev openmpi libopenmpi-dev libomp-dev
Then switch to clang temporarily (can probably use /etc/alternatives for a permanent setting):
export CC=/usr/bin/clang
export CXX=/usr/bin/clang++
export LDFLAGS="-L/usr/lib/llvm-15"
export CPPFLAGS="-I/usr/include/llvm-15"
Yes, it's the change in src/common/opencl.c, as you show.
I guess a compiler directive could be added for aarch64, but I'd like to understand what's going on with the double quotes, i.e. why things work, for example, in Debian/Ubuntu x86_64, but not in Armbian aarch64…
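Because the LLVM version differs between systems (15 in the exports above), the same settings can be wrapped in a small helper that takes the version as an argument. This is only a convenience sketch: the function name use_clang is made up, and the /usr/lib/llvm-N and /usr/include/llvm-N paths for versions other than 15 are an assumption extrapolated from the exports above.

```sh
# Point the build environment at clang for the current shell session.
# $1: LLVM major version (defaults to 15, as in the exports above).
use_clang() {
  llvm_ver=${1:-15}
  export CC=/usr/bin/clang
  export CXX=/usr/bin/clang++
  export LDFLAGS="-L/usr/lib/llvm-${llvm_ver}"
  export CPPFLAGS="-I/usr/include/llvm-${llvm_ver}"
}
```

After calling, say, `use_clang 15`, the darktable build would be started from the same shell so that it picks up these variables.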
Thank you so much. Again! You kicked me in the right direction. I had to install these packages, which were not installed so far: clang libomp-dev libopenmpi-dev llvm-dev openmpi-bin. And I had to switch to clang (although version 14, which is what was available for my OS). Building with GCC-12 was still failing because of the same error.
I have darktable with OpenCL working now.
Thx!
Would one of you be able to do a PR ?
I will do anything necessary to help resolve this issue. But, please excuse my ignorance, what does PR stand for? Problem report? In any case, how does one do a PR? Thx.
PR = pull request :-) maybe you @sarunasb ?
Would one of you be able to do a PR ?
I might be able to...
P.S. Setting clplatform_armplatform=TRUE or enabling ‘settings → processing → other platforms’ is now necessary for ARM Platform to be enabled.
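To check whether that key is already set, one can look for it in darktable's configuration file. The sketch below assumes the usual key=value layout of darktablerc and takes the file path as an argument; the default location ~/.config/darktable/darktablerc mentioned in the usage note is an assumption, and the function name is hypothetical.

```sh
# Succeeds if the given darktablerc-style file enables the ARM OpenCL platform.
# $1: path to the configuration file.
arm_platform_enabled() {
  grep -q '^clplatform_armplatform=TRUE$' "$1"
}
```

Usage might look like `arm_platform_enabled ~/.config/darktable/darktablerc || echo "ARM platform not enabled"`. For a quick trial, the `--conf key=value` option used elsewhere in this thread could presumably be passed on the command line instead of editing the file.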
Yes, i guess we want to specify that platform if your pr gets merged
Being curious, how "workable" is dt on the boards?
Moderate editing is OK, though one has to be prepared for some “working…” time. I used my own setubal.orf test file and it was workable. When trying darktable's test mire1.cr2, where it looks like every possible module is used, I never saw the end of processing. In my case OrangePi is a server, which I access via X Window.
Thanks for the feedback, I was considering getting one... Is it a RAM-size problem or the processing power?
Being curious, how "workable" is dt on the boards?
Depends on who is asking. :-) Someone who is aware that it runs on cell-phone-grade HW will be surprised. Someone who knows how little energy it consumes for RAW processing in dt will be amazed. A professional or an Apple silicon user will be disappointed.
For me it takes about 2 seconds of waiting to open and show the next (full) picture in the darkroom with some 20 steps in the compressed history, and another 2 seconds to redraw the preview. I work with 16 Mpix Sony Alpha RAWs. I haven't optimized the dt settings and config for the board yet. It has been a very long time since I last did these things, so I need to refresh my memory first.
The processing power is definitely the problem, not the RAM size. Although my board has 16 GB RAM, it hardly uses more than 3 gigs when working with dt running on Wayland in KDE Plasma. Unfortunately the board's NPU does not seem to be used by dt at all. I wonder how much it could help. But I'm sure it won't be an easy task to add NPU support to dt.
If you are interested in some particular tests, just let me know.
Is it a ram-size problem or the processing power?
I don't see darktable going over 3GiB RAM. All 8 cores do get fully loaded (and scale accordingly), but only momentarily. Don't know how to monitor Mali GPU.
Just found the CL_OUT_OF_RESOURCES issues in the dt log. I tried various resource levels, but it didn't help. I hope it doesn't hurt anything, as it falls back to CPU and continues. It just slows the process down, I guess.
tux@rock5b ~> darktable -d perf -d opencl -d tiling --conf resourcelevel="large"
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
0.0001 [dt_init] SSE2 is unavailable, some functions will be noticeably slower.
[dt_detect_cpu_features] Not implemented for this architecture.
[dt_detect_cpu_features] Please contribute a patch.
0,0635 [dt_get_sysresource_level] switched to 2 as `large'
0,0635 total mem: 15967MB
0,0635 mipmap cache: 1995MB
0,0635 available mem: 10915MB
0,0635 singlebuff: 249MB
0,0635 OpenCL tune mem: WANTED
0,0635 OpenCL pinned: OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
0.0663 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL'
0.0663 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL.so'
[opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
[opencl_init] found 1 platform
[opencl_init] found 1 device
[dt_opencl_device_init]
DEVICE: 0: 'Mali-G610 r0p0'
PLATFORM NAME & VENDOR: ARM Platform, ARM
CANONICAL NAME: armplatformmalig610r0p0
DRIVER VERSION: 3.0
DEVICE VERSION: OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
DEVICE_TYPE: GPU
GLOBAL MEM SIZE: 15960 MB
MAX MEM ALLOC: 15960 MB
MAX IMAGE SIZE: 65536 x 65536
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 1024 ]
ASYNC PIXELPIPE: YES
PINNED MEMORY TRANSFER: NO
MEMORY TUNING: WANTED
FORCED HEADROOM: 400
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH: 16
ROUNDUP HEIGHT: 16
CHECK EVENT HANDLES: 1024
PERFORMANCE: 4.130
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /opt/darktable/share/darktable/kernels
KERNEL DIRECTORY: /home/tux/.cache/darktable/cached_v1_kernels_for_ARMPlatformMaliG610r0p0_30
CL COMPILER OPTION:
KERNEL LOADING TIME: 0.0064 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init] 0 'ARM Platform Mali-G610 r0p0'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 -1 0 0 -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 -1 0 0 -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 200
1.9674 [dt_dev_load_raw] loading the image. took 0.121 secs (0.289 CPU)
2.2783 [export] creating pixelpipe took 0.205 secs (0.413 CPU)
2.2784 [dt_opencl_check_tuning] use 15559MB (tunemem=ON, pinning=OFF) on device `ARM Platform Mali-G610 r0p0' id=0
2.2807 [dev_pixelpipe] took 0.001 secs (0.000 CPU) initing base buffer [thumbnail]
2.2843 [dev_pixelpipe] took 0.004 secs (0.000 CPU) [thumbnail] processed `rawprepare' on GPU, blended on GPU
2.2872 [dev_pixelpipe] took 0.003 secs (0.000 CPU) [thumbnail] processed `temperature' on GPU, blended on GPU
2.2898 [dev_pixelpipe] took 0.003 secs (0.004 CPU) [thumbnail] processed `highlights' on GPU, blended on GPU
2.2913 [dev_pixelpipe] took 0.002 secs (0.000 CPU) [thumbnail] processed `demosaic' on GPU, blended on GPU
2.3157 [dev_pixelpipe] took 0.024 secs (0.017 CPU) [thumbnail] processed `lens' on GPU, blended on GPU
2.3161 [dev_pixelpipe] took 0.000 secs (0.001 CPU) [thumbnail] processed `flip' on GPU, blended on GPU
2.3165 [dev_pixelpipe] took 0.000 secs (0.001 CPU) [thumbnail] processed `clipping' on GPU, blended on GPU
2.3169 [dev_pixelpipe] took 0.000 secs (0.002 CPU) [thumbnail] processed `exposure' on GPU, blended on GPU
2.3293 [dev_pixelpipe] took 0.012 secs (0.020 CPU) [thumbnail] processed `toneequal' on CPU, blended on CPU
2.3306 [dev_pixelpipe] took 0.001 secs (0.000 CPU) [thumbnail] processed `graduatednd' on GPU, blended on GPU
2.3317 [dev_pixelpipe] took 0.001 secs (0.000 CPU) [thumbnail] processed `colorin' on GPU, blended on GPU
2.3332 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.001 secs (0.004 GPU) [channelmixerrgb]
2.3335 [dev_pixelpipe] took 0.002 secs (0.007 CPU) [thumbnail] processed `channelmixerrgb' on GPU, blended on GPU
2.3362 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.002 secs (0.000 GPU) [atrous]
2.3494 [dev_pixelpipe] took 0.016 secs (0.002 CPU) [thumbnail] processed `atrous' on GPU, blended on GPU
2.3516 [dev_pixelpipe] took 0.002 secs (0.000 CPU) [thumbnail] processed `sharpen' on GPU, blended on GPU
2.3527 [dev_pixelpipe] took 0.001 secs (0.003 CPU) [thumbnail] processed `colorbalance' on GPU, blended on GPU
2.3559 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.002 secs (0.000 GPU) [filmicrgb]
2.3711 [dt_opencl_enqueue_kernel_2d] kernel 210 on device 0: CL_OUT_OF_RESOURCES
2.3714 [opencl_filmicrgb] couldn't enqueue kernel! CL_OUT_OF_RESOURCES
2.3714 [pixelpipe_process_CL] [thumbnail] filmicrgb ( 0/ 0) 299x 450 scale=0.3775 --> ( 0/ 0) 299x 450 scale=0.3775 couldn't run module on GPU, falling back to CPU
2.3777 [dev_pixelpipe] took 0.025 secs (0.037 CPU) [thumbnail] processed `filmicrgb' on CPU, blended on CPU
2.3800 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.001 secs (0.004 GPU) [bilat]
2.3812 [dt_opencl_enqueue_kernel_2d] kernel 32 on device 0: CL_OUT_OF_RESOURCES
2.3812 [local laplacian cl] couldn't enqueue kernel! CL_OUT_OF_RESOURCES
2.3856 [pixelpipe_process_CL] [thumbnail] bilat ( 0/ 0) 299x 450 scale=0.3775 --> ( 0/ 0) 299x 450 scale=0.3775 couldn't run module on GPU, falling back to CPU
2.4110 [dev_pixelpipe] took 0.033 secs (0.106 CPU) [thumbnail] processed `bilat' on CPU, blended on CPU
2.4122 [dev_pixelpipe] took 0.001 secs (0.004 CPU) [thumbnail] processed `colorzones' on GPU, blended on GPU
2.4134 [dev_pixelpipe] took 0.001 secs (0.004 CPU) [thumbnail] processed `colorout' on GPU, blended on GPU
2.4154 [dev_pixelpipe] took 0.002 secs (0.003 CPU) [thumbnail] processed `gamma' on CPU, blended on CPU
2.4155 [opencl_profiling] profiling device 0 ('ARM Platform Mali-G610 r0p0'):
2.4155 [opencl_profiling] spent 0.0020 seconds in [Write Image (from host to device)]
2.4155 [opencl_profiling] spent 0.0004 seconds in rawprepare_1f
2.4155 [opencl_profiling] spent 0.0004 seconds in whitebalance_1f
2.4155 [opencl_profiling] spent 0.0004 seconds in highlights_1f_clip
2.4155 [opencl_profiling] spent 0.0003 seconds in clip_and_zoom_demosaic_half_size
2.4155 [opencl_profiling] spent 0.0005 seconds in [Write Buffer (from host to device)]
2.4155 [opencl_profiling] spent 0.0003 seconds in lens_vignette
2.4155 [opencl_profiling] spent 0.0008 seconds in lens_distort_bicubic
2.4155 [opencl_profiling] spent 0.0002 seconds in flip
2.4155 [opencl_profiling] spent 0.0032 seconds in [Copy Image (on device)]
2.4155 [opencl_profiling] spent 0.0002 seconds in exposure
2.4155 [opencl_profiling] spent 0.0014 seconds in [Read Image (from device to host)]
2.4155 [opencl_profiling] spent 0.0002 seconds in graduatedndp
2.4155 [opencl_profiling] spent 0.0002 seconds in colorin_unbound
2.4155 [opencl_profiling] spent 0.0004 seconds in colorspaces_transform_lab_to_rgb_matrix
2.4155 [opencl_profiling] spent 0.0002 seconds in channelmixerrgb_bradford_linear
2.4155 [opencl_profiling] spent 0.0004 seconds in colorspaces_transform_rgb_matrix_to_lab
2.4155 [opencl_profiling] spent 0.0034 seconds in eaw_decompose
2.4155 [opencl_profiling] spent 0.0013 seconds in eaw_synthesize
2.4155 [opencl_profiling] spent 0.0003 seconds in sharpen_hblur
2.4155 [opencl_profiling] spent 0.0003 seconds in sharpen_vblur
2.4155 [opencl_profiling] spent 0.0003 seconds in sharpen_mix
2.4155 [opencl_profiling] spent 0.0002 seconds in colorbalance_cdl
2.4155 [opencl_profiling] spent 0.0001 seconds in filmic_mask_clipped_pixels
2.4155 [opencl_profiling] spent 0.0002 seconds in pad_input
2.4155 [opencl_profiling] spent 0.0016 seconds in gauss_reduce
2.4155 [opencl_profiling] spent 0.0009 seconds in process_curve
2.4155 [opencl_profiling] spent 0.0003 seconds in colorzones_v3
2.4155 [opencl_profiling] spent 0.0003 seconds in colorout
2.4156 [opencl_profiling] spent 0.0205 seconds totally in command queue (with 2 events missing)
2.4156 [dev_process_thumbnail] pixel pipeline processing took 0.137 secs (0.220 CPU)
5.0892 [dt_dev_load_raw] loading the image. took 0.104 secs (0.169 CPU)
5.5636 [dt_dev_process_image_job] loading image. took 0.000 secs (0.000 CPU)
5.9625 [dev_pixelpipe] took 0.005 secs (0.010 CPU) initing base buffer [full]
5.9835 [dev_pixelpipe] took 0.021 secs (0.018 CPU) [full] processed `rawprepare' on GPU, blended on GPU
5.9963 [dev_pixelpipe] took 0.013 secs (0.002 CPU) [full] processed `temperature' on GPU, blended on GPU
6.0662 [pixelpipe_process_CL] [full] highlights ( 158/ 498) 3658x2483 scale=1.0000 --> ( 158/ 498) 3658x2483 scale=1.0000 cl input data to host
6.0664 [dev_pixelpipe] took 0.070 secs (0.065 CPU) [full] processed `highlights' on GPU, blended on GPU
6.1650 [resample_cl] plan 0.002 secs (0.004 CPU) resample 0.001 secs (0.001 CPU)
6.1650 [dev_pixelpipe] took 0.099 secs (0.071 CPU) [full] processed `demosaic' on GPU, blended on GPU
6.4032 [dev_pixelpipe] took 0.238 secs (0.155 CPU) [full] processed `denoiseprofile' on GPU, blended on GPU
6.5027 [dev_pixelpipe] took 0.099 secs (0.187 CPU) [full] processed `lens' on GPU, blended on GPU
6.5033 [dev_pixelpipe] took 0.001 secs (0.001 CPU) [full] processed `clipping' on GPU, blended on GPU
6.5526 [dev_pixelpipe] took 0.049 secs (0.059 CPU) [full] processed `spots' on CPU, blended on CPU
6.5595 [dev_pixelpipe] took 0.007 secs (0.011 CPU) [full] processed `exposure' on GPU, blended on GPU
6.6405 [dev_pixelpipe] took 0.081 secs (0.150 CPU) [full] processed `toneequal' on CPU, blended on CPU
6.6542 [dev_pixelpipe] took 0.014 secs (0.020 CPU) [full] processed `graduatednd' on GPU, blended on GPU
6.6740 [dev_pixelpipe] took 0.020 secs (0.025 CPU) [full] processed `colorin' on GPU, blended on GPU
6.6770 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.003 secs (0.001 GPU) [channelmixerrgb]
6.6778 [dev_pixelpipe] took 0.004 secs (0.004 CPU) [full] processed `channelmixerrgb' on GPU, blended on GPU
6.6817 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.003 secs (0.005 GPU) [atrous]
6.7112 [dev_pixelpipe] took 0.033 secs (0.037 CPU) [full] processed `atrous' on GPU, blended on GPU
6.7250 [dev_pixelpipe] took 0.014 secs (0.007 CPU) [full] processed `sharpen' on GPU, blended on GPU
6.7324 [dev_pixelpipe] took 0.007 secs (0.004 CPU) [full] processed `colorbalance' on GPU, blended on GPU
6.7483 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.009 secs (0.004 GPU) [filmicrgb]
6.8984 [dt_opencl_enqueue_kernel_2d] kernel 210 on device 0: CL_OUT_OF_RESOURCES
6.8987 [opencl_filmicrgb] couldn't enqueue kernel! CL_OUT_OF_RESOURCES
6.8987 [pixelpipe_process_CL] [full] filmicrgb ( 0/ 0) 1348x 899 scale=0.3660 --> ( 0/ 0) 1348x 899 scale=0.3660 couldn't run module on GPU, falling back to CPU
6.9656 [dev_pixelpipe] took 0.233 secs (0.352 CPU) [full] processed `filmicrgb' on CPU, blended on CPU
6.9835 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.005 secs (0.006 GPU) [bilat]
7.0161 [dt_opencl_enqueue_kernel_2d] kernel 32 on device 0: CL_OUT_OF_RESOURCES
7.0161 [local laplacian cl] couldn't enqueue kernel! CL_OUT_OF_RESOURCES
7.0416 [pixelpipe_process_CL] [full] bilat ( 0/ 0) 1348x 899 scale=0.3660 --> ( 0/ 0) 1348x 899 scale=0.3660 couldn't run module on GPU, falling back to CPU
7.3190 [dev_pixelpipe] took 0.353 secs (0.879 CPU) [full] processed `bilat' on CPU, blended on CPU
7.3434 [dev_pixelpipe] took 0.024 secs (0.032 CPU) [full] processed `colorzones' on GPU, blended on GPU
7.4143 [pixelpipe_process_CL] [full] colorout ( 0/ 0) 1348x 899 scale=0.3660 --> ( 0/ 0) 1348x 899 scale=0.3660 cl input data to host
7.4150 [dev_pixelpipe] took 0.071 secs (0.067 CPU) [full] processed `colorout' on GPU, blended on GPU
7.4791 [dev_pixelpipe] took 0.064 secs (0.089 CPU) [full] processed `gamma' on CPU, blended on CPU
7.4797 [opencl_profiling] profiling device 0 ('ARM Platform Mali-G610 r0p0'):
7.4800 [opencl_profiling] spent 0.0433 seconds in [Write Image (from host to device)]
7.4802 [opencl_profiling] spent 0.0036 seconds in rawprepare_1f
...
At least the new cl error reporting interface has helped here :-)
No idea though about the "exact" problem. (And the penalty for cpu is really hard)
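To see at a glance which modules were pushed back to the CPU, the fallback messages can be filtered out of a saved log. This is a rough sketch that relies on the field layout of the "[pixelpipe_process_CL] ... falling back to CPU" lines in the log quoted in this thread; the function name is hypothetical, and the layout may differ in other darktable versions.

```sh
# Lists each "[pipe] module" pair that fell back from GPU to CPU,
# reading a darktable debug log on stdin. Duplicates are collapsed.
list_cpu_fallbacks() {
  awk '/falling back to CPU/ { print $3, $4 }' | sort -u
}
```

Applied to the log in this thread, it would report filmicrgb and bilat for both the thumbnail and the full pipe, which matches the two kernels that hit CL_OUT_OF_RESOURCES.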
BTW: Is it possible to entirely disable the preview pixelpipe? I don't find the preview in darkroom very useful, at least for my workflow. I have already hidden the preview in the UI, but the pixelpipe is still being processed. Why? After lowering the preview quality and resolution to the lowest possible values, it still takes approx. 1 second to be processed. Completely unnecessarily in my case.
BTW: Is it possible to entirely disable the preview pixelpipe?
In short: NO. Generated data are used at other places like histogram...
Hmm, I see. Thx for explaining.
Just a comment, using -d perf makes the CL code slower because it does check some internal timers. Don't know how efficient this is for arm.
From my experience -d pipe gives sufficient data about performance, adding -d gives more details in case of errors.
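Whichever debug flag is used, the per-module "[dev_pixelpipe] took ... secs" lines can be added up to compare how much wall time each pipe spends in modules processed on the GPU versus the CPU. The sketch below assumes the line format seen in the logs in this thread; the function name sum_pipe_time is hypothetical.

```sh
# Sums the "took N secs" values of "[dev_pixelpipe] ... processed" lines
# per pipe and processing device, reading a darktable log on stdin.
# Output lines look like: "[full] GPU 0.337"
sum_pipe_time() {
  LC_ALL=C awk '
    $2 == "[dev_pixelpipe]" && $9 == "processed" {
      dev = $12                 # "GPU," or "CPU,"
      sub(/,$/, "", dev)        # drop the trailing comma
      total[$8 " " dev] += $4   # key: pipe name plus device
    }
    END { for (k in total) printf "%s %.3f\n", k, total[k] }
  ' | sort
}
```

Lines that do not describe a processed module, such as "initing base buffer", are skipped. The totals are wall-clock sums of the first figure on each line and ignore the CPU time shown in parentheses.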
I think my problem is contained in this issue, but let me know if I should open another one.
In Ubuntu, Darktable fails to build in AArch64:
/<<PKGBUILDDIR>>/src/common/bilateral.c:468:6: warning: GCC does not currently support mixed size types for ‘simd’ functions
468 | void dt_bilateral_slice_to_output(const dt_bilateral_t *const b,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/<<PKGBUILDDIR>>/src/common/bilateral.c:420:6: warning: GCC does not currently support mixed size types for ‘simd’ functions
420 | void dt_bilateral_slice(const dt_bilateral_t *const b,
| ^~~~~~~~~~~~~~~~~~
/<<PKGBUILDDIR>>/src/common/bilateral.c: In function ‘_ZGVnN1vva64_dt_bilateral_splat’:
/<<PKGBUILDDIR>>/src/common/bilateral.c:297:1: error: unrecognizable insn:
297 | }
| ^
(insn 563 562 564 3 (set (mem/c:TF (plus:DI (reg/f:DI 31 sp)
(const_int 512 [0x200])) [65 S16 A8])
(reg:TF 54 v22)) -1
(expr_list:REG_DEAD (reg:TF 54 v22)
(nil)))
during RTL pass: sched_fusion
/<<PKGBUILDDIR>>/src/common/bilateral.c:297:1: internal compiler error: in get_attr_type, at config/aarch64/aarch64.md:24655
0xb4849b internal_error(char const*, ...)
???:0
0xb48587 fancy_abort(char const*, int, char const*)
???:0
0x775c9f _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
???:0
0x775cd3 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
???:0
0x114e1b7 get_attr_type(rtx_insn*)
???:0
0x10fb563 schedule_block(basic_block_def**, void*)
???:0
0x10cbb43 schedule_insns()
???:0
Compiling with GCC fails on aarch64; you have to compile using LLVM clang. See the couple of messages starting from here: https://github.com/darktable-org/darktable/issues/15067#issuecomment-1697066708.
This issue has been marked as stale due to inactivity for the last 60 days. It will be automatically closed in 300 days if no update occurs. Please check if the master branch has fixed it and report again or close the issue.
Building issue fixed on master via #15207
darktable already runs on ARM (not Apple Silicon only). That's really great, many thanks to all who contributed to this. But when I try to run it on the Radxa Rock Pi 5B SBC with the RK3588 chipset, it runs only without OpenCL support, although OpenCL is available in the system, see the clinfo output below. As darktable runs with OpenCL support on Apple Silicon, I hope it is not mission impossible to make it work on other ARM devices too, especially if it fails "only" because of invalid CL build options, see the darktable-cltest output below.
Please make OpenCL available for non-Apple-Silicon ARM devices too. I'll be happy to provide any necessary testing if needed.
Thank you for considering this.
clinfo output:
darktable-cltest output: