ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

Non uniform workgroup size is not supported!! #999

Closed younglight7 closed 1 year ago

younglight7 commented 1 year ago

Output of 'strings libarm_compute.so | grep arm_compute_version':

arm_compute_version=v22.02 Build options: {'arch': 'arm64-v8a', 'neon': '1', 'opencl': '1', 'embed_kernels': '1', 'extra_cxx_flags': '-fPIC'} Git hash=b'8f587de9214dbc3aee4ff4eeb2ede66747769b19'

Platform:

KHADAS VIM3

Operating System:

Linux Khadas 4.9.241 #18 SMP PREEMPT Mon Jul 25 18:53:47 CST 2022 aarch64 aarch64 aarch64 GNU/Linux

Problem description:

ArmNN runs successfully on the VIM3 with the CPU backend, but an error occurs when I use the GPU backend (GpuAcc).

I used the following command:

./TfLiteMobilenetV2Quantized-Armnn -d /root/images/ -m /root/models/ -c GpuAcc

Errors:

Info: Optimization time: 6.90 ms.
Error: An error occurred when preparing the network workloads:  in generate_build_options src/core/CL/CLCompileContext.cpp:270: Non uniform workgroup size is not supported!!
Info: Network loading time: 1.84 ms.
Info: Shutdown time: 42.54 ms.
Fatal: Armnn Error: IRuntime::LoadNetwork failed

It looks like this device does not support "cl_arm_non_uniform_work_group_size", but the clinfo output lists the extension and shows no problem.

Initially I used 22.08. Then I found another issue ( https://github.com/ARM-software/ComputeLibrary/issues/953 ) and switched to 22.04 as suggested there, but it doesn't work either.

clinfo output:

Number of platforms    1
  Platform Name    ARM Platform
  Platform Vendor    ARM
  Platform Version    OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Platform Profile    FULL_PROFILE
  Platform Extensions    cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory
  Platform Extensions function suffix    ARM

  Platform Name    ARM Platform
Number of devices    1
  Device Name    Mali-G52
  Device Vendor    ARM
  Device Vendor ID    0x72120000
  Device Version    OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Driver Version    2.0
  Device OpenCL C Version    OpenCL C 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Device Type    GPU
  Device Profile    FULL_PROFILE
  Device Available    Yes
  Compiler Available    Yes
  Linker Available    Yes
  Max compute units    2
  Max clock frequency    750MHz
  Device Partition    (core)
    Max number of sub-devices    0
    Supported partition types    None
    Supported affinity domains    (n/a)
  Max work item dimensions    3
  Max work item sizes    384x384x384
  Max work group size    384
  Preferred work group size multiple    8
  Preferred / native vector sizes
    char    16 / 4
    short    8 / 2
    int    4 / 1
    long    2 / 1
    half    8 / 2 (cl_khr_fp16)
    float    4 / 1
    double    0 / 0 (n/a)
  Half-precision Floating-point support    (cl_khr_fp16)
    Denormals    Yes
    Infinity and NANs    Yes
    Round to nearest    Yes
    Round to zero    Yes
    Round to infinity    Yes
    IEEE754-2008 fused multiply-add    Yes
    Support is emulated in software    No
  Single-precision Floating-point support    (core)
    Denormals    Yes
    Infinity and NANs    Yes
    Round to nearest    Yes
    Round to zero    Yes
    Round to infinity    Yes
    IEEE754-2008 fused multiply-add    Yes
    Support is emulated in software    No
    Correctly-rounded divide and sqrt operations    No
  Double-precision Floating-point support    (n/a)
  Address bits    64, Little-Endian
  Global memory size    3887882240 (3.621GiB)
  Error Correction support    No
  Max memory allocation    971970560 (926.9MiB)
  Unified memory for Host and Device    Yes
  Shared Virtual Memory (SVM) capabilities    (core)
    Coarse-grained buffer sharing    Yes
    Fine-grained buffer sharing    No
    Fine-grained system sharing    No
    Atomics    No
  Shared Virtual Memory (SVM) capabilities (ARM)
    Coarse-grained buffer sharing    Yes
    Fine-grained buffer sharing    No
    Fine-grained system sharing    No
    Atomics    No
  Minimum alignment for any data type    128 bytes
  Alignment of base address    1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM    0 bytes
    Global    0 bytes
    Local    0 bytes
  Max size for global variable    65536 (64KiB)
  Preferred total size of global vars    0
  Global Memory cache type    Read/Write
  Global Memory cache size    131072 (128KiB)
  Global Memory cache line size    64 bytes
  Image support    Yes
    Max number of samplers per kernel    16
    Max size for 1D images from buffer    65536 pixels
    Max 1D or 2D image array size    2048 images
    Base address alignment for 2D image buffers    32 bytes
    Pitch alignment for 2D image buffers    64 pixels
    Max 2D image size    65536x65536 pixels
    Max 3D image size    65536x65536x65536 pixels
    Max number of read image args    128
    Max number of write image args    64
    Max number of read/write image args    64
  Max number of pipe args    16
  Max active pipe reservations    1
  Max pipe packet size    1024
  Local memory type    Global
  Local memory size    32768 (32KiB)
  Max number of constant args    8
  Max constant buffer size    65536 (64KiB)
  Max size of kernel argument    1024
  Queue properties (on host)
    Out-of-order execution    Yes
    Profiling    Yes
  Queue properties (on device)
    Out-of-order execution    Yes
    Profiling    Yes
    Preferred size    2097152 (2MiB)
    Max size    16777216 (16MiB)
  Max queues on device    1
  Max events on device    1024
  Prefer user sync for interop    No
  Profiling timer resolution    1000ns
  Execution capabilities
    Run OpenCL kernels    Yes
    Run native kernels    No
  printf() buffer size    1048576 (1024KiB)
  Built-in kernels    (n/a)
  Device Extensions    cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)    ARM Platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)    Success [ARM]
  clCreateContext(NULL, ...) [default]    Success [ARM]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)    Success (1)
    Platform Name    ARM Platform
    Device Name    Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)    No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)    Success (1)
    Platform Name    ARM Platform
    Device Name    Mali-G52
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)    No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)    No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)    Success (1)
    Platform Name    ARM Platform
    Device Name    Mali-G52

ICD loader properties
  ICD loader Name    OpenCL ICD Loader
  ICD loader Vendor    OCL Icd free software
  ICD loader Version    2.2.11
  ICD loader Profile    OpenCL 2.1

How can I fix this problem?

Thanks.

morgolock commented 1 year ago

Hi @younglight7

clinfo shows that your device supports cl_arm_non_uniform_work_group_size, so the problem you are experiencing is likely caused by a build error.

Could you please tell us more about how you built armnn and acl? Could you please share all the output including any warnings when you run the command TfLiteMobilenetV2Quantized-Armnn ?
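The claim that clinfo lists the extension can be checked mechanically rather than by reading the long output. The following is a minimal sketch, assuming clinfo is installed on the device; the helper name has_cl_extension is hypothetical and not part of clinfo or Compute Library. It only inspects what clinfo prints, so it says nothing about which driver another binary loads.

```shell
# has_cl_extension NAME
# Reads text on stdin (for example clinfo output), splits it into
# whitespace-separated words, and succeeds only if NAME appears as a
# whole word. Hypothetical helper, for illustration only.
has_cl_extension() {
    tr -s ' \t' '\n\n' | grep -qxF -- "$1"
}

# Usage on the device (assumes clinfo is on PATH):
#   clinfo | has_cl_extension cl_arm_non_uniform_work_group_size \
#       && echo "extension listed" || echo "extension NOT listed"
```

Matching whole words avoids a false positive from a longer extension name that merely starts with the same prefix.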

younglight7 commented 1 year ago

Hi @morgolock Thank you for the reply.

I built armnn and acl according to the guide "BuildGuideCrossCompilation.md". I will recompile everything if this was caused by a build error.

This is the full output:

Info: ArmNN v30.0.0
Info: Initialization time: 113.81 ms.
Info: Network parsing time: 36.39 ms.
Warning: The backend makes use of a deprecated interface to read constant tensors. If you are a backend developer please find more information in our doxygen documentation on github https://github.com/ARM-software/armnn under the keyword 'ConstTensorsAsInputs'.
[The warning above was printed 53 times in total; the identical repeats are omitted here.]
Info: Optimization time: 6.85 ms.
Error: An error occurred when preparing the network workloads:  in generate_build_options src/core/CL/CLCompileContext.cpp:270: Non uniform workgroup size is not supported!!
Info: Network loading time: 1.80 ms.
Info: Shutdown time: 43.66 ms.
Fatal: Armnn Error: IRuntime::LoadNetwork failed

morgolock commented 1 year ago

Hi @younglight7

Can you please try again but this time replace your armnn build by the official prebuilt release from https://github.com/ARM-software/armnn/releases

This will let us know if there is a problem with your build.

younglight7 commented 1 year ago

Hi @morgolock

I have tried the official prebuilt release, but it has the same problem. I also built a Docker image following the build tool docs and used those libraries, but that failed too.

I used the test programs built before and replaced the armnn binaries like this:

export LD_LIBRARY_PATH=/root/bryce/armnnlibs
./TfLiteMobilenetV2Quantized-Armnn -d /root/bryce/images/ -m /root/bryce/models/ -c GpuAcc

morgolock commented 1 year ago

Hi @younglight7

Could you please try running the test this way:

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/bryce/armnnlibs ./TfLiteMobilenetV2Quantized-Armnn -d /root/bryce/images/ -m /root/bryce/models/ -c GpuAcc

I think the problem is that export LD_LIBRARY_PATH=/root/bryce/armnnlibs is breaking your environment, because it discards the existing value of $LD_LIBRARY_PATH.

Hope this helps
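One detail of the suggested command is worth spelling out: if $LD_LIBRARY_PATH is empty, the pattern $LD_LIBRARY_PATH:/some/dir expands to ":/some/dir", and the dynamic loader treats the empty entry before the colon as the current working directory. A minimal sketch of an append that avoids the stray colon follows; the helper name append_lib_path is hypothetical.

```shell
# append_lib_path CURRENT EXTRA
# Prints CURRENT with EXTRA appended as a new search directory.
# When CURRENT is empty, prints EXTRA alone, so no empty entry
# (which ld.so would read as the current directory) is created.
# Hypothetical helper, for illustration only.
append_lib_path() {
    printf '%s\n' "${1:+$1:}$2"
}

# Usage, with the paths from this thread:
#   LD_LIBRARY_PATH="$(append_lib_path "$LD_LIBRARY_PATH" /root/bryce/armnnlibs)" \
#       ./TfLiteMobilenetV2Quantized-Armnn -d /root/bryce/images/ -m /root/bryce/models/ -c GpuAcc
```

Setting the variable on the command line, as in the usage comment, limits the change to that single process instead of the whole shell session.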

younglight7 commented 1 year ago

Hi @morgolock

Thanks for your suggestion. I had thought about this before and tried re-flashing my board. I found that this variable is empty by default, and the problem is still the same.

morgolock commented 1 year ago

Hi @younglight7

What happens when running LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/bryce/armnnlibs clinfo ?

Have you tried running the test on a different device? Have you tried running other tests on the same device?

Hope this helps.

Mr-Franco commented 1 year ago

Hi @younglight7 and @morgolock

I'm having similar issues even after matching acl and armnn to version 22.02. Were you able to solve this problem? I'm still getting the same output as @morgolock even after running with LD_LIBRARY_PATH set.

Thanks!

morgolock commented 1 year ago

Hi @Mr-Franco

Please try this https://github.com/ARM-software/ComputeLibrary/issues/999#issuecomment-1260785962

It would be interesting to see what happens if you run arm_compute_validation

neps-smartfox commented 1 year ago

Hi @morgolock ,

Thanks for your reply. I haven't tried running arm_compute_validation. Is there a guide I could follow to run it?

morgolock commented 1 year ago

Hi @neps-smartfox

You have to clone Arm Compute Library and build it with the option validation_tests=1

Please see our documentation on how to build ACL: https://arm-software.github.io/ComputeLibrary/v22.08/how_to_build.xhtml

Once built you need to upload the binary to the device and just run: LD_LIBRARY_PATH=. ./arm_compute_validation

neps-smartfox commented 1 year ago

Hi morgolock,

Thanks for your reply. Here is the result of the validation test. Since the log is too long, I am posting only the first and last portions. If any information you need is missing from the following, please let me know.

Version = arm_compute_version=v22.08 Build options: {'Werror': '1', 'debug': '1', 'neon': '1', 'opencl': '1', 'os': 'linux', 'arch': 'armv8a', 'build': 'cross_compile', 'validation_tests': '1'} Git hash=b'aabef6c0584f06f4c0f4b61fb787d80374240619'
CommandLine = ./tests/arm_compute_validation 
Seed = 2268930097
CL_DEVICE_VERSION = OpenCL 3.0 
cpu_has_sve = false
cpu_has_fp16 = false
cpu_has_bf16 = false
cpu_has_dotprod = false
cpu_has_svebf16 = false
CPU0 = A53
CPU1 = A53
CPU2 = A73
CPU3 = A73
CPU4 = A73
CPU5 = A73
Iterations = 1
Threads = 1
Dataset mode = PRECOMMIT
Running [0] 'UNIT/GPUTarget/GetGPUTargetFromName'
  Wall clock/Wall clock time:    AVG=1427.0000 us
Running [1] 'UNIT/GPUTarget/GPUTargetIsIn'
  Wall clock/Wall clock time:    AVG=1.0000 us

...

Running [1024] 'CPP/TopKV/QASYMM8'
  Wall clock/Wall clock time:    AVG=189.0000 us
Running [1025] 'CPP/TopKV/QASYMM8_SIGNED'
  Wall clock/Wall clock time:    AVG=220.0000 us
Running [1026] 'CL/UNIT/CompileContext/CompileContextCache'
ERROR: in generate_build_options src/core/CL/CLCompileContext.cpp:270: Non uniform workgroup size is not supported!!
Running [1027] 'CL/UNIT/DynamicTensor/DynamicTensorType3Single@Level0Shape=12,11,3:Level1Shape=67,31,15'
ERROR: in generate_build_options src/core/CL/CLCompileContext.cpp:270: Non uniform workgroup size is not supported!!
Runn

morgolock commented 1 year ago

Hi @neps-smartfox

Could you please share the output of the following commands:

root@acl-odroid-1:~/tmp/user# ldd clinfo 

And please use LD_DEBUG=libs to see which libOpenCL.so file is loaded when running arm_compute_validation

root@acl-odroid-1:~/tmp/user# LD_DEBUG=libs LD_LIBRARY_PATH=./acl/2208/:$LD_LIBRARY_PATH acl/2208/arm_compute_validation --filter-id=152 2>&1 | grep libOpenCL.so
 875091:    
    875091: 
    875091: calling init: /usr/lib/libOpenCL.so
    875091: 

Both clinfo and arm_compute_validation should load the same libOpenCL.so

Hope this helps
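The comparison described above can be scripted. The sketch below filters LD_DEBUG=libs output for "calling init:" lines that name a libOpenCL.so file, so the paths loaded by two binaries can be compared directly. The helper name opencl_libs_from_ld_debug is hypothetical. Note that the loader writes LD_DEBUG output to stderr, so it has to be redirected before it can be piped.

```shell
# opencl_libs_from_ld_debug
# Reads LD_DEBUG=libs output on stdin and prints each distinct path
# from "calling init:" lines that refers to a libOpenCL.so file.
# Hypothetical helper, for illustration only.
opencl_libs_from_ld_debug() {
    sed -n 's/^.*calling init: *\(.*libOpenCL\.so[^ ]*\).*$/\1/p' | sort -u
}

# Usage on the device (2>&1 >/dev/null pipes stderr, discards stdout):
#   LD_DEBUG=libs clinfo 2>&1 >/dev/null | opencl_libs_from_ld_debug
#   LD_DEBUG=libs ./arm_compute_validation --filter-id=152 2>&1 >/dev/null \
#       | opencl_libs_from_ld_debug
```

If the two commands print different paths, the binaries are not using the same OpenCL library.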

neps-smartfox commented 1 year ago

Hi @morgolock

Thank you very much for your advice. It seems that ACL is indeed using a different libOpenCL.so.

Originally it used the one located at /usr/lib/libOpenCL.so. I then changed the symbolic link to point to /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0, and the issue is no longer observed.

I'm just wondering if there is a better way to do this. Instead of changing the link, can we instruct Compute Library to use the one from the correct directory?
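One possible alternative to editing the system-wide link, offered only as a sketch: create a private directory containing a libOpenCL.so symlink to the desired library and put that directory first in LD_LIBRARY_PATH for the one process. This assumes the binary resolves libOpenCL.so by name through the dynamic loader's normal search, which the LD_DEBUG output in this thread suggests but does not prove. The helper name make_opencl_shim and the shim directory are hypothetical.

```shell
# make_opencl_shim TARGET SHIM_DIR
# Creates SHIM_DIR if needed and places a symlink named libOpenCL.so
# in it that points at TARGET. Nothing under /usr/lib is modified.
# Hypothetical helper, for illustration only.
make_opencl_shim() {
    mkdir -p "$2" && ln -sfn "$1" "$2/libOpenCL.so"
}

# Usage on the device ($HOME/opencl-shim is an arbitrary, assumed path):
#   make_opencl_shim /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 "$HOME/opencl-shim"
#   LD_LIBRARY_PATH="$HOME/opencl-shim${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
#       ./arm_compute_validation
```

Because the override lives in the environment of a single command, other programs on the device keep their existing behaviour.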

morgolock commented 1 year ago

Hi @neps-smartfox

Good to know you solved the problem. It's a configuration issue on the device: the path to libOpenCL.so should be the same regardless of where and how you run a binary such as clinfo or arm_compute_validation.

In this case there seem to be two versions of the driver: one with support for the extension and the other without it.
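To see whether a device has more than one copy of the OpenCL library, the candidate directories can be searched directly. This is a minimal sketch; the helper name find_opencl_libs is hypothetical, and the directories in the usage comment are only the two that appear in this thread, so other systems may need different ones.

```shell
# find_opencl_libs DIR...
# Lists every regular file or symlink named libOpenCL.so* under the
# given directories, sorted. Unreadable directories are skipped silently.
# Hypothetical helper, for illustration only.
find_opencl_libs() {
    find "$@" -name 'libOpenCL.so*' \( -type f -o -type l \) 2>/dev/null | sort
}

# Usage on the device:
#   find_opencl_libs /usr/lib /usr/lib/aarch64-linux-gnu
#   ls -l $(find_opencl_libs /usr/lib)   # shows where each symlink points
```

More than one distinct target in the listing is a hint that different binaries may end up with different drivers.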