OCL-dev / ocl-icd

OpenCL ICD Loader (free software)
BSD 2-Clause "Simplified" License
76 stars 23 forks source link

Support Apple's OpenCL.framework? #31

Open RJVB opened 1 year ago

RJVB commented 1 year ago

Would it be possible to make ocl-icd support Apple's OpenCL.framework somehow?

Or maybe it is supported already and I just do something wrong? With clinfo linked to ocl-icd's libOpenCL.dylib :

> env OCL_ICD_DEBUG=2  clinfo
ocl-icd(ocl_icd_loader.c:850): __initClIcd: Reading icd list from '/opt/local/etc/OpenCL/vendors'
ocl-icd(ocl_icd_loader.c:242): _open_driver: Considering file '/opt/local/etc/OpenCL/vendors/apple.icd'
ocl-icd(ocl_icd_loader.c:216): _load_icd: Loading ICD '/System/Library/Frameworks/OpenCL.framework/OpenCL'
ocl-icd(ocl_icd_loader.c:220): _load_icd: ICD[0] loaded
ocl-icd(ocl_icd_loader.c:444): _find_and_check_platforms: Checking ICD 0/1
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR
Number of platforms                               0

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.2
  ICD loader Profile                              OpenCL 3.0

For comparison, here's the output when clinfo is linked to OpenCL.framework :

clinfo
Number of platforms                               1
  Platform Name                                   Apple
  Platform Vendor                                 Apple
  Platform Version                                OpenCL 1.2 (Nov  2 2015 15:02:14)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

  Platform Name                                   Apple
Number of devices                                 1
  Device Name                                     Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
  Device Vendor                                   Intel
  Device Vendor ID                                0xffffffff
  Device Version                                  OpenCL 1.2 
  Driver Version                                  1.1
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               4
  Max clock frequency                             2700MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1x1
  Max work group size                             1024
  Preferred work group size multiple (kernel)     1
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              17179869184 (16GiB)
  Error Correction support                        No
  Max memory allocation                           4294967296 (4GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        64
  Global Memory cache line size                   4194304 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   1 bytes
    Pitch alignment for 2D image buffers          1 pixels
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     4096 (4KiB)
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Apple
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [P0]
  clCreateContext(NULL, ...) [default]            Success [P0]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
Kerilk commented 1 year ago

Thanks for getting in touch. It most probably used to work at some point since I see it has been packaged for brew: https://formulae.brew.sh/formula/ocl-icd And we also test it in CI, and some of the test include loading stub installable client drivers through the ocl-icd. In your case it seems the the debug level is an issue, could you rerun with: env OCL_ICD_DEBUG=7 clinfo This should give use more info (7 = 1 (warn) + 2 (log) + 4 (trace))

RJVB commented 1 year ago

By your command 8)

> env OCL_ICD_DEBUG=7  clinfo
ocl-icd(ocl_icd_loader.c:850): __initClIcd: Reading icd list from '/opt/local/etc/OpenCL/vendors'
ocl-icd(ocl_icd_loader.c:201): _find_num_icds: return: 1/0x1
ocl-icd(ocl_icd_loader.c:242): _open_driver: Considering file '/opt/local/etc/OpenCL/vendors/apple.icd'
ocl-icd(ocl_icd_loader.c:216): _load_icd: Loading ICD '/System/Library/Frameworks/OpenCL.framework/OpenCL'
ocl-icd(ocl_icd_loader.c:220): _load_icd: ICD[0] loaded
ocl-icd(ocl_icd_loader.c:274): _open_driver: return: 1/0x1
ocl-icd(ocl_icd_loader.c:287): _open_drivers: return: 1/0x1
ocl-icd(ocl_icd_loader.c:444): _find_and_check_platforms: Checking ICD 0/1
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 140735488984802/0x7fff88d406e2
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR
ocl-icd(ocl_icd_loader.c:295): _get_function_addr: Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:301): _get_function_addr: Missing function 'clIcdGetPlatformIDsKHR' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 0/0x0
ocl-icd(ocl_icd_loader.c:453): _find_and_check_platforms: Missing symbols in ICD, skipping it
ocl-icd(ocl_icd_loader.c:398): _sort_platforms: Nb platefroms: 0
ocl-icd(ocl_icd_loader.c:1138): clGetPlatformIDs: return: -1001/0xfffffffffffffc17
Number of platforms                               0
ocl-icd(ocl_icd_loader.c:1092): clGetExtensionFunctionAddress: Entering

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.2
  ICD loader Profile                              OpenCL 3.0
RJVB commented 1 year ago

Thanks for getting in touch. It most probably used to work at some point since I see it has been packaged for brew

Yes, that is also referenced in the clinfo README, but it doesn't exclude the possibility that this ever only supported using alternative OCL implementations!

RJVB commented 1 year ago

After trying

diff --git a/ocl_icd_loader.c b/ocl_icd_loader.c
index 12e0182..e45c35d 100644
--- a/ocl_icd_loader.c
+++ b/ocl_icd_loader.c
@@ -446,8 +446,13 @@ static inline void _find_and_check_platforms(cl_uint num_icds) {
     struct vendor_icd *picd = &_icds[i];
     void* dlh = _icds[i].dl_handle;
     picd->ext_fn_ptr = _get_function_addr(dlh, NULL, "clGetExtensionFunctionAddress");
+#if defined(__APPLE__) || defined(__MACOSX)
+    clIcdGetPlatformIDsKHR_fn plt_fn_ptr =
+      _get_function_addr(dlh, picd->ext_fn_ptr, "clGetPlatformIDs");
+#else
     clIcdGetPlatformIDsKHR_fn plt_fn_ptr =
       _get_function_addr(dlh, picd->ext_fn_ptr, "clIcdGetPlatformIDsKHR");
+#endif
     if( picd->ext_fn_ptr == NULL
        || plt_fn_ptr == NULL) {
       debug(D_WARN, "Missing symbols in ICD, skipping it");
> env OCL_ICD_DEBUG=7 clinfo
ocl-icd(ocl_icd_loader.c:855): __initClIcd: Reading icd list from '/opt/local/etc/OpenCL/vendors'
ocl-icd(ocl_icd_loader.c:201): _find_num_icds: return: 1/0x1
ocl-icd(ocl_icd_loader.c:242): _open_driver: Considering file '/opt/local/etc/OpenCL/vendors/apple.icd'
ocl-icd(ocl_icd_loader.c:216): _load_icd: Loading ICD '/System/Library/Frameworks/OpenCL.framework/OpenCL'
ocl-icd(ocl_icd_loader.c:220): _load_icd: ICD[0] loaded
ocl-icd(ocl_icd_loader.c:274): _open_driver: return: 1/0x1
ocl-icd(ocl_icd_loader.c:287): _open_drivers: return: 1/0x1
ocl-icd(ocl_icd_loader.c:444): _find_and_check_platforms: Checking ICD 0/1
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 140735488984802/0x7fff88d406e2
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetPlatformIDs
ocl-icd(ocl_icd_loader.c:301): _get_function_addr: Missing function 'clGetPlatformIDs' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 140735489008937/0x7fff88d46529
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetPlatformInfo
ocl-icd(ocl_icd_loader.c:301): _get_function_addr: Missing function 'clGetPlatformInfo' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 140735489009106/0x7fff88d465d2
ocl-icd(ocl_icd_loader.c:498): _find_and_check_platforms: Try to load 1 platforms
ocl-icd(ocl_icd_loader.c:315): _allocate_platforms: Requesting allocation for 1 platforms
ocl-icd(ocl_icd_loader.c:325): _allocate_platforms: return: 1/0x1
ocl-icd(ocl_icd_loader.c:505): _find_and_check_platforms: Checking platform 0
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
ocl-icd(ocl_icd_loader.c:545): _find_and_check_platforms: Missing khr extension in platform 0, skipping it
ocl-icd(ocl_icd_loader.c:398): _sort_platforms: Nb platefroms: 0
ocl-icd(ocl_icd_loader.c:1143): clGetPlatformIDs: return: -1001/0xfffffffffffffc17
Number of platforms                               0
ocl-icd(ocl_icd_loader.c:1097): clGetExtensionFunctionAddress: Entering

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.2
  ICD loader Profile                              OpenCL 3.0

Is my OpenCL implementation too old??

Also: why does the loader complain about a missing clGetPlatformIDs and clGetPlatformInfo when it clearly finds them?

Kerilk commented 1 year ago

What appears to be happening is that the OpenCL.framework library is not an installable client driver. It could be a loader in and of itself, in which case you would need to point the icd file to whatever this loader is loading. But most probably it means your OpenCL implementation is not supporting the cl_khr_icd extesion is the issue here. Platforms that support being loaded through an icd should advertise this extension, and I don't see it in the extension list from the clinfo above.

What you see about the driver is us trying to identify an installable driver that would incorrectly advertise itself. So we're trying to see if the objects can be cast to a pointer to dispatch table and assert functions inside could be used.

RJVB commented 1 year ago

Hrmf, and on top of that Apple dropped OpenCL support quite a few releases ago so there probably isn't much point pursuing this further.

I looked a bit more at HB and saw they package PoCL. Following their recipe I managed to build that myself but that doesn't seem to get me anything usable either:

ocl-icd(ocl_icd_loader.c:856): __initClIcd: Reading icd list from '/opt/local/etc/OpenCL/vendors'
ocl-icd(ocl_icd_loader.c:201): _find_num_icds: return: 1/0x1
ocl-icd(ocl_icd_loader.c:242): _open_driver: Considering file '/opt/local/etc/OpenCL/vendors/pocl.icd'
ocl-icd(ocl_icd_loader.c:216): _load_icd: Loading ICD '/opt/local/lib/libpocl.2.12.0.dylib'
ocl-icd(ocl_icd_loader.c:220): _load_icd: ICD[0] loaded
ocl-icd(ocl_icd_loader.c:274): _open_driver: return: 1/0x1
ocl-icd(ocl_icd_loader.c:287): _open_drivers: return: 1/0x1
ocl-icd(ocl_icd_loader.c:444): _find_and_check_platforms: Checking ICD 0/1
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 4557573776/0x10fa70e90
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 4557575904/0x10fa716e0
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetPlatformInfo
ocl-icd(ocl_icd_loader.c:295): _get_function_addr: Missing global symbol 'clGetPlatformInfo' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 4557429600/0x10fa4db60
ocl-icd(ocl_icd_loader.c:499): _find_and_check_platforms: Try to load 1 platforms
ocl-icd(ocl_icd_loader.c:315): _allocate_platforms: Requesting allocation for 1 platforms
ocl-icd(ocl_icd_loader.c:325): _allocate_platforms: return: 1/0x1
ocl-icd(ocl_icd_loader.c:506): _find_and_check_platforms: Checking platform 0
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: cl_khr_icd cl_pocl_content_size
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: POCL
ocl-icd(ocl_icd_loader.c:576): _find_and_check_platforms: Extension suffix: POCL
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: OpenCL 3.0 PoCL 4.1-pre main-0-g0f3e041c  Apple, MacPorts+Asserts, RELOC, SPIR, LLVM 16.0.6, SLEEF, DISTRO, POCL_DEBUG
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: Portable Computing Language
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: The pocl project
ocl-icd(ocl_icd_loader.c:398): _sort_platforms: Nb platefroms: 1
ocl-icd(ocl_icd_loader.c:904): __initClIcd: 1 valid vendor(s)!
ocl-icd(ocl_icd_loader.c:716): __initSystemLayers: Reading lay list from '/opt/local/etc/OpenCL/layers'
ocl-icd(ocl_icd_loader.c:727): __initSystemLayers: Reading lay list from '/opt/local/etc/OpenCL/layers'
ocl-icd(ocl_icd_loader.c:206): _find_num_lays: return: 0/0x0
Number of platforms                               1
ocl-icd(ocl_icd_loader.c:1140): clGetPlatformIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
  Platform Name                                   Portable Computing Language
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
  Platform Vendor                                 The pocl project
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
  Platform Version                                OpenCL 3.0 PoCL 4.1-pre main-0-g0f3e041c  Apple, MacPorts+Asserts, RELOC, SPIR, LLVM 16.0.6, SLEEF, DISTRO, POCL_DEBUG
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
  Platform Profile                                FULL_PROFILE
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
  Platform Extensions with Version                cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_pocl_content_size                                             0x400000 (1.0.0)
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
  Platform Numeric Version                        0xc00000 (3.0.0)
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
  Platform Extensions function suffix             POCL
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
  Platform Host timer resolution                  0ns
ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: -1/0xffffffffffffffff

  Platform Name                                   Portable Computing Language
Number of devices                                 0

Seems clGetPlatformInfo always returns NULL. Which is probably correct given how it's absolutely unclear what underlying driver pocl would use on Mac...

RJVB commented 1 year ago

Bummer, I thought I had found the wrapper library I needed, but it seems not to work anymore (probably because it's even older than my OS, haha):

https://github.com/jrprice/ocl_icd_wrapper/issues/1

RJVB commented 1 year ago

I think I'm onto something.

From an lldb session:

    frame #0: 0x0000000100096bb1 libocl_icd_wrapper.dylib`clIcdGetPlatformIDsKHR(num_entries=1, platforms=0x000000010010ff60, num_platforms=0x0000000000000000) at ocl_icd_wrapper.c:53
   50       platforms[0] = m_platform;
   51     }
   52   
-> 53     if (num_platforms)
   54     {
   55       *num_platforms = 1;
   56     }
(lldb) p *platforms[0]
(_cl_platform_id) $27 = {
  dispatch = 0x0000000100110790
  platform = 0x000000007fff0000
}

and a bit later, after returning from that query:

* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x000000010002ed19 libOpenCL.1.dylib`_find_and_check_platforms(num_icds=2) at ocl_icd_loader.c:497
   494        debug(D_WARN, "Error in loading ICD platforms, skipping ICD");
   495        continue;
   496      }
-> 497      cl_uint num_valid_platforms=0;
   498      cl_uint j;
   499      debug(D_LOG, "Try to load %d platforms", num_platforms);
   500      if (_allocate_platforms(num_platforms) < num_platforms) {
(lldb) p *platforms[0]
(_cl_platform_id) $28 = {
  dispatch = 0x0000000100110790
}

So we're dealing with two different definitions of _cl_platform_id here, and when ocl-icd does p->pid=platforms[j]; it is actually setting the platform ID to the address of the dispatch table. No wonder that I then get an invalid platform error...

Kerilk commented 1 year ago

There are many thing going on here, and I am going to try and answer some of it, but no in the order you presented them:

So we're dealing with two different definitions of _cl_platform_id here, and when ocl-icd does p->pid=platforms[j]; it is actually setting the platform ID to the address of the dispatch table. No wonder that I then get an invalid platform error...

Indeed, but your last statement is not correct, p->pid=platforms[j]; does correctly have p->pid point to the platform address (the structure allocated in the wrapper) which contains the pointer to the dispatch table.

Maybe we rejected it because the platform does not advertise the cl_icd_khr extension. Have you tried setting OCL_ICD_ASSUME_ICD_EXTENSION=debug this should bypass this check (here again, a full log would be helpful).

On a side note, the wrapper is a proof of concept, and it should not be used in any serious setup. It is flawed in at least two ways which I can think of:

For the POCL log you posted above:

ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 Platform Numeric Version 0xc00000 (3.0.0) ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 Platform Extensions function suffix POCL ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 Platform Host timer resolution 0ns ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: -1/0xffffffffffffffff

This indicates POCL working fine. clGetPlatformInfo is returning CL_SUCCESS (0) while clGetDeviceIDs returns CL_DEVICE_NOT_FOUND (-1). So your assumption is correct. I don't know if pocl supports a CPU device on you platform, but this is something you could look into. (The CI script here seems to indicate it should: https://github.com/pocl/pocl/blob/main/.github/workflows/build_cmake_macos.yml)

RJVB commented 1 year ago

pocl works, and provides a more recent implementation than Apple's framework. My main interest here is to be able to compare the two implementations.

I checked ocl_icd_wrapper for memory leaks, and fixed the ones I noticed (they're leak on failure). I'm not certain what you mean with Callbacks will return unexpected handles that won't be able to be used in OpenCL APIs or even to identify the object that the callback was attached to. I did see that a few context creation functions allocate a wrapper around the actual context that they obtained from the underlying OpenCL function, but unless I overlooked a subtle difference in structure naming those context structures provide a way to wrap/embed/chain them.

Now, what's going on here (Pt. 1!) is indeed that clGetPlatformInfo gets called with ocl-icd's idea of a platform handle: the dispatch table. Apple's function wants a pointer to the actual platform instead (always 0x7fff0000 on my system). I created a simple additional wrapper function that calls the existing clIcdGetPlatformIDsKHR() function, and added the following snippet to _find_and_check_platforms() that allows me to determine if clGetPlatformInfo() should be obtained from the dispatch table:

#ifdef __APPLE__
    int isAppleCL = false;
    {
      Dl_info info;
      // check what library provides clGetPlatformInfo():
      if (dladdr(plt_info_ptr, &info) && info.dli_fname) {
        isAppleCL = strstr(info.dli_fname, "/System/Library/Frameworks/OpenCL.framework") != NULL;
      }
    }
#endif
    cl_uint num_platforms=0;

And with that I'm at Pt 2: cl_khr_icd is (still) missing from the reported extensions. I modified the clGetPlatformInfo() wrapper to append that string, and now clinfo gives me the same (or at least very similar ;)) output as it gives when linked directly to OpenCL.framework.

RJVB commented 1 year ago

And a number of the examples shipped with pocl run with/against Apple's implementation. Those that do actually seem to run a lot faster. Those that don't often fail with a CL_INVALID_WORK_GROUP_SIZE error in clEnqueueNDRangeKernel.

Kerilk commented 1 year ago

First, great progress.

I don't think you can reliably fix the memory leaks without careful reference counting of the wrapper objects. If you look at the clReleaseXXXXX wrapper functions, none of them actually free the wrapper struct, they just forward the wrapped object to the underlying implementation.

For callbacks, like the one used in clSetEventCallback, the callabck is expected to be called with the cl_event handle the user used to register the callback. Given how the wrapper library works, the user will be provided the handle from the underlying implementation, and not the handle the ocl_icd_wrapper returned, since the underlying implementation has no knowledge of this handle. Fixing this could be achieved with closures, but this is hard.

For Pt1 and 2, not sure I am following you here. I would have assumed patching ocl_icd_wrapper's _clGetPlatformInfo_ function, here: https://github.com/jrprice/ocl_icd_wrapper/blob/3b9490ede352a57575d7c29a92a184b0622b7fe3/ocl_icd_wrapper.c#L74-L109 to append the cl_icd_khr would have sufficed (with maybe setting the aforementioned OCL_ICD_ASSUME_ICD_EXTENSION environment variable).

ocl_icd, should, through the dispatch table if the clGetPlatformInfo is not exported by the implementation, search for the cl_icd_khr inside the list of supported extensions.

You can also bypass looking through the dispatch table (and the OCL_ICD_ASSUME_ICD_EXTENSION check) by adding a simple wrapper function to the ocl_icd_wrapper, just after the _clGetPlatformInfo_ function:

CL_API_ENTRY cl_int CL_API_CALL
clGetPlatformInfo(cl_platform_id    platform,
                    cl_platform_info  param_name,
                    size_t            param_value_size,
                    void *            param_value,
                    size_t *          param_value_size_ret) CL_API_SUFFIX__VERSION_1_0 
{
  return _clGetPlatformInfo_(platform, param_name, param_value_size, param_value, param_value_size_ret);
}

the _clGetPlatformInfo_ would still need to append cl_icd_khr to the list of supported extensions.

I hope this helps.

Kerilk commented 1 year ago

And a number of the examples shipped with pocl run with/against Apple's implementation. Those that do actually seem to run a lot faster. Those that don't often fail with a CL_INVALID_WORK_GROUP_SIZE error in clEnqueueNDRangeKernel.

Apple implementation most probably leverages the GPU of your computer, while pocl can only use the CPU. There are talks of implementing a pocl backend over Metal (see here: https://github.com/CHIP-SPV/chipStar/pull/602#issuecomment-1709377251), but I wouldn't hold my breath.

RJVB commented 1 year ago

Apple implementation most probably leverages the GPU of your computer, while pocl can only use the CPU.

It claims to be CPU-only, and that's also what I read in every discussion of it. But this software was developed when Apple still had a vested interest in scientific and computing applications and were (from what I understood) pretty good at it. Plus they can probably be much closer to the hardware than a portable implementation can be. I also wouldn't be surprised if CPU-only implementations didn't exactly get faster while they evolved to newer OCL versions (if you only look at the performance of older functionality on the same hardware, that is).

RJVB commented 1 year ago

I don't think you can reliably fix the memory leaks without careful reference counting of the wrapper objects. If you look at the clReleaseXXXXX wrapper functions, none of them actually free the wrapper struct, they just forward the wrapped object to the underlying implementation.

N00b question: the wrapper structs in question are the same type as the wrapped objects, wouldn't it be possible to retain/release the wrapper objects instead of the wrapped objects, or at least additionally? If not, is CL_CONTEXT_REFERENCE_COUNT sufficiently reliable to determine when the wrapper object has to be freed? Alternatively I could always create an expanded structure for the wrapper (maybe less confusingly than the existing re-implementation of cl_platform_id ^^) that adds a refcounting mechanism. I suppose I'd have to add a mutex or something of the sort to make the entire clRetainXX and clReleaseXX functions threadsafe. There are enough of those to make that a "nice" additional bit of work :)

For callbacks, like the one used in clSetEventCallback, the callabck is expected to be called with the cl_event handle the user used to register the callback. Given how the wrapper library works, the user will be provided the handle from the underlying implementation, and not the handle the ocl_icd_wrapper returned, since the underlying implementation has no knowledge of this handle. Fixing this could be achieved with closures, but this is hard.

A closure, like a lambda function? I suppose you'd have to write a wrapper callback function (or callback wrapper function?) that figures out what the user-provided handle is and then calls the user-provided callback with that handle? Could that be done without resorting to lookup tables? That does sound like a harder problem than fixing the retain/release mechanism so the temporary alternative could be just to raise an error when an attempt is made to use event callbacks. But then maybe that would render the entire endeavour pointless (= I am on too unfamiliar grounds to assess how necessary and common use of such callbacks is).

For Pt1 and 2, not sure I am following you here.

I realise I wasn't very clear - part of the work I did was done instead of sleeping and now I don't even remember exactly why I added a wrapper for clGetPlatformIDs for instance ;) Remember that CL_INVALID_PLATFORM error in the logs above? That came from clGetPlatformInfo because your code retrieves the pointer to that function dynamically and ocd_icd_wrapper is not a traditional wrapper library that overloads system functions. So you were calling into Apple's function which of course can't make any sense of the platform pointer (aka the dispatch table) it gets from you. Hence my patch to ocl-icd:

diff --git a/ocl_icd_loader.c b/ocl_icd_loader.c
index 12e0182..0f84d03 100644
--- a/ocl_icd_loader.c
+++ b/ocl_icd_loader.c
@@ -474,6 +474,16 @@ static inline void _find_and_check_platforms(cl_uint num_icds) {
                    break;
            }
     }
+#ifdef __APPLE__
+    int isAppleCL = false;
+    {
+      Dl_info info;
+      // check what library provides clGetPlatformInfo():
+      if (dladdr(plt_info_ptr, &info) && info.dli_fname) {
+        isAppleCL = strstr(info.dli_fname, "/System/Library/Frameworks/OpenCL.framework") != NULL;
+      }
+    }
+#endif
     cl_uint num_platforms=0;
     cl_int error;
     error = (*plt_fn_ptr)(0, NULL, &num_platforms);
@@ -504,12 +514,22 @@ static inline void _find_and_check_platforms(cl_uint num_icds) {
       p->vicd=&_icds[i];
       p->pid=platforms[j];

+#ifdef __APPLE__
+      // If we're trying to work with Apple's OpenCL framework we'll need
+      // to get a wrapped clGetPlatformInfo() function because Apple's version
+      // will expect a pointer to the actual platform instead of to the dispatch
+      // table (on my system that pointer is always 0x7fff0000).
+      if (isAppleCL)
+#else
       /* If clGetPlatformInfo is not exported and we are here, it
        * means that OCL_ICD_ASSUME_ICD_EXTENSION. Si we try to take it
        * from the dispatch * table. If that fails too, we have to
        * bail.
        */
-      if (plt_info_ptr == NULL) {
+      if (plt_info_ptr == NULL)
+#endif
+        {
+        debug(D_LOG, "Getting clGetPlatformInfo from dispatch table!");
         plt_info_ptr = p->pid->dispatch->clGetPlatformInfo;
         if (plt_info_ptr == NULL) {
           debug(D_WARN, "Missing clGetPlatformInfo even in ICD dispatch table, skipping it");

(I'm quite content I remembered that Apple has dl_addr and that you already checked for strstr which kept this change simple!) It would be nicer too have a more elegant check that also works when the framework is not in its official location. Checking just for OpenCL.framework is too ambiguous: I've already added some logic to my ocl-icd install recipe to create an additional installation as a framework, called ... OpenCL.framework . That in itself gives a usable way to chose between say Apple's and the POCL implementation by linking against the one OpenCL framework and then using DYLD_FRAMEWORK_PATH to point dyld to the other. But only in applications that don't hardcode the OpenCL framework location, like e.g. opencv does ...

I also patch the wrapped implementation: https://github.com/RJVB/ocl_icd_wrapper/blob/5d0da6cc2d5cb52cb9b80e0efb1891da67f366f2/ocl_icd_wrapper.c#L118

Kerilk commented 1 year ago

Remember that CL_INVALID_PLATFORM error in the logs above? That came from clGetPlatformInfo because your code retrieves the pointer to that function dynamically and ocd_icd_wrapper is not a traditional wrapper library that overloads system functions. So you were calling into Apple's function which of course can't make any sense of the platform pointer (aka the dispatch table) it gets from you. Hence my patch to ocl-icd:

This is not what ocl_icd is doing here. It is calling the wrapper function ocl_icd_wrapper put in it's dispatch table:

#define DISPATCH_TABLE_ENTRY(fn) table->fn = _##fn##_;

notice how the macro appends the underscores to the function name. The wrapper is linked against the apple OpenCL library and these functions are "embedded" inside the wrappers, and not contained into a table. I submitted a PR to your repo that should avoid the whole issue and allow not modifying ocl-icd at all: https://github.com/RJVB/ocl_icd_wrapper/pull/1/files

Let me know how it goes. If something is not working after that, it must be because we have a bug in ocl-icd that needs to be fixed.

N00b question: the wrapper structs in question are the same type as the wrapped objects, wouldn't it be possible to retain/release the wrapper objects instead of the wrapped objects, or at least additionally? If not, is CL_CONTEXT_REFERENCE_COUNT sufficiently reliable to determine when the wrapper object has to be freed? Alternatively I could always create an expanded structure for the wrapper (maybe less confusingly than the existing re-implementation of cl_platform_id ^^) that adds a refcounting mechanism. I suppose I'd have to add a mutex or something of the sort to make the entire clRetainXX and clReleaseXX functions threadsafe. There are enough of those to make that a "nice" additional bit of work :)

Yes and yes :) . I would use atomics for such a task, they usually prove better at these kind of jobs.

A closure, like a lambda function? I suppose you'd have to write a wrapper callback function (or callback wrapper function?) that figures out what the user-provided handle is and then calls the user-provided callback with that handle? Could that be done without resorting to lookup tables? That does sound like a harder problem than fixing the retain/release mechanism so the temporary alternative could be just to raise an error when an attempt is made to use event callbacks. But then maybe that would render the entire endeavour pointless (= I am on too unfamiliar grounds to assess how necessary and common use of such callbacks is).

Indeed. Thinking more about it a simpler approach could work: I would allocated a small structure that would contain the user data, pointer, and wrapper struct handle, and that should be enough. this data can be freed once the object is released.

RJVB commented 1 year ago

On Tuesday September 19 2023 08:05:46 Brice Videau wrote:

This is not what ocl_icd is doing here. It is calling the wrapper function ocl_icd_wrapper put in it's dispatch table:


#define DISPATCH_TABLE_ENTRY(fn) table->fn = _##fn##_;

Where exactly?

For clGetPlatformInfo it first tries to load it dynamically from the library: https://github.com/OCL-dev/ocl-icd/blob/fdde6677b21329432db8b481e2637cd10f7d3cb2/ocl_icd_loader.c#L457

That call will not fail with OpenCL.framework, and will give you a pointer to Apple's version that will not know what to do with a pointer to the dispatch table instead of whatever 0x7fff0000 points to.

I submitted a PR to your repo that should avoid the whole issue and allow not modifying ocl-icd at all: https://github.com/RJVB/ocl_icd_wrapper/pull/1/files

I'll have a look later today and let you know, but I don't see how it would get around the fact that ocl-icd will only use the dispatch table for clGetPlatformInfo if that dynamic lookup failed.

I assume it's written that way because not intended to be used with implementations that aren't ICD-compatible, IOW the assumption is probably that the function found via dlsym will accept the platform_id handle it will be called with. I had a quick look at Mesa's OpenCL implementation, which can be built as "standalone" and as ICD-compatible, to see if they handle the 2 situations differently. Of course they don't (AFAICT), but then again there probably is no reason. I guess dependent code is not supposed to know anything about the platform_id handle so it can represent anything the implementation wants.

N00b question: the wrapper structs in question are the same type as the wrapped objects, wouldn't it be possible to retain/release the wrapper objects instead of the wrapped objects, or at least additionally? If not, is CL_CONTEXT_REFERENCE_COUNT sufficiently reliable to determine when the wrapper object has to be freed? Alternatively I could always create an expanded structure for the wrapper (maybe less confusingly than the existing re-implementation of cl_platform_id ^^) that adds a refcounting mechanism. I suppose I'd have to add a mutex or something of the sort to make the entire clRetainXX and clReleaseXX functions threadsafe. There are enough of those to make that a "nice" additional bit of work :)

Yes and yes :)

There are 5 statements in my text above (incl. 2 explicit questions) to which you could be responding with "yes", so I have to ask you indeed meant the 2 explicit questions ;)

. I would use atomics for such a task, they usually prove better at these kind of jobs.

Faster, undoubtedly. Use them as a semaphore you mean, something like if (--atomicBarrier == 0) { "do our thing"; atomicBarrier += 1; } ? If you did mean that "yes, it's possible to retain/release the wrapper objects" (using the underlying clRetain/Release functions!), are those locks are needed only to ensure that the 2 refcounts remain in sync, right?

Indeed. Thinking more about it a simpler approach could work: I would allocated a small structure that would contain the user data, pointer, and wrapper struct handle, and that should be enough. this data can be freed once the object is released.

Isn't that the same thing I thought up - you'd still need to install an intermediate callback that calls the intended callback with the data from that small structure, no? I thought of putting that small structure in the redefined cl_event structure in such a way a pointer to it can be returned as a cl_event*, something like

struct cl_event_wrapper {
  struct cl_event theEvent;
  struct wrapData {
      // whatever you need
  }
};

But that's dangerous . So then what, an internal lookup table that stores all those "small structures" as with the cl_event wrapper object as lookup key, queried when releasing that wrapper object?

Kerilk commented 1 year ago

Where exactly? For clGetPlatformInfo it first tries to load it dynamically from the library: https://github.com/OCL-dev/ocl-icd/blob/fdde6677b21329432db8b481e2637cd10f7d3cb2/ocl_icd_loader.c#L457 That call will not fail with OpenCL.framework, and will give you a pointer to Apple's version that will not know what to do with a pointer to the dispatch table instead of whatever 0x7fff0000 points to.

dlsym should be querying pointers from the wrapper library, not the OpenCL framework. For now this symbol cannot be found in the wrapper library, so the loader will try querying it through clGetExtensionFunctionAddress. This may have been an oversight, or a maybe way to run around a buggy implementation.

In practice, clGetExtensionFunctionAddress should not return an address for clGetPlatformInfo. If it does on apple, this is concerning but not something that cannot be fixed inside the wrapper, by ensuring we are not querying the implementation for clGetPlatformInfo. I will update my PR accordingly.

I'll try to answer the ref counting questions in more details later, when I have more time.

Edit: I see the wrapper library clGetExtensionFunctionAddress cannot return a pointer for clGetPlatformInfo so we should be fine here. See here for the code: https://github.com/Kerilk/ocl_icd_wrapper/blob/5d0da6cc2d5cb52cb9b80e0efb1891da67f366f2/ocl_icd_wrapper.c#L61-L72

Edit 2: see here for the specification about the ICD library (the wrapper in this case) having to provide the 3 entry points: https://github.com/KhronosGroup/OpenCL-Docs/blob/6ca05e99cb7e0c23dca7efe073b6dc9367ebbf1d/ext/cl_khr_icd.asciidoc?plain=1#L209-L215

Upon successfully loading a Vendor ICD's library, the ICD Loader queries the following functions from the library: {clIcdGetPlatformIDsKHR}, {clGetPlatformInfo}, and {clGetExtensionFunctionAddress} (note: {clGetExtensionFunctionAddress} has been deprecated, but is still required for the ICD Loader). If any of these functions are not present then the ICD Loader will close and ignore the library.

This is what my patch to the wrapper library does, implement the third missing symbol.

Edit 3: OK I get why you're getting the Apple symbol from the dlsym, it is getting grabbed because the Apple library is linked into the wrapper. It took me some time to get there. Of course my current PR is broken, since it will call itself rather than the underlying Apple implementation. I will have the get extension function to return the clGetPlatformInfo pointer which should override the one obtained through dlsym, as can be seen here: https://github.com/OCL-dev/ocl-icd/blob/fdde6677b21329432db8b481e2637cd10f7d3cb2/ocl_icd_loader.c#L290-L311

Edit 4: The real clean solution to the above issue would be having the Apple library be loaded at initialization via dlopen in the wrapper library, and it's symbol queried by dlsym. Pointers would be used inside the functions in the wrapper. This would ensure symbols from the Appple library cannot bleed into the loader.

RJVB commented 1 year ago

dlsym should be querying pointers from the wrapper library, not the OpenCL framework.

AFAIK OS X works as a general Unix in this aspect. If you query a shared library for a symbol that it uses but obtains from another library, you will get the address for that symbol (barring visibility tricks) but dl_addr will tell you that it comes from that other library. You can check with my wrapper around dlsym: https://github.com/RJVB/legacy-tools/blob/master/dlsym.c .

Avoiding this is possible, but you'd have to overload the function. That is, ocd_icd_wrapper would have to have a function called clGetPlatformInfo, and use an init function to obtain the pointer to Apple's implementation.

R.

Kerilk commented 1 year ago

Indeed, this is my understanding as well. It just took me way too long to make the connection.

Avoiding this is possible, but you'd have to overload the function. That is, ocd_icd_wrapper would have to have a function called clGetPlatformInfo, and use an init function to obtain the pointer to Apple's implementation.

This is indeed the "correct" solution I was proposing above (edit 4).

I think the PR for the wrapper should be good now, and work as expected without ocl-icd modifications.

RJVB commented 1 year ago

Indeed, this is my understanding as well. It just took me way too long to make the connection.

Lol, I also spent way too long trying to figure out another kind of sudden, "inexplicable" behaviour the other day. Until I realised I had linked the wrapper library to my framework version of ocd-icl ;)

This is indeed the "correct" solution I was proposing above (edit 4).

I think the PR for the wrapper should be good now, and work as expected without ocl-icd modifications.

I just found the time for a quick look, saw it in one of the commits but somehow not showing up in what I take to be the combined patch; I hope to get around to testing it today.