Open RJVB opened 1 year ago
Thanks for getting in touch. It most probably used to work at some point since I see it has been packaged for brew:
https://formulae.brew.sh/formula/ocl-icd
And we also test it in CI, and some of the test include loading stub installable client drivers through the ocl-icd.
In your case it seems the the debug level is an issue, could you rerun with:
env OCL_ICD_DEBUG=7 clinfo
This should give use more info (7 = 1 (warn) + 2 (log) + 4 (trace))
By your command 8)
> env OCL_ICD_DEBUG=7 clinfo
ocl-icd(ocl_icd_loader.c:850): __initClIcd: Reading icd list from '/opt/local/etc/OpenCL/vendors'
ocl-icd(ocl_icd_loader.c:201): _find_num_icds: return: 1/0x1
ocl-icd(ocl_icd_loader.c:242): _open_driver: Considering file '/opt/local/etc/OpenCL/vendors/apple.icd'
ocl-icd(ocl_icd_loader.c:216): _load_icd: Loading ICD '/System/Library/Frameworks/OpenCL.framework/OpenCL'
ocl-icd(ocl_icd_loader.c:220): _load_icd: ICD[0] loaded
ocl-icd(ocl_icd_loader.c:274): _open_driver: return: 1/0x1
ocl-icd(ocl_icd_loader.c:287): _open_drivers: return: 1/0x1
ocl-icd(ocl_icd_loader.c:444): _find_and_check_platforms: Checking ICD 0/1
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 140735488984802/0x7fff88d406e2
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR
ocl-icd(ocl_icd_loader.c:295): _get_function_addr: Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:301): _get_function_addr: Missing function 'clIcdGetPlatformIDsKHR' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 0/0x0
ocl-icd(ocl_icd_loader.c:453): _find_and_check_platforms: Missing symbols in ICD, skipping it
ocl-icd(ocl_icd_loader.c:398): _sort_platforms: Nb platefroms: 0
ocl-icd(ocl_icd_loader.c:1138): clGetPlatformIDs: return: -1001/0xfffffffffffffc17
Number of platforms 0
ocl-icd(ocl_icd_loader.c:1092): clGetExtensionFunctionAddress: Entering
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.3.2
ICD loader Profile OpenCL 3.0
Thanks for getting in touch. It most probably used to work at some point since I see it has been packaged for brew
Yes, that is also referenced in the clinfo
README, but it doesn't exclude the possibility that this ever only supported using alternative OCL implementations!
After trying
diff --git a/ocl_icd_loader.c b/ocl_icd_loader.c
index 12e0182..e45c35d 100644
--- a/ocl_icd_loader.c
+++ b/ocl_icd_loader.c
@@ -446,8 +446,13 @@ static inline void _find_and_check_platforms(cl_uint num_icds) {
struct vendor_icd *picd = &_icds[i];
void* dlh = _icds[i].dl_handle;
picd->ext_fn_ptr = _get_function_addr(dlh, NULL, "clGetExtensionFunctionAddress");
+#if defined(__APPLE__) || defined(__MACOSX)
+ clIcdGetPlatformIDsKHR_fn plt_fn_ptr =
+ _get_function_addr(dlh, picd->ext_fn_ptr, "clGetPlatformIDs");
+#else
clIcdGetPlatformIDsKHR_fn plt_fn_ptr =
_get_function_addr(dlh, picd->ext_fn_ptr, "clIcdGetPlatformIDsKHR");
+#endif
if( picd->ext_fn_ptr == NULL
|| plt_fn_ptr == NULL) {
debug(D_WARN, "Missing symbols in ICD, skipping it");
> env OCL_ICD_DEBUG=7 clinfo
ocl-icd(ocl_icd_loader.c:855): __initClIcd: Reading icd list from '/opt/local/etc/OpenCL/vendors'
ocl-icd(ocl_icd_loader.c:201): _find_num_icds: return: 1/0x1
ocl-icd(ocl_icd_loader.c:242): _open_driver: Considering file '/opt/local/etc/OpenCL/vendors/apple.icd'
ocl-icd(ocl_icd_loader.c:216): _load_icd: Loading ICD '/System/Library/Frameworks/OpenCL.framework/OpenCL'
ocl-icd(ocl_icd_loader.c:220): _load_icd: ICD[0] loaded
ocl-icd(ocl_icd_loader.c:274): _open_driver: return: 1/0x1
ocl-icd(ocl_icd_loader.c:287): _open_drivers: return: 1/0x1
ocl-icd(ocl_icd_loader.c:444): _find_and_check_platforms: Checking ICD 0/1
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 140735488984802/0x7fff88d406e2
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetPlatformIDs
ocl-icd(ocl_icd_loader.c:301): _get_function_addr: Missing function 'clGetPlatformIDs' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 140735489008937/0x7fff88d46529
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetPlatformInfo
ocl-icd(ocl_icd_loader.c:301): _get_function_addr: Missing function 'clGetPlatformInfo' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 140735489009106/0x7fff88d465d2
ocl-icd(ocl_icd_loader.c:498): _find_and_check_platforms: Try to load 1 platforms
ocl-icd(ocl_icd_loader.c:315): _allocate_platforms: Requesting allocation for 1 platforms
ocl-icd(ocl_icd_loader.c:325): _allocate_platforms: return: 1/0x1
ocl-icd(ocl_icd_loader.c:505): _find_and_check_platforms: Checking platform 0
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
ocl-icd(ocl_icd_loader.c:545): _find_and_check_platforms: Missing khr extension in platform 0, skipping it
ocl-icd(ocl_icd_loader.c:398): _sort_platforms: Nb platefroms: 0
ocl-icd(ocl_icd_loader.c:1143): clGetPlatformIDs: return: -1001/0xfffffffffffffc17
Number of platforms 0
ocl-icd(ocl_icd_loader.c:1097): clGetExtensionFunctionAddress: Entering
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.3.2
ICD loader Profile OpenCL 3.0
Is my OpenCL implementation too old??
Also: why does the loader complain about a missing clGetPlatformIDs
and clGetPlatformInfo
when it clearly finds them?
What appears to be happening is that the OpenCL.framework library is not an installable client driver. It could be a loader in and of itself, in which case you would need to point the icd file to whatever this loader is loading. But most probably it means your OpenCL implementation is not supporting the cl_khr_icd
extesion is the issue here. Platforms that support being loaded through an icd should advertise this extension, and I don't see it in the extension list from the clinfo above.
What you see about the driver is us trying to identify an installable driver that would incorrectly advertise itself. So we're trying to see if the objects can be cast to a pointer to dispatch table and assert functions inside could be used.
Hrmf, and on top of that Apple dropped OpenCL support quite a few releases ago so there probably isn't much point pursuing this further.
I looked a bit more at HB and saw they package PoCL. Following their recipe I managed to build that myself but that doesn't seem to get me anything usable either:
ocl-icd(ocl_icd_loader.c:856): __initClIcd: Reading icd list from '/opt/local/etc/OpenCL/vendors'
ocl-icd(ocl_icd_loader.c:201): _find_num_icds: return: 1/0x1
ocl-icd(ocl_icd_loader.c:242): _open_driver: Considering file '/opt/local/etc/OpenCL/vendors/pocl.icd'
ocl-icd(ocl_icd_loader.c:216): _load_icd: Loading ICD '/opt/local/lib/libpocl.2.12.0.dylib'
ocl-icd(ocl_icd_loader.c:220): _load_icd: ICD[0] loaded
ocl-icd(ocl_icd_loader.c:274): _open_driver: return: 1/0x1
ocl-icd(ocl_icd_loader.c:287): _open_drivers: return: 1/0x1
ocl-icd(ocl_icd_loader.c:444): _find_and_check_platforms: Checking ICD 0/1
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 4557573776/0x10fa70e90
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 4557575904/0x10fa716e0
ocl-icd(ocl_icd_loader.c:292): _get_function_addr: Looking for function clGetPlatformInfo
ocl-icd(ocl_icd_loader.c:295): _get_function_addr: Missing global symbol 'clGetPlatformInfo' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:310): _get_function_addr: return: 4557429600/0x10fa4db60
ocl-icd(ocl_icd_loader.c:499): _find_and_check_platforms: Try to load 1 platforms
ocl-icd(ocl_icd_loader.c:315): _allocate_platforms: Requesting allocation for 1 platforms
ocl-icd(ocl_icd_loader.c:325): _allocate_platforms: return: 1/0x1
ocl-icd(ocl_icd_loader.c:506): _find_and_check_platforms: Checking platform 0
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: cl_khr_icd cl_pocl_content_size
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: POCL
ocl-icd(ocl_icd_loader.c:576): _find_and_check_platforms: Extension suffix: POCL
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: OpenCL 3.0 PoCL 4.1-pre main-0-g0f3e041c Apple, MacPorts+Asserts, RELOC, SPIR, LLVM 16.0.6, SLEEF, DISTRO, POCL_DEBUG
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: Portable Computing Language
ocl-icd(ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: The pocl project
ocl-icd(ocl_icd_loader.c:398): _sort_platforms: Nb platefroms: 1
ocl-icd(ocl_icd_loader.c:904): __initClIcd: 1 valid vendor(s)!
ocl-icd(ocl_icd_loader.c:716): __initSystemLayers: Reading lay list from '/opt/local/etc/OpenCL/layers'
ocl-icd(ocl_icd_loader.c:727): __initSystemLayers: Reading lay list from '/opt/local/etc/OpenCL/layers'
ocl-icd(ocl_icd_loader.c:206): _find_num_lays: return: 0/0x0
Number of platforms 1
ocl-icd(ocl_icd_loader.c:1140): clGetPlatformIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
Platform Name Portable Computing Language
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
Platform Vendor The pocl project
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
Platform Version OpenCL 3.0 PoCL 4.1-pre main-0-g0f3e041c Apple, MacPorts+Asserts, RELOC, SPIR, LLVM 16.0.6, SLEEF, DISTRO, POCL_DEBUG
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
Platform Profile FULL_PROFILE
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
Platform Extensions cl_khr_icd cl_pocl_content_size
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
Platform Extensions with Version cl_khr_icd 0x400000 (1.0.0)
cl_pocl_content_size 0x400000 (1.0.0)
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
Platform Numeric Version 0xc00000 (3.0.0)
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
Platform Extensions function suffix POCL
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
Platform Host timer resolution 0ns
ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: -1/0xffffffffffffffff
Platform Name Portable Computing Language
Number of devices 0
Seems clGetPlatformInfo
always returns NULL. Which is probably correct given how it's absolutely unclear what underlying driver pocl would use on Mac...
Bummer, I thought I had found the wrapper library I needed, but it seems not to work anymore (probably because it's even older than my OS, haha):
I think I'm onto something.
From an lldb session:
frame #0: 0x0000000100096bb1 libocl_icd_wrapper.dylib`clIcdGetPlatformIDsKHR(num_entries=1, platforms=0x000000010010ff60, num_platforms=0x0000000000000000) at ocl_icd_wrapper.c:53
50 platforms[0] = m_platform;
51 }
52
-> 53 if (num_platforms)
54 {
55 *num_platforms = 1;
56 }
(lldb) p *platforms[0]
(_cl_platform_id) $27 = {
dispatch = 0x0000000100110790
platform = 0x000000007fff0000
}
and a bit later, after returning from that query:
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x000000010002ed19 libOpenCL.1.dylib`_find_and_check_platforms(num_icds=2) at ocl_icd_loader.c:497
494 debug(D_WARN, "Error in loading ICD platforms, skipping ICD");
495 continue;
496 }
-> 497 cl_uint num_valid_platforms=0;
498 cl_uint j;
499 debug(D_LOG, "Try to load %d platforms", num_platforms);
500 if (_allocate_platforms(num_platforms) < num_platforms) {
(lldb) p *platforms[0]
(_cl_platform_id) $28 = {
dispatch = 0x0000000100110790
}
So we're dealing with two different definitions of _cl_platform_id
here, and when ocl-icd does p->pid=platforms[j];
it is actually setting the platform ID to the address of the dispatch table. No wonder that I then get an invalid platform error...
There are many thing going on here, and I am going to try and answer some of it, but no in the order you presented them:
So we're dealing with two different definitions of
_cl_platform_id
here, and when ocl-icd doesp->pid=platforms[j];
it is actually setting the platform ID to the address of the dispatch table. No wonder that I then get an invalid platform error...
Indeed, but your last statement is not correct, p->pid=platforms[j];
does correctly have p->pid
point to the platform address (the structure allocated in the wrapper) which contains the pointer to the dispatch table.
Maybe we rejected it because the platform does not advertise the cl_icd_khr
extension.
Have you tried setting OCL_ICD_ASSUME_ICD_EXTENSION=debug
this should bypass this check (here again, a full log would be helpful).
On a side note, the wrapper is a proof of concept, and it should not be used in any serious setup. It is flawed in at least two ways which I can think of:
For the POCL log you posted above:
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 Platform Numeric Version 0xc00000 (3.0.0) ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 Platform Extensions function suffix POCL ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 Platform Host timer resolution 0ns ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: -1/0xffffffffffffffff
This indicates POCL working fine. clGetPlatformInfo
is returning CL_SUCCESS
(0) while clGetDeviceIDs
returns CL_DEVICE_NOT_FOUND
(-1). So your assumption is correct. I don't know if pocl supports a CPU device on you platform, but this is something you could look into. (The CI script here seems to indicate it should: https://github.com/pocl/pocl/blob/main/.github/workflows/build_cmake_macos.yml)
pocl works, and provides a more recent implementation than Apple's framework. My main interest here is to be able to compare the two implementations.
I checked ocl_icd_wrapper
for memory leaks, and fixed the ones I noticed (they're leak on failure). I'm not certain what you mean with Callbacks will return unexpected handles that won't be able to be used in OpenCL APIs or even to identify the object that the callback was attached to
. I did see that a few context creation functions allocate a wrapper around the actual context that they obtained from the underlying OpenCL function, but unless I overlooked a subtle difference in structure naming those context structures provide a way to wrap/embed/chain them.
Now, what's going on here (Pt. 1!) is indeed that clGetPlatformInfo
gets called with ocl-icd
's idea of a platform handle: the dispatch table. Apple's function wants a pointer to the actual platform instead (always 0x7fff0000 on my system). I created a simple additional wrapper function that calls the existing clIcdGetPlatformIDsKHR()
function, and added the following snippet to _find_and_check_platforms()
that allows me to determine if clGetPlatformInfo()
should be obtained from the dispatch table:
#ifdef __APPLE__
int isAppleCL = false;
{
Dl_info info;
// check what library provides clGetPlatformInfo():
if (dladdr(plt_info_ptr, &info) && info.dli_fname) {
isAppleCL = strstr(info.dli_fname, "/System/Library/Frameworks/OpenCL.framework") != NULL;
}
}
#endif
cl_uint num_platforms=0;
And with that I'm at Pt 2: cl_khr_icd
is (still) missing from the reported extensions. I modified the clGetPlatformInfo()
wrapper to append that string, and now clinfo gives me the same (or at least very similar ;)) output as it gives when linked directly to OpenCL.framework.
And a number of the examples shipped with pocl
run with/against Apple's implementation. Those that do actually seem to run a lot faster. Those that don't often fail with a CL_INVALID_WORK_GROUP_SIZE
error in clEnqueueNDRangeKernel
.
First, great progress.
I don't think you can reliably fix the memory leaks without careful reference counting of the wrapper objects. If you look at the clReleaseXXXXX
wrapper functions, none of them actually free the wrapper struct, they just forward the wrapped object to the underlying implementation.
For callbacks, like the one used in clSetEventCallback
, the callabck is expected to be called with the cl_event
handle the user used to register the callback. Given how the wrapper library works, the user will be provided the handle from the underlying implementation, and not the handle the ocl_icd_wrapper
returned, since the underlying implementation has no knowledge of this handle. Fixing this could be achieved with closures, but this is hard.
For Pt1 and 2, not sure I am following you here. I would have assumed patching ocl_icd_wrapper
's _clGetPlatformInfo_
function, here:
https://github.com/jrprice/ocl_icd_wrapper/blob/3b9490ede352a57575d7c29a92a184b0622b7fe3/ocl_icd_wrapper.c#L74-L109
to append the cl_icd_khr
would have sufficed (with maybe setting the aforementioned OCL_ICD_ASSUME_ICD_EXTENSION
environment variable).
ocl_icd
, should, through the dispatch table if the clGetPlatformInfo
is not exported by the implementation, search for the cl_icd_khr
inside the list of supported extensions.
You can also bypass looking through the dispatch table (and the OCL_ICD_ASSUME_ICD_EXTENSION
check) by adding a simple wrapper function to the ocl_icd_wrapper
, just after the _clGetPlatformInfo_
function:
CL_API_ENTRY cl_int CL_API_CALL
clGetPlatformInfo(cl_platform_id platform,
cl_platform_info param_name,
size_t param_value_size,
void * param_value,
size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0
{
return _clGetPlatformInfo_(platform, param_name, param_value_size, param_value, param_value_size_ret);
}
the _clGetPlatformInfo_
would still need to append cl_icd_khr
to the list of supported extensions.
I hope this helps.
And a number of the examples shipped with
pocl
run with/against Apple's implementation. Those that do actually seem to run a lot faster. Those that don't often fail with aCL_INVALID_WORK_GROUP_SIZE
error inclEnqueueNDRangeKernel
.
Apple implementation most probably leverages the GPU of your computer, while pocl can only use the CPU. There are talks of implementing a pocl backend over Metal (see here: https://github.com/CHIP-SPV/chipStar/pull/602#issuecomment-1709377251), but I wouldn't hold my breath.
Apple implementation most probably leverages the GPU of your computer, while pocl can only use the CPU.
It claims to be CPU-only, and that's also what I read in every discussion of it. But this software was developed when Apple still had a vested interest in scientific and computing applications and were (from what I understood) pretty good at it. Plus they can probably be much closer to the hardware than a portable implementation can be. I also wouldn't be surprised if CPU-only implementations didn't exactly get faster while they evolved to newer OCL versions (if you only look at the performance of older functionality on the same hardware, that is).
I don't think you can reliably fix the memory leaks without careful reference counting of the wrapper objects. If you look at the
clReleaseXXXXX
wrapper functions, none of them actually free the wrapper struct, they just forward the wrapped object to the underlying implementation.
N00b question: the wrapper structs in question are the same type as the wrapped objects, wouldn't it be possible to retain/release the wrapper objects instead of the wrapped objects, or at least additionally? If not, is CL_CONTEXT_REFERENCE_COUNT
sufficiently reliable to determine when the wrapper object has to be freed?
Alternatively I could always create an expanded structure for the wrapper (maybe less confusingly than the existing re-implementation of cl_platform_id ^^) that adds a refcounting mechanism. I suppose I'd have to add a mutex or something of the sort to make the entire clRetainXX and clReleaseXX functions threadsafe.
There are enough of those to make that a "nice" additional bit of work :)
For callbacks, like the one used in
clSetEventCallback
, the callabck is expected to be called with thecl_event
handle the user used to register the callback. Given how the wrapper library works, the user will be provided the handle from the underlying implementation, and not the handle theocl_icd_wrapper
returned, since the underlying implementation has no knowledge of this handle. Fixing this could be achieved with closures, but this is hard.
A closure, like a lambda function? I suppose you'd have to write a wrapper callback function (or callback wrapper function?) that figures out what the user-provided handle is and then calls the user-provided callback with that handle? Could that be done without resorting to lookup tables? That does sound like a harder problem than fixing the retain/release mechanism so the temporary alternative could be just to raise an error when an attempt is made to use event callbacks. But then maybe that would render the entire endeavour pointless (= I am on too unfamiliar grounds to assess how necessary and common use of such callbacks is).
For Pt1 and 2, not sure I am following you here.
I realise I wasn't very clear - part of the work I did was done instead of sleeping and now I don't even remember exactly why I added a wrapper for clGetPlatformIDs
for instance ;)
Remember that CL_INVALID_PLATFORM error in the logs above? That came from clGetPlatformInfo
because your code retrieves the pointer to that function dynamically and ocd_icd_wrapper
is not a traditional wrapper library that overloads system functions. So you were calling into Apple's function which of course can't make any sense of the platform pointer (aka the dispatch table) it gets from you. Hence my patch to ocl-icd
:
diff --git a/ocl_icd_loader.c b/ocl_icd_loader.c
index 12e0182..0f84d03 100644
--- a/ocl_icd_loader.c
+++ b/ocl_icd_loader.c
@@ -474,6 +474,16 @@ static inline void _find_and_check_platforms(cl_uint num_icds) {
break;
}
}
+#ifdef __APPLE__
+ int isAppleCL = false;
+ {
+ Dl_info info;
+ // check what library provides clGetPlatformInfo():
+ if (dladdr(plt_info_ptr, &info) && info.dli_fname) {
+ isAppleCL = strstr(info.dli_fname, "/System/Library/Frameworks/OpenCL.framework") != NULL;
+ }
+ }
+#endif
cl_uint num_platforms=0;
cl_int error;
error = (*plt_fn_ptr)(0, NULL, &num_platforms);
@@ -504,12 +514,22 @@ static inline void _find_and_check_platforms(cl_uint num_icds) {
p->vicd=&_icds[i];
p->pid=platforms[j];
+#ifdef __APPLE__
+ // If we're trying to work with Apple's OpenCL framework we'll need
+ // to get a wrapped clGetPlatformInfo() function because Apple's version
+ // will expect a pointer to the actual platform instead of to the dispatch
+ // table (on my system that pointer is always 0x7fff0000).
+ if (isAppleCL)
+#else
/* If clGetPlatformInfo is not exported and we are here, it
* means that OCL_ICD_ASSUME_ICD_EXTENSION. Si we try to take it
* from the dispatch * table. If that fails too, we have to
* bail.
*/
- if (plt_info_ptr == NULL) {
+ if (plt_info_ptr == NULL)
+#endif
+ {
+ debug(D_LOG, "Getting clGetPlatformInfo from dispatch table!");
plt_info_ptr = p->pid->dispatch->clGetPlatformInfo;
if (plt_info_ptr == NULL) {
debug(D_WARN, "Missing clGetPlatformInfo even in ICD dispatch table, skipping it");
(I'm quite content I remembered that Apple has dl_addr
and that you already checked for strstr
which kept this change simple!)
It would be nicer too have a more elegant check that also works when the framework is not in its official location. Checking just for OpenCL.framework
is too ambiguous: I've already added some logic to my ocl-icd
install recipe to create an additional installation as a framework, called ... OpenCL.framework . That in itself gives a usable way to chose between say Apple's and the POCL implementation by linking against the one OpenCL framework and then using DYLD_FRAMEWORK_PATH
to point dyld to the other. But only in applications that don't hardcode the OpenCL framework location, like e.g. opencv does ...
I also patch the wrapped implementation: https://github.com/RJVB/ocl_icd_wrapper/blob/5d0da6cc2d5cb52cb9b80e0efb1891da67f366f2/ocl_icd_wrapper.c#L118
Remember that CL_INVALID_PLATFORM error in the logs above? That came from
clGetPlatformInfo
because your code retrieves the pointer to that function dynamically andocd_icd_wrapper
is not a traditional wrapper library that overloads system functions. So you were calling into Apple's function which of course can't make any sense of the platform pointer (aka the dispatch table) it gets from you. Hence my patch toocl-icd
:
This is not what ocl_icd is doing here. It is calling the wrapper function ocl_icd_wrapper
put in it's dispatch table:
#define DISPATCH_TABLE_ENTRY(fn) table->fn = _##fn##_;
notice how the macro appends the underscores to the function name. The wrapper is linked against the apple OpenCL library and these functions are "embedded" inside the wrappers, and not contained into a table. I submitted a PR to your repo that should avoid the whole issue and allow not modifying ocl-icd at all: https://github.com/RJVB/ocl_icd_wrapper/pull/1/files
Let me know how it goes. If something is not working after that, it must be because we have a bug in ocl-icd that needs to be fixed.
N00b question: the wrapper structs in question are the same type as the wrapped objects, wouldn't it be possible to retain/release the wrapper objects instead of the wrapped objects, or at least additionally? If not, is
CL_CONTEXT_REFERENCE_COUNT
sufficiently reliable to determine when the wrapper object has to be freed? Alternatively I could always create an expanded structure for the wrapper (maybe less confusingly than the existing re-implementation of cl_platform_id ^^) that adds a refcounting mechanism. I suppose I'd have to add a mutex or something of the sort to make the entire clRetainXX and clReleaseXX functions threadsafe. There are enough of those to make that a "nice" additional bit of work :)
Yes and yes :) . I would use atomics for such a task, they usually prove better at these kind of jobs.
A closure, like a lambda function? I suppose you'd have to write a wrapper callback function (or callback wrapper function?) that figures out what the user-provided handle is and then calls the user-provided callback with that handle? Could that be done without resorting to lookup tables? That does sound like a harder problem than fixing the retain/release mechanism so the temporary alternative could be just to raise an error when an attempt is made to use event callbacks. But then maybe that would render the entire endeavour pointless (= I am on too unfamiliar grounds to assess how necessary and common use of such callbacks is).
Indeed. Thinking more about it a simpler approach could work: I would allocated a small structure that would contain the user data, pointer, and wrapper struct handle, and that should be enough. this data can be freed once the object is released.
On Tuesday September 19 2023 08:05:46 Brice Videau wrote:
This is not what ocl_icd is doing here. It is calling the wrapper function
ocl_icd_wrapper
put in it's dispatch table:#define DISPATCH_TABLE_ENTRY(fn) table->fn = _##fn##_;
Where exactly?
For clGetPlatformInfo
it first tries to load it dynamically from the library: https://github.com/OCL-dev/ocl-icd/blob/fdde6677b21329432db8b481e2637cd10f7d3cb2/ocl_icd_loader.c#L457
That call will not fail with OpenCL.framework, and will give you a pointer to Apple's version that will not know what to do with a pointer to the dispatch table instead of whatever 0x7fff0000 points to.
I submitted a PR to your repo that should avoid the whole issue and allow not modifying ocl-icd at all: https://github.com/RJVB/ocl_icd_wrapper/pull/1/files
I'll have a look later today and let you know, but I don't see how it would get around the fact that ocl-icd will only use the dispatch table for clGetPlatformInfo
if that dynamic lookup failed.
I assume it's written that way because not intended to be used with implementations that aren't ICD-compatible, IOW the assumption is probably that the function found via dlsym will accept the platform_id handle it will be called with. I had a quick look at Mesa's OpenCL implementation, which can be built as "standalone" and as ICD-compatible, to see if they handle the 2 situations differently. Of course they don't (AFAICT), but then again there probably is no reason. I guess dependent code is not supposed to know anything about the platform_id handle so it can represent anything the implementation wants.
N00b question: the wrapper structs in question are the same type as the wrapped objects, wouldn't it be possible to retain/release the wrapper objects instead of the wrapped objects, or at least additionally? If not, is
CL_CONTEXT_REFERENCE_COUNT
sufficiently reliable to determine when the wrapper object has to be freed? Alternatively I could always create an expanded structure for the wrapper (maybe less confusingly than the existing re-implementation of cl_platform_id ^^) that adds a refcounting mechanism. I suppose I'd have to add a mutex or something of the sort to make the entire clRetainXX and clReleaseXX functions threadsafe. There are enough of those to make that a "nice" additional bit of work :)Yes and yes :)
There are 5 statements in my text above (incl. 2 explicit questions) to which you could be responding with "yes", so I have to ask you indeed meant the 2 explicit questions ;)
. I would use atomics for such a task, they usually prove better at these kind of jobs.
Faster, undoubtedly. Use them as a semaphore you mean, something like if (--atomicBarrier == 0) { "do our thing"; atomicBarrier += 1; }
?
If you did mean that "yes, it's possible to retain/release the wrapper objects" (using the underlying clRetain/Release functions!), are those locks are needed only to ensure that the 2 refcounts remain in sync, right?
Indeed. Thinking more about it a simpler approach could work: I would allocated a small structure that would contain the user data, pointer, and wrapper struct handle, and that should be enough. this data can be freed once the object is released.
Isn't that the same thing I thought up - you'd still need to install an intermediate callback that calls the intended callback with the data from that small structure, no? I thought of putting that small structure in the redefined cl_event structure in such a way a pointer to it can be returned as a cl_event*, something like
struct cl_event_wrapper {
struct cl_event theEvent;
struct wrapData {
// whatever you need
}
};
But that's dangerous . So then what, an internal lookup table that stores all those "small structures" as with the cl_event wrapper object as lookup key, queried when releasing that wrapper object?
Where exactly? For
clGetPlatformInfo
it first tries to load it dynamically from the library: https://github.com/OCL-dev/ocl-icd/blob/fdde6677b21329432db8b481e2637cd10f7d3cb2/ocl_icd_loader.c#L457 That call will not fail with OpenCL.framework, and will give you a pointer to Apple's version that will not know what to do with a pointer to the dispatch table instead of whatever 0x7fff0000 points to.
dlsym should be querying pointers from the wrapper library, not the OpenCL framework. For now this symbol cannot be found in the wrapper library, so the loader will try querying it through clGetExtensionFunctionAddress
. This may have been an oversight, or a maybe way to run around a buggy implementation.
In practice, clGetExtensionFunctionAddress
should not return an address for clGetPlatformInfo
. If it does on apple, this is concerning but not something that cannot be fixed inside the wrapper, by ensuring we are not querying the implementation for clGetPlatformInfo
. I will update my PR accordingly.
I'll try to answer the ref counting questions in more details later, when I have more time.
Edit: I see the wrapper library clGetExtensionFunctionAddress
cannot return a pointer for clGetPlatformInfo
so we should be fine here. See here for the code:
https://github.com/Kerilk/ocl_icd_wrapper/blob/5d0da6cc2d5cb52cb9b80e0efb1891da67f366f2/ocl_icd_wrapper.c#L61-L72
Edit 2: see here for the specification about the ICD library (the wrapper in this case) having to provide the 3 entry points: https://github.com/KhronosGroup/OpenCL-Docs/blob/6ca05e99cb7e0c23dca7efe073b6dc9367ebbf1d/ext/cl_khr_icd.asciidoc?plain=1#L209-L215
Upon successfully loading a Vendor ICD's library, the ICD Loader queries the following functions from the library: {clIcdGetPlatformIDsKHR}, {clGetPlatformInfo}, and {clGetExtensionFunctionAddress} (note: {clGetExtensionFunctionAddress} has been deprecated, but is still required for the ICD Loader). If any of these functions are not present then the ICD Loader will close and ignore the library.
This is what my patch to the wrapper library does, implement the third missing symbol.
Edit 3: OK I get why you're getting the Apple symbol from the dlsym, it is getting grabbed because the Apple library is linked into the wrapper. It took me some time to get there. Of course my current PR is broken, since it will call itself rather than the underlying Apple implementation. I will have the get extension function to return the clGetPlatformInfo pointer which should override the one obtained through dlsym, as can be seen here: https://github.com/OCL-dev/ocl-icd/blob/fdde6677b21329432db8b481e2637cd10f7d3cb2/ocl_icd_loader.c#L290-L311
Edit 4: The real clean solution to the above issue would be having the Apple library be loaded at initialization via dlopen in the wrapper library, and it's symbol queried by dlsym. Pointers would be used inside the functions in the wrapper. This would ensure symbols from the Appple library cannot bleed into the loader.
dlsym should be querying pointers from the wrapper library, not the OpenCL framework.
AFAIK OS X works as a general Unix in this aspect. If you query a shared library for a symbol that it uses but obtains from another library, you will get the address for that symbol (barring visibility tricks) but dl_addr
will tell you that it comes from that other library.
You can check with my wrapper around dlsym: https://github.com/RJVB/legacy-tools/blob/master/dlsym.c .
Avoiding this is possible, but you'd have to overload the function. That is, ocd_icd_wrapper would have to have a function called clGetPlatformInfo
, and use an init function to obtain the pointer to Apple's implementation.
R.
Indeed, this is my understanding as well. It just took me way too long to make the connection.
Avoiding this is possible, but you'd have to overload the function. That is, ocd_icd_wrapper would have to have a function called
clGetPlatformInfo
, and use an init function to obtain the pointer to Apple's implementation.
This is indeed the "correct" solution I was proposing above (edit 4).
I think the PR for the wrapper should be good now, and work as expected without ocl-icd modifications.
Indeed, this is my understanding as well. It just took me way too long to make the connection.
Lol, I also spent way too long trying to figure out another kind of sudden, "inexplicable" behaviour the other day. Until I realised I had linked the wrapper library to my framework version of ocd-icl ;)
This is indeed the "correct" solution I was proposing above (edit 4).
I think the PR for the wrapper should be good now, and work as expected without ocl-icd modifications.
I just found the time for a quick look, saw it in one of the commits but somehow not showing up in what I take to be the combined patch; I hope to get around to testing it today.
Would it be possible to make
ocl-icd
support Apple's OpenCL.framework somehow?Or maybe it is supported already and I just do something wrong? With
clinfo
linked to ocl-icd's libOpenCL.dylib :For comparison, here's the output when
clinfo
is linked to OpenCL.framework :