Open rcombs opened 4 years ago
This is interesting - how important is this?
I'd like to add support for OpenCL extensions at some point, as described in #12, and it would be a lot easier to do this (I think) using C++ features. If it's important to eliminate C++ dependencies it'd be helpful to understand this ASAP, before writing a bunch of C++ code. I'd also want to switch to (or add) CI coverage using C compilers to avoid introducing C++ dependencies by accident.
Thanks!
Linking static C++ libraries into C libraries is always a pain; it can be worked around by tweaking the build system to use the C++ linker instead of the C one, but it's always a bit awkward (and isn't something that can really be expected for a library like this, where consumers may often not be aware of it). Using C++ in general isn't necessarily a problem, as long as there aren't any dependencies on external symbols. Here it was __cxa_guard_acquire
for thread-safe static initialization (which I'm currently disabling for testing purposes but will have to deal with at some point) and __gxx_personality_v0
for exception handling (which isn't actually used at all and is safe to disable here). Using STL classes tends to result in similar issues, but defining and using your own classes is fine. Modern C linkers have no trouble handling C++ symbols within their inputs, they just don't link in the C++ STL without a lot of awkward manual platform-specific wrangling; using C++ without disabling the features that pull in STL or other C++-runtime symbols effectively means adding a hidden (and fairly confusing) dependency.
Sorry for going radio silent, I haven't had a chance to work on this as much as I would have liked to recently.
Is there a preferred method for one-time thread-safe global initialization with minimal dependencies?
I think I could do something like the Khronos OpenCL ICD loader and use pthread_once
(link).
The open source ocl-icd has a similar pthread_once
path and an alternative that uses the atomic __sync_bool_compare_and_swap
(link).
I was trying to convince myself that these functions either did not need to operate on global variables or did not need to be thread-safe, in which case this problem becomes a lot easier to solve, but I don't think those are safe assumptions unfortunately...
The biggest problem with pthread_once
is that it adds a dependency on libpthread (which isn't too big an ask on POSIX platforms but can be something significant on Windows); if you decide to go that route I'd suggest including a minimal Windows implementation as well.
That basic atomics-only implementation seems like it's probably fine for this use-case? It's not like the operation involved (just opening the lib) is expensive, so you wouldn't end up spinning for long, and this call isn't made that frequently anyway.
Alternately, you could do something like this:
static _Atomic _sclModuleHandle g_module = NULL; // or atomic<_sclModuleHandle>
_sclModuleHandle module = atomic_load(&g_module);
if (!module) {
_sclModuleHandle loaded = _sclOpenICDLoader();
if (!loaded)
return CL_DEVICE_NOT_FOUND; // or something
if (atomic_compare_exchange_strong(&g_module, &module, loaded))
module = loaded;
else
_sclCloseICDLoader(loaded); // #define to dlclose() or FreeLibrary()
}
This atomically checks if there's an existing copy of the library open. If so, it uses that; otherwise, it tries to open it. If that succeeds, it atomically checks if another thread beat us. If we won, it stores the copy it just loaded and moves on; if a different thread did, it closes the copy it just loaded and uses the one from the other thread. This should be quite robust, fast, and not require any dependencies on reasonable target platforms.
Great, if C11/C++11 atomics are fair game I think that'd be my preferred portable mechanism. I'll get this added in the next couple days. Thanks for your input!
This is mostly fixed by #18; the only thing missing is the addition of -fno-exceptions
to CXXFLAGS on compilers that support it (which I'm currently doing manually but should probably be integrated into the cmakelists).
I've disabled exceptions in #22. I think this is the last fix for this issue? Let me know if you find anything else - thanks!
Currently, as far as I can tell, the only actual use of C++-specific behavior in the library is
static _sclModuleHandle module = _sclOpenICDLoader();
, which is allowed (and is thread-safe) in C++ but not in C. This introduces dependencies on a few symbols from the C++ library, which aren't available when linking to this package withgcc
orclang
(as opposed tog++
orclang++
). This could be solved by explicitly doing platform-specific locking around the initialization, or via some atomics trickery (with potential risk of leaking a dlopen handle, but it's not like the library's going to get unloaded anyway).