bashbaug / opencl-runtime-loader

OpenCL Runtime Loader Library
MIT License
2 stars 0 forks source link

Builds as C, or without a dependency on the C++ STL #17

Open rcombs opened 4 years ago

rcombs commented 4 years ago

Currently, as far as I can tell, the only actual use of C++-specific behavior in the library is static _sclModuleHandle module = _sclOpenICDLoader();, which is allowed (and is thread-safe) in C++ but not in C. This introduces dependencies on a few symbols from the C++ library, which aren't available when linking to this package with gcc or clang (as opposed to g++ or clang++). This could be solved by explicitly doing platform-specific locking around the initialization, or via some atomics trickery (with potential risk of leaking a dlopen handle, but it's not like the library's going to get unloaded anyway).

bashbaug commented 4 years ago

This is interesting - how important is this?

I'd like to add support for OpenCL extensions at some point, as described in #12, and it would be a lot easier to do this (I think) using C++ features. If it's important to eliminate C++ dependencies it'd be helpful to understand this ASAP, before writing a bunch of C++ code. I'd also want to switch to (or add) CI coverage using C compilers to avoid introducing C++ dependencies by accident.

Thanks!

rcombs commented 4 years ago

Linking static C++ libraries into C libraries is always a pain; it can be worked around by tweaking the build system to use the C++ linker instead of the C one, but it's always a bit awkward (and isn't something that can really be expected for a library like this, where consumers may often not be aware of it). Using C++ in general isn't necessarily a problem, as long as there aren't any dependencies on external symbols. Here it was __cxa_guard_acquire for thread-safe static initialization (which I'm currently disabling for testing purposes but will have to deal with at some point) and __gxx_personality_v0 for exception handling (which isn't actually used at all and is safe to disable here). Using STL classes tends to result in similar issues, but defining and using your own classes is fine. Modern C linkers have no trouble handling C++ symbols within their inputs, they just don't link in the C++ STL without a lot of awkward manual platform-specific wrangling; using C++ without disabling the features that pull in STL or other C++-runtime symbols effectively means adding a hidden (and fairly confusing) dependency.

bashbaug commented 4 years ago

Sorry for going radio silent, I haven't had a chance to work on this as much as I would have liked to recently.

Is there a preferred method for one-time thread-safe global initialization with minimal dependencies?

I think I could do something like the Khronos OpenCL ICD loader and use pthread_once (link).

The open source ocl-icd has a similar pthread_once path and an alternative that uses the atomic __sync_bool_compare_and_swap (link).

I was trying to convince myself that these functions either did not need to operate on global variables or did not need to be thread-safe, in which case this problem becomes a lot easier to solve, but I don't think those are safe assumptions unfortunately...

rcombs commented 4 years ago

The biggest problem with pthread_once is that it adds a dependency on libpthread (which isn't too big an ask on POSIX platforms but can be something significant on Windows); if you decide to go that route I'd suggest including a minimal Windows implementation as well.

That basic atomics-only implementation seems like it's probably fine for this use-case? It's not like the operation involved (just opening the lib) is expensive, so you wouldn't end up spinning for long, and this call isn't made that frequently anyway.

Alternately, you could do something like this:

static _Atomic _sclModuleHandle g_module = NULL; // or atomic<_sclModuleHandle>
_sclModuleHandle module = atomic_load(&g_module);
if (!module) {
  _sclModuleHandle loaded = _sclOpenICDLoader();
  if (!loaded)
    return CL_DEVICE_NOT_FOUND; // or something
  if (atomic_compare_exchange_strong(&g_module, &module, loaded))
    module = loaded;
  else
    _sclCloseICDLoader(loaded); // #define to dlclose() or FreeLibrary()
}

This atomically checks if there's an existing copy of the library open. If so, it uses that; otherwise, it tries to open it. If that succeeds, it atomically checks if another thread beat us. If we won, it stores the copy it just loaded and moves on; if a different thread did, it closes the copy it just loaded and uses the one from the other thread. This should be quite robust, fast, and not require any dependencies on reasonable target platforms.

bashbaug commented 4 years ago

Great, if C11/C++11 atomics are fair game I think that'd be my preferred portable mechanism. I'll get this added in the next couple days. Thanks for your input!

rcombs commented 4 years ago

This is mostly fixed by #18; the only thing missing is the addition of -fno-exceptions to CXXFLAGS on compilers that support it (which I'm currently doing manually but should probably be integrated into the cmakelists).

bashbaug commented 3 years ago

I've disabled exceptions in #22. I think this is the last fix for this issue? Let me know if you find anything else - thanks!