Open Quuxplusone opened 3 years ago
Regarding 1.(ii), one possible improvement is to emit a warning when ctors are used for static locals.
To elaborate on the use case for non-thread-safe initialization in OpenCL: it relies on the application to ensure that the static variable initialization code is run by only one thread (for example, by guarding on the global ID).
Currently the Clang User Manual explains an application-side workaround for global variable ctors, for drivers that don't support kernel languages with C++ features. However, it doesn't mention whether this applies to static locals or not.
After looking at this topic I have noted the following.
See example: https://godbolt.org/z/bfaxKT
(i) It seems that a thread-safe implementation requires some sort of locking mechanism, but this is not provided by all OpenCL vendors.
(ii) To address portability, it seems that Clang should not define __cpp_threadsafe_static_init by default in OpenCL mode. The macro is controlled by LangOpts.ThreadsafeStatics, so this option should be set to 0 in OpenCL mode.
(iii) Vendors that support a thread-safe implementation of static initialization can enable the LangOpt; alternatively, it can be changed using the -fthreadsafe-statics flag.
(iv) Application code can check the "__cpp_threadsafe_static_init" define to either use static initialization or emulate it on the application side. For example, the initialization can be done by a single work item by guarding the initialization code with the work item ID.
(v) Vendors will have to implement @__cxa_guard_acquire and @__cxa_guard_release (Itanium ABI), but a naive non-thread-safe implementation can be very simple: just checking and setting the flag variable generated in IR (see @_ZGVZ8get_fredvE12a_local_fred in the example). NOTE that we don't need to worry about recursion, since it is disallowed in OpenCL and recursion in static local initialization is undefined behavior in C++.