Closed masterav10 closed 2 years ago
It sounds like NVCC on Windows is now using Clang as backend instead of MSVC. That would explain a couple of things since their version of __declspec(thread)
doesn't support destructors: https://clang.llvm.org/docs/AttributeReference.html#thread
So, instead of checking for C++11, let's check for CUDA on Windows, with something like
#if !defined(NO_JNI_DETACH_THREAD) && defined(_WIN32) && defined(__CUDACC__)
...
#elif !defined(NO_JNI_DETACH_THREAD) && defined(_WIN32)
...
I am certain this not the case, at least on my machine. If I run from plain old command line, I get the following:
nvcc fatal : Cannot find compiler 'cl.exe' in PATH
The documentation here makes no mention of clang either.
Yes, it does: https://developer.nvidia.com/cuda-llvm-compiler
It probably gets preprocessed by clang, at the very least.
Actually, no, it doesn't mention Clang. It looks like it gets preprocessed by cudafe: https://arcb.csc.ncsu.edu/~mueller/cluster/nvidia/2.0/nvcc_2.0.pdf And the string "thread-local variable cannot be dynamically initialized" is found in cudafe++.exe, so that sounds like a bug in there, and I think adding a special case for it is the right thing to do.
I think we are in business now. Let me know if you'd like anything else changed.
It looks like we can use thread_local
and __declspec(thread)
interchangeably in the same positions, so we can simplify this a bit.
And so we can; see the new changes.
Yeah, but we only really care about that on Windows, and I've also noticed that GCC doesn't support __declspec(thread)
, so we might as well use thread_local
when we have a C++11 compiler, and only fall back on __declspec(thread)
. What about something like this?
--- a/src/main/java/org/bytedeco/javacpp/tools/Generator.java
+++ b/src/main/java/org/bytedeco/javacpp/tools/Generator.java
@@ -525,7 +525,11 @@ public class Generator {
out.println("}");
out.println();
out.println("#if !defined(NO_JNI_DETACH_THREAD) && defined(_WIN32)");
+ out.println("#if __cplusplus >= 201103L || _MSC_VER >= 1900");
+ out.println(" static thread_local struct JavaCPP_thread_local {");
+ out.println("#else");
out.println(" static __declspec(thread) struct JavaCPP_thread_local {");
+ out.println("#endif");
out.println(" JNIEnv* env = NULL;");
out.println(" ~JavaCPP_thread_local() {");
out.println(" if (env && JavaCPP_vm) {");
Fixes issue described here: https://github.com/bytedeco/javacpp-presets/issues/1239
Uses thread_local if available, which seems to behave better when nvcc is compiling.