bytedeco / javacpp

The missing bridge between Java and native C++
Other
4.49k stars 583 forks source link

Problem: nvcc does not respect __declspec(thread) #612

Closed masterav10 closed 2 years ago

masterav10 commented 2 years ago

Fixes issue described here: https://github.com/bytedeco/javacpp-presets/issues/1239

Uses thread_local if available, which seems to behave better when nvcc is compiling.

saudet commented 2 years ago

It sounds like NVCC on Windows is now using Clang as backend instead of MSVC. That would explain a couple of things since their version of __declspec(thread) doesn't support destructors: https://clang.llvm.org/docs/AttributeReference.html#thread

So, instead of checking for C++11, let's check for CUDA on Windows, with something like

#if !defined(NO_JNI_DETACH_THREAD) && defined(_WIN32) && defined(__CUDACC__)
...
#elif !defined(NO_JNI_DETACH_THREAD) && defined(_WIN32)
...
masterav10 commented 2 years ago

I am certain this not the case, at least on my machine. If I run from plain old command line, I get the following:

nvcc fatal : Cannot find compiler 'cl.exe' in PATH

The documentation here makes no mention of clang either.

saudet commented 2 years ago

Yes, it does: https://developer.nvidia.com/cuda-llvm-compiler

It probably gets preprocessed by clang, at the very least.

saudet commented 2 years ago

Actually, no, it doesn't mention Clang. It looks like it gets preprocessed by cudafe: https://arcb.csc.ncsu.edu/~mueller/cluster/nvidia/2.0/nvcc_2.0.pdf And the string "thread-local variable cannot be dynamically initialized" is found in cudafe++.exe, so that sounds like a bug in there, and I think adding a special case for it is the right thing to do.

masterav10 commented 2 years ago

I think we are in business now. Let me know if you'd like anything else changed.

saudet commented 2 years ago

It looks like we can use thread_local and __declspec(thread) interchangeably in the same positions, so we can simplify this a bit.

masterav10 commented 2 years ago

And so we can; see the new changes.

saudet commented 2 years ago

Yeah, but we only really care about that on Windows, and I've also noticed that GCC doesn't support __declspec(thread), so we might as well use thread_local when we have a C++11 compiler, and only fall back on __declspec(thread). What about something like this?

--- a/src/main/java/org/bytedeco/javacpp/tools/Generator.java
+++ b/src/main/java/org/bytedeco/javacpp/tools/Generator.java
@@ -525,7 +525,11 @@ public class Generator {
         out.println("}");
         out.println();
         out.println("#if !defined(NO_JNI_DETACH_THREAD) && defined(_WIN32)");
+        out.println("#if __cplusplus >= 201103L || _MSC_VER >= 1900");
+        out.println("   static thread_local struct JavaCPP_thread_local {");
+        out.println("#else");
         out.println("   static __declspec(thread) struct JavaCPP_thread_local {");
+        out.println("#endif");
         out.println("       JNIEnv* env = NULL;");
         out.println("       ~JavaCPP_thread_local() {");
         out.println("           if (env && JavaCPP_vm) {");