Open TrentWeiss opened 6 years ago
Also note that the CPU version of Caffe2 builds just fine, but I need GPU acceleration for my application.
Is "nvcc error : 'cicc' died with status 0xC0000005 (ACCESS_VIOLATION)" the root cause of this? Does it look like cmake is finding nvcc correctly?
I am experiencing exactly the same issue on Win10 + VS2015 + CUDA9.0. It seems that inheritance in partially specialized class templates is very buggy on Windows. The following patch works for me. It defines separate class templates ArgMinOpCuda and ArgMaxOpCuda to avoid the class template inheritance. However, after applying this patch, the compilation still fails because Eigen with CUDA is still not supported on Windows.
diff --git a/caffe2/operators/arg_ops.cu b/caffe2/operators/arg_ops.cu
index 99c31dfa8..8c6aa35a6 100644
--- a/caffe2/operators/arg_ops.cu
+++ b/caffe2/operators/arg_ops.cu
@@ -48,20 +48,20 @@ __global__ void ComputeArgCUDAKernel(
} // namespace
-template <typename T>
-class ArgMaxOp<T, CUDAContext> final : public ArgOpBase<T, CUDAContext> {
+template <typename T, class Context>
+class ArgMaxOpCuda final : public ArgOpBase<T, Context> {
public:
- USE_OPERATOR_FUNCTIONS(CUDAContext);
+ USE_OPERATOR_FUNCTIONS(Context);
#if EIGEN_VERSION_AT_LEAST(3, 3, 0)
- ArgMaxOp(const OperatorDef& operator_def, Workspace* ws)
- : ArgOpBase<T, CUDAContext>(operator_def, ws),
+ArgMaxOpCuda(const OperatorDef& operator_def, Workspace* ws)
+ : ArgOpBase<T, Context>(operator_def, ws),
cuda_stream_(context_.cuda_stream()),
stream_device_(&cuda_stream_, context_.cuda_gpu_id()),
gpu_device_(&stream_device_) {}
#else // EIGEN_VERSION_AT_LEAST(3, 3, 0)
- ArgMaxOp(const OperatorDef& operator_def, Workspace* ws)
- : ArgOpBase<T, CUDAContext>(operator_def, ws) {}
+ArgMaxOpCuda(const OperatorDef& operator_def, Workspace* ws)
+ : ArgOpBase<T, Context>(operator_def, ws) {}
#endif // EIGEN_VERSION_AT_LEAST(3, 3, 0)
protected:
@@ -80,8 +80,8 @@ class ArgMaxOp<T, CUDAContext> final : public ArgOpBase<T, CUDAContext> {
#endif // EIGEN_VERSION_AT_LEAST(3, 3, 0)
};
-template <typename T>
-bool ArgMaxOp<T, CUDAContext>::Compute(
+template <typename T, class Context>
+bool ArgMaxOpCuda<T, Context>::Compute(
const T* X,
const TIndex prev_size,
const TIndex next_size,
@@ -107,20 +107,20 @@ bool ArgMaxOp<T, CUDAContext>::Compute(
return true;
}
-template <typename T>
-class ArgMinOp<T, CUDAContext> final : public ArgOpBase<T, CUDAContext> {
+template <typename T, class Context>
+class ArgMinOpCuda final : public ArgOpBase<T, Context> {
public:
- USE_OPERATOR_FUNCTIONS(CUDAContext);
+ USE_OPERATOR_FUNCTIONS(Context);
#if EIGEN_VERSION_AT_LEAST(3, 3, 0)
- ArgMinOp(const OperatorDef& operator_def, Workspace* ws)
- : ArgOpBase<T, CUDAContext>(operator_def, ws),
+ArgMinOpCuda(const OperatorDef& operator_def, Workspace* ws)
+ : ArgOpBase<T, Context>(operator_def, ws),
cuda_stream_(context_.cuda_stream()),
stream_device_(&cuda_stream_, context_.cuda_gpu_id()),
gpu_device_(&stream_device_) {}
#else // EIGEN_VERSION_AT_LEAST(3, 3, 0)
- ArgMinOp(const OperatorDef& operator_def, Workspace* ws)
- : ArgOpBase<T, CUDAContext>(operator_def, ws) {}
+ArgMinOpCuda(const OperatorDef& operator_def, Workspace* ws)
+ : ArgOpBase<T, Context>(operator_def, ws) {}
#endif // EIGEN_VERSION_AT_LEAST(3, 3, 0)
protected:
@@ -139,8 +139,8 @@ class ArgMinOp<T, CUDAContext> final : public ArgOpBase<T, CUDAContext> {
#endif // EIGEN_VERSION_AT_LEAST(3, 3, 0)
};
-template <typename T>
-bool ArgMinOp<T, CUDAContext>::Compute(
+template <typename T, class Context>
+bool ArgMinOpCuda<T, Context>::Compute(
const T* X,
const TIndex prev_size,
const TIndex next_size,
@@ -166,7 +166,7 @@ bool ArgMinOp<T, CUDAContext>::Compute(
return true;
}
-REGISTER_CUDA_OPERATOR(ArgMax, ArgMaxOp<float, CUDAContext>);
-REGISTER_CUDA_OPERATOR(ArgMin, ArgMinOp<float, CUDAContext>);
+REGISTER_CUDA_OPERATOR(ArgMax, ArgMaxOpCuda<float, CUDAContext>);
+REGISTER_CUDA_OPERATOR(ArgMin, ArgMinOpCuda<float, CUDAContext>);
} // namespace caffe2
To fully solve the issue, I think we need to reimplement arg_ops.cu without defining class template specializations and without using Eigen.
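For anyone skimming the patch, here is a minimal host-only sketch of the two class shapes involved. It is not the caffe2 code: the context tags, the simplified ArgOpBase, and the ArgMaxIndex helper are hypothetical stand-ins, and the plain loop only illustrates what "without Eigen" could look like on the CPU side. Both shapes are valid C++; the difference is just that the second one never asks the compiler to instantiate a base class through a partial specialization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for caffe2's context tags (illustration only).
struct CPUContext {};
struct CUDAContext {};

// Simplified stand-in for ArgOpBase<T, Context>; the real class is an Operator.
template <typename T, class Context>
class ArgOpBase {
 public:
  virtual ~ArgOpBase() = default;
  // Returns the index of the selected element of x.
  virtual std::size_t Compute(const std::vector<T>& x) const = 0;
};

// Hypothetical helper: plain-loop argmax, no Eigen. Ties pick the first index.
template <typename T>
std::size_t ArgMaxIndex(const std::vector<T>& x) {
  assert(!x.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (x[i] > x[best]) {
      best = i;
    }
  }
  return best;
}

// Shape A (before the patch): primary template plus a partial specialization
// for CUDAContext that derives from the base class.
template <typename T, class Context>
class ArgMaxOp;

template <typename T>
class ArgMaxOp<T, CUDAContext> final : public ArgOpBase<T, CUDAContext> {
 public:
  std::size_t Compute(const std::vector<T>& x) const override {
    return ArgMaxIndex(x);
  }
};

// Shape B (after the patch): a separate class template with no specialization.
template <typename T, class Context>
class ArgMaxOpCuda final : public ArgOpBase<T, Context> {
 public:
  std::size_t Compute(const std::vector<T>& x) const override {
    return ArgMaxIndex(x);
  }
};
```

With a standard-conforming compiler, ArgMaxOp<float, CUDAContext> and ArgMaxOpCuda<float, CUDAContext> behave identically here, which is why the rename in the patch is only a workaround for the toolchain and not a functional change.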
https://github.com/caffe2/caffe2/issues/2489#issuecomment-378398030 I don't think so. I was getting this issue before because CMake did not set CUDA_HOST_COMPILER properly, although it did find NVCC. I was able to fix this issue by manually specifying the correct path of cl.exe in the CMake variable CUDA_HOST_COMPILER.
I am still having this issue with the latest master branch in pytorch (VS2015, CUDA9.0). The change I mentioned above now completely fixes the compile error (because the Eigen tensor implementation was removed). I still think the problem is rooted in how the VS compiler handles partial specialization of class templates.
@harrysummer Thanks. I have put up your suggestion here https://github.com/pytorch/pytorch/pull/6746 to fix our Windows build.
The error I get is the following, repeated seven times:

C:/Users/Madhur/Documents/git_repos/caffe2\caffe2/operators/arg_ops.h(17): error : invalid base class [C:\Users\Madhur\Documents\git_repos\caffe2\build\caffe2\caffe2_gpu.vcxproj]
          detected during instantiation of class "caffe2::ArgOpBase<T, Context> [with T=T, Context=caffe2::CUDAContext]"
C:/Users/Madhur/Documents/git_repos/caffe2/caffe2/operators/arg_ops.cu(52): here

It is followed by:

arg_ops.cu
CUSTOMBUILD : nvcc error : 'cicc' died with status 0xC0000005 (ACCESS_VIOLATION) [C:\Users\Madhur\Documents\git_repos\caffe2\build\caffe2\caffe2_gpu.vcxproj]
CMake Error at caffe2_gpu_generated_arg_ops.cu.obj.Release.cmake:275 (message):
  Error generating file C:/Users/Madhur/Documents/git_repos/caffe2/build/caffe2/CMakeFiles/caffe2_gpu.dir/operators/Release/caffe2_gpu_generated_arg_ops.cu.obj
Any ideas what would cause this?