ayanc / fdscs


A question about make.sh #2

Closed disco14 closed 4 years ago

disco14 commented 4 years ago

Sorry to bother you again. I successfully built and ran zero_out.cc, the example op from the TensorFlow guide (https://www.tensorflow.org/guide/create_op), but I still get an error when I run make.sh. I am using g++ 7.5.0, CUDA 10.0, and tensorflow-gpu 1.13.1. I ran the commands in make.sh one at a time, and this one fails:

nvcc -std=c++11 --expt-relaxed-constexpr -c -o slib.cu.o slib.cu.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
${TF_CFLAGS[@]} -I . -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

The error is: 1 error detected in the compilation of "/tmp/tmpxft_0000225f_00000000-6_slib.cu.cpp1.ii".

I don't think the g++ version is the cause, since the example from the official TensorFlow website builds and runs successfully.

Could you give me some suggestions? Thank you very much.

details:

(py36) root@jxg:/home/UE4/research/Network/fdscs-master/slib# nvcc -std=c++11 --expt-relaxed-constexpr -c -o slib.cu.o slib.cu.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
${TF_CFLAGS[@]} -I . -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

In file included from /home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/util/ConfigureVectorization.h:384:0,
                 from /home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/Core:22,
                 from /home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/Tensor:14,
                 from /home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
                 from slib.cu.cc:5:
/usr/local/cuda-10.0/bin/..//include/host_defines.h:54:2: warning: #warning "host_defines.h is an internal header file and must not be used directly. This file will be removed in a future CUDA release. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "host_defines.h is an internal header file and must not be used directly. This file will be removed in a future CUDA release. Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~
/home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/absl/strings/string_view.h(496): error: constexpr function return is non-constant

/home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(55): warning: integer conversion resulted in a change of sign

/home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(309): warning: integer conversion resulted in a change of sign

/home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(310): warning: integer conversion resulted in a change of sign

1 error detected in the compilation of "/tmp/tmpxft_000017e2_00000000-6_slib.cu.cpp1.ii".

(py36) root@jxg:/home/UE4/research/Network/fdscs-master/slib# ./make.sh
In file included from /home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/util/ConfigureVectorization.h:384:0,
                 from /home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/Core:22,
                 from /home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/Tensor:14,
                 from /home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
                 from slib.cu.cc:5:
/usr/local/cuda-10.0/bin/..//include/host_defines.h:54:2: warning: #warning "host_defines.h is an internal header file and must not be used directly. This file will be removed in a future CUDA release. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
 #warning "host_defines.h is an internal header file and must not be used directly. This file will be removed in a future CUDA release. Please use cuda_runtime_api.h or cuda_runtime.h instead."
  ^~~
/home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/absl/strings/string_view.h(496): error: constexpr function return is non-constant

/home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(55): warning: integer conversion resulted in a change of sign

/home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(309): warning: integer conversion resulted in a change of sign

/home/UE4/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(310): warning: integer conversion resulted in a change of sign

1 error detected in the compilation of "/tmp/tmpxft_0000225f_00000000-6_slib.cu.cpp1.ii".

ayanc commented 4 years ago

Searching Google for the error suggests that this is a problem when compiling custom ops with newer versions of TensorFlow (ours was compiled with tensorflow 1.07).

This is referenced in https://github.com/tensorflow/tensorflow/issues/22766. It looks like adding a -DNDEBUG flag to the nvcc command should fix it.
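As a rough sketch of that suggestion, the nvcc line quoted in the question would gain one extra define. The helper name nvcc_cmd is made up for illustration, and it only prints the command rather than running it, so it can be inspected without nvcc or TensorFlow installed. The include path in the example call is a placeholder, not a real location.

```shell
# Hypothetical helper (not part of make.sh): prints the nvcc command line
# from the question with -DNDEBUG appended. The TensorFlow compile flags
# are passed in as arguments.
nvcc_cmd() {
  echo nvcc -std=c++11 --expt-relaxed-constexpr -c -o slib.cu.o slib.cu.cc \
    -D_GLIBCXX_USE_CXX11_ABI=0 "$@" -I . -D GOOGLE_CUDA=1 -x cu \
    -Xcompiler -fPIC -DNDEBUG
}

# Example call with a placeholder include path.
nvcc_cmd -I/path/to/tensorflow/include
```

In a bash script such as make.sh, the call would presumably be `nvcc_cmd "${TF_CFLAGS[@]}"`, with the leading `echo` removed from the helper once the printed command looks right.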

disco14 commented 4 years ago

Wow, yes, this method is very useful. Thank you very much.

fshamsafar commented 4 years ago

I am using TF 1.15 with CUDA 10.0, and the same problem appeared. Upgrading g++ and adding -DNDEBUG did not work for me. I solved it by changing these lines:

in slib.cu.cc:

#include "tensorflow/core/util/cuda_kernel_helper.h" ====>
#include "tensorflow/core/util/gpu_kernel_helper.h"

in gpu_kernel_helper.h:

#include "third_party/gpus/cuda/include/cuda_fp16.h" ====>
#include "cuda/include/cuda_fp16.h"

in gpu_device_functions.h:

#include "third_party/gpus/cuda/include/cuComplex.h" ====>
#include "cuda/include/cuComplex.h"

in gpu_device_functions.h:

#include "third_party/gpus/cuda/include/cuda.h" ====>
#include "cuda/include/cuda.h"
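For anyone who would rather not edit the files by hand, the include rewrites listed above could be scripted roughly as follows. This is only a sketch: patch_includes is a hypothetical helper, it keeps a .bak copy of each file it touches, and the location of the two TensorFlow headers inside site-packages is an assumption that has to be checked for each install.

```shell
# Hypothetical helper: apply the include rewrites from this comment to one
# file, leaving the original next to it as <file>.bak.
patch_includes() {
  file="$1"
  cp "$file" "$file.bak" || return 1
  sed -e 's#tensorflow/core/util/cuda_kernel_helper\.h#tensorflow/core/util/gpu_kernel_helper.h#' \
      -e 's#third_party/gpus/cuda/include/cuda_fp16\.h#cuda/include/cuda_fp16.h#' \
      -e 's#third_party/gpus/cuda/include/cuComplex\.h#cuda/include/cuComplex.h#' \
      -e 's#third_party/gpus/cuda/include/cuda\.h#cuda/include/cuda.h#' \
      "$file.bak" > "$file"
}

# Intended usage (TF_INC is an assumed variable holding the TensorFlow
# include directory; adjust the header paths to match your install):
#   patch_includes slib.cu.cc
#   patch_includes "$TF_INC/tensorflow/core/util/gpu_kernel_helper.h"
#   patch_includes "$TF_INC/tensorflow/core/util/gpu_device_functions.h"
```

Since this changes headers that ship with the installed TensorFlow package, keeping the .bak copies makes it easy to restore them later.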