NVlabs / stylegan2-ada

StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation
https://arxiv.org/abs/2006.06676

RuntimeError: NVCC returned an error. See below for full command line and output log: #11

Open Harry-KIT opened 4 years ago

Harry-KIT commented 4 years ago

''' nvcc --compiler-options '-fPIC' --compiler-options '-I/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=1' --linker-options '-L/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow -ltensorflow_framework' --gpu-architecture=sm_70 --use_fast_math --disable-warnings --include-path "/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow/include" --include-path "/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow/include/external/protobuf_archive/src" --include-path "/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow/include/external/com_google_absl" --include-path "/home/gayrat/miniconda3/envs/Tensorflow_v1/lib/python3.7/site-packages/tensorflow/include/external/eigen_archive" 2>&1 "/home/gayrat/PycharmProjects/stylegan2-ada-main/dnnlib/tflib/ops/fused_bias_act.cu" --shared -o "/tmp/tmpxkquz3dr/fused_bias_act_tmp.so" --keep --keep-dir "/tmp/tmpxkquz3dr"

/bin/sh: nvcc: command not found '''

Hi brother. I am trying to deal with it. Can you help me?

Additional info:
NVIDIA-SMI 418.165.02, Driver Version: 418.165.02, CUDA Version: 10.1
CentOS 7, tensorflow-gpu 1.13.1

KathrynSch commented 4 years ago

Hi, are you sure nvcc is installed and that your PATH is configured accordingly?
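
A quick way to check this from Python is sketched below. It only looks for an `nvcc` executable on the current PATH and then in a fallback directory. The helper name `find_nvcc` and the fallback `/usr/local/cuda/bin` are assumptions for illustration (that directory is merely the usual default install location), so adjust them for your machine.

```python
import os
import shutil


def find_nvcc(extra_dirs=("/usr/local/cuda/bin",)):
    """Return the path to an nvcc executable, or None if none is found.

    extra_dirs is an assumption: it lists fallback directories to try
    when nvcc is not on PATH.
    """
    found = shutil.which("nvcc")  # searches the directories listed in PATH
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_nvcc()
    if path is None:
        print("nvcc not found: install the CUDA toolkit or add its bin directory to PATH")
    else:
        print("nvcc found at", path)
```

If nvcc is only found in the fallback directory, the "/bin/sh: nvcc: command not found" error is expected, because the build command is run through the shell and relies on PATH.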

Harry-KIT commented 4 years ago

Hi, I solved this issue, but I ran into another one after that:

tensorflow.python.framework.errors_impl.NotFoundError: /home/gayrat/.cache/dnnlib/tflib-cudacache/fused_bias_act_6d9af12827dc4b7cf5e82793a63225e8.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs

KathrynSch commented 4 years ago

This might come from the compile flags at l.136 of dnnlib/tflib/custom_ops.py. Try printing tf.sysconfig.get_compile_flags(). If it prints -D_GLIBCXX_USE_CXX11_ABI=1, try hard-coding -D_GLIBCXX_USE_CXX11_ABI=0. That was an issue I had with the previous StyleGAN2.
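
For reference, a minimal sketch of that check. `tf.sysconfig.get_compile_flags()` is the real TensorFlow call mentioned above; the helper `abi_from_flags` is a hypothetical name introduced here just to pull the ABI value out of the flag list.

```python
import re


def abi_from_flags(flags):
    """Return the value of -D_GLIBCXX_USE_CXX11_ABI found in flags, or None.

    If the flag appears more than once, the last occurrence is returned.
    """
    value = None
    for flag in flags:
        match = re.fullmatch(r"-D_GLIBCXX_USE_CXX11_ABI=(\d)", flag)
        if match:
            value = int(match.group(1))
    return value


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        flags = tf.sysconfig.get_compile_flags()
        print(flags)
        print("ABI value reported by TensorFlow:", abi_from_flags(flags))
```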

JulianPinzaru commented 3 years ago

Same problem for me...

tensorflow.python.framework.errors_impl.NotFoundError: /home/weex/.cache/dnnlib/tflib-cudacache/fused_bias_act_e85d73ff2523c7830d5cd253e380d9d6.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs

I tried hard-coding the value to zero, as suggested by @KathrynSch, but that didn't work.
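
One thing the symbol name itself reveals: in the Itanium C++ name mangling used by gcc, the trailing `Ss` is the abbreviation for the old-ABI `std::string`, whereas the new ABI spells the type out under `std::__cxx11`. So the missing symbol appears to be one the plugin requested with the old string ABI. The sketch below is only a crude heuristic on the mangled text, and `guess_string_abi` is a hypothetical helper, not part of TensorFlow or this repository.

```python
def guess_string_abi(mangled):
    """Heuristically guess which libstdc++ std::string ABI a mangled symbol uses.

    Returns "new" if the name mentions std::__cxx11 (built with
    _GLIBCXX_USE_CXX11_ABI=1), "old" if it contains the Ss abbreviation
    for the pre-C++11 std::string, and "unknown" otherwise.
    This is a text heuristic, not a real demangler.
    """
    if "St7__cxx11" in mangled:
        return "new"
    if "Ss" in mangled:
        return "old"
    return "unknown"


if __name__ == "__main__":
    symbol = "_ZN10tensorflow12OpDefBuilder4AttrESs"
    print(symbol, "->", guess_string_abi(symbol))
```

If the plugin asks for an old-ABI symbol and the installed TensorFlow library only provides the new-ABI one (or the other way round), loading fails with exactly this kind of "undefined symbol" error.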

Harry-KIT commented 3 years ago

Hi brother, I am still trying to hard-code it, but I could not get it to work. If you find a way, please share.

JulianPinzaru commented 3 years ago

@Harry-KIT I will share if I find one that works. No luck yet. Looks like nobody is really willing to help :|

The hard-coded nvcc option -D_GLIBCXX_USE_CXX11_ABI=0 helped to get rid of that undefined symbol error, and now I am facing a segmentation fault (crash):

Setting up TensorFlow plugin "fused_bias_act.cu": Compiling... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Compiling... Loading... Done.
Segmentation fault (core dumped)

banianzr commented 3 years ago

Hi, I encountered this problem too. It seems that you cannot directly hard code in this line:

compile_opts += ' --compiler-options \'-fPIC -D_GLIBCXX_USE_CXX11_ABI=0\''

If you print tf.sysconfig.get_compile_flags(), you will see that -D_GLIBCXX_USE_CXX11_ABI=1 is still there. That's because the next line, compile_opts += f' --compiler-options \'{" ".join(tf.sysconfig.get_compile_flags())}\'', unfortunately still adds the flags reported by TensorFlow.

After thorough inspection, I found that if the gcc version is lower than 5, -D_GLIBCXX_USE_CXX11_ABI cannot be set.
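
To see which case applies, the gcc version can be checked as sketched below. The libstdc++ dual ABI that this macro controls was introduced with GCC 5, which matches the observation above. The helper names `gcc_major` and `supports_dual_abi` are hypothetical, and the sketch assumes the `gcc` on PATH is the host compiler nvcc picks up.

```python
import re
import subprocess


def gcc_major(version_text):
    """Extract the major version from `gcc -dumpversion` output such as '4.8.5' or '9'."""
    match = re.match(r"\s*(\d+)", version_text)
    return int(match.group(1)) if match else None


def supports_dual_abi(version_text):
    """Return True if the gcc version is 5 or newer, where the macro has an effect."""
    major = gcc_major(version_text)
    return major is not None and major >= 5


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["gcc", "-dumpversion"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except FileNotFoundError:
        print("gcc not found on PATH")
    else:
        version = result.stdout.strip()
        print("gcc version:", version)
        print("-D_GLIBCXX_USE_CXX11_ABI takes effect:", supports_dual_abi(version))
```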

Harry-KIT commented 3 years ago

Hi brother, which .py file do I need to add this line to: compile_opts += ' --compiler-options \'-fPIC -D_GLIBCXX_USE_CXX11_ABI=0\''

KathrynSch commented 3 years ago

@banianzr yes, the point is to replace the line that uses tf.sysconfig.get_compile_flags() with the hard-coded one.

KathrynSch commented 3 years ago

@Harry-KIT Here's what I did at l.134 of dnnlib/tflib/custom_ops.py:

elif os.name == 'posix':
     compile_opts += ' --compiler-options \'-fPIC\''
     # Original line, disabled because it brings in TensorFlow's own -D_GLIBCXX_USE_CXX11_ABI value:
     #compile_opts += f' --compiler-options \'{" ".join(tf.sysconfig.get_compile_flags())}\''
     # Hard-coded replacement (the leading space keeps it separate from the previous option):
     compile_opts += ' --compiler-options \'-I/home/kas/styleganenv/lib/python3.7/site-packages/tensorflow/include -D_GLIBCXX_USE_CXX11_ABI=0\''
     compile_opts += f' --linker-options \'{" ".join(tf.sysconfig.get_link_flags())}\''

Just replace the path with yours. That worked for me. Hope it'll help.
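
A variation that avoids hard-coding the include path, as a sketch only: keep whatever flags TensorFlow reports, but swap out the ABI define. The helpers `with_abi` and `posix_compiler_options` are hypothetical names, not functions from the repository, and the default `abi=0` simply mirrors the workaround above.

```python
def with_abi(flags, abi):
    """Return a copy of flags with any -D_GLIBCXX_USE_CXX11_ABI=... replaced by abi."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    kept = [flag for flag in flags if not flag.startswith(prefix)]
    return kept + [prefix + str(abi)]


def posix_compiler_options(tf_compile_flags, abi=0):
    """Build the --compiler-options part of the nvcc command line.

    tf_compile_flags is expected to be the list returned by
    tf.sysconfig.get_compile_flags(); abi=0 is an assumption taken
    from the workaround discussed in this thread.
    """
    joined = " ".join(with_abi(tf_compile_flags, abi))
    return " --compiler-options '-fPIC'" + " --compiler-options '" + joined + "'"


if __name__ == "__main__":
    example_flags = ["-I/path/to/tensorflow/include", "-D_GLIBCXX_USE_CXX11_ABI=1"]
    print(posix_compiler_options(example_flags))
```

In custom_ops.py this would stand in for the two `--compiler-options` lines, with `tf.sysconfig.get_compile_flags()` passed as the argument. The path in the example is a placeholder.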

EvaFlower commented 3 years ago

I reinstalled TensorFlow with "pip install tensorflow-gpu==1.14.0", and then it worked.

antcarryelephant commented 3 years ago

Hello, I tried this, but it shows:

File "/export/nxy/stylegan2-ada-main/dnnlib/tflib/custom_ops.py", line 60, in _get_cuda_gpu_arch_string
    raise RuntimeError('No GPU devices found')
RuntimeError: No GPU devices found

Did you see this message too, and if so, how did you fix it?
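
This error means TensorFlow itself does not see a GPU, so it can be reproduced outside the repository. The sketch below lists the devices TensorFlow reports via `device_lib.list_local_devices()`; the helper `gpu_devices` is a hypothetical name. One possible cause, stated as an assumption, is that the installed tensorflow-gpu build does not match the CUDA libraries on the machine.

```python
def gpu_devices(devices):
    """Return only the entries whose device_type is 'GPU'."""
    return [d for d in devices if getattr(d, "device_type", None) == "GPU"]


if __name__ == "__main__":
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        gpus = gpu_devices(device_lib.list_local_devices())
        if gpus:
            for gpu in gpus:
                print("GPU visible to TensorFlow:", gpu.name)
        else:
            print("No GPU devices found: check the TensorFlow log output above for CUDA library errors")
```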

csyanbin commented 3 years ago

Hi, for anyone who still suffers from the "undefined symbol" problem, you could try changing your Python version to 3.6 instead of a higher version.

For me, this solved the problem. Hope this will help other people.

SystemErrorWang commented 2 years ago

I tried changing Python to 3.6 but still get the "undefined symbol" problem.