NVlabs / stylegan2-ada

StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation
https://arxiv.org/abs/2006.06676

No GPU Devices Found #74

Open shahik opened 3 years ago

shahik commented 3 years ago

Hi, I trained on Colab and everything worked, but when I train in a Google Cloud notebook I get RuntimeError: No GPU devices found. I installed the GPU build of TensorFlow with pip install tensorflow-gpu==1.14.0 and also tried with 1 and 4 GPUs. Any solution, please?

Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Failed!
Traceback (most recent call last):
  File "train.py", line 561, in <module>
    main()
  File "train.py", line 553, in main
    run_training(**vars(args))
  File "train.py", line 451, in run_training
    training_loop.training_loop(**training_options)
  File "/jet/prs/workspace/stylegan2-ada/training/training_loop.py", line 123, in training_loop
    Gs = G.clone('Gs')
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/network.py", line 457, in clone
    net.copy_vars_from(self)
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/network.py", line 490, in copy_vars_from
    src_net._get_vars()
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/network.py", line 297, in _get_vars
    self._vars = OrderedDict(self._get_own_vars())
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/network.py", line 286, in _get_own_vars
    self._init_graph()
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/network.py", line 151, in _init_graph
    out_expr = self._build_func(*self._input_templates, **build_kwargs)
  File "/jet/prs/workspace/stylegan2-ada/training/networks.py", line 231, in G_main
    num_layers = components.synthesis.input_shape[1]
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/network.py", line 232, in input_shape
    return self.input_shapes[0]
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/network.py", line 219, in input_shapes
    self._input_shapes = [t.shape.as_list() for t in self.input_templates]
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/network.py", line 267, in input_templates
    self._init_graph()
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/network.py", line 151, in _init_graph
    out_expr = self._build_func(*self._input_templates, **build_kwargs)
  File "/jet/prs/workspace/stylegan2-ada/training/networks.py", line 439, in G_synthesis
    x = layer(x, layer_idx=0, fmaps=nf(1), kernel=3)
  File "/jet/prs/workspace/stylegan2-ada/training/networks.py", line 392, in layer
    x = modulated_conv2d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv)
  File "/jet/prs/workspace/stylegan2-ada/training/networks.py", line 105, in modulated_conv2d_layer
    s = apply_bias_act(s, bias_var='mod_bias', trainable=trainable) + 1 # [BI] Add bias (initially 1).
  File "/jet/prs/workspace/stylegan2-ada/training/networks.py", line 50, in apply_bias_act
    return fused_bias_act(x, b=tf.cast(b, x.dtype), act=act, gain=gain, clamp=clamp)
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 72, in fused_bias_act
    return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain, clamp=clamp)
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 132, in _fused_bias_act_cuda
    cuda_op = _get_plugin().fused_bias_act
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/ops/fused_bias_act.py", line 18, in _get_plugin
    return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/custom_ops.py", line 139, in get_plugin
    compile_opts += f' --gpu-architecture={_get_cuda_gpu_arch_string()}'
  File "/jet/prs/workspace/stylegan2-ada/dnnlib/tflib/custom_ops.py", line 60, in _get_cuda_gpu_arch_string
    raise RuntimeError('No GPU devices found')
RuntimeError: No GPU devices found

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.51                 Driver Version: 396.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
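The traceback ends in `_get_cuda_gpu_arch_string` in `dnnlib/tflib/custom_ops.py`, even though `nvidia-smi` lists a Tesla P100, so the driver sees the card while the GPU lookup in that function does not. As a rough diagnostic, the sketch below classifies the device list TensorFlow returns. It is not code from the repository: `diagnose` and `FakeDevice` are hypothetical names, and the assumption (taken from later comments in this thread) is that the lookup keeps only entries whose `device_type` is exactly `'GPU'`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeDevice:
    """Hypothetical stand-in for one entry returned by device_lib.list_local_devices()."""
    device_type: str
    physical_device_desc: str = ""


def diagnose(devices: Iterable) -> str:
    """Summarize the device types TensorFlow reports, from the point of view of a
    lookup that only keeps entries whose device_type is exactly 'GPU'."""
    counts = Counter(d.device_type for d in devices)
    if counts["GPU"] > 0:
        return f"TensorFlow sees {counts['GPU']} 'GPU' device(s); a 'GPU' filter should match."
    if counts["XLA_GPU"] > 0:
        return (
            f"TensorFlow reports {counts['XLA_GPU']} 'XLA_GPU' device(s) but no 'GPU' device; "
            "a filter on device_type == 'GPU' finds nothing."
        )
    # Neither type present: list what was seen so the mismatch is visible.
    seen = ", ".join(sorted(counts)) or "none"
    return f"TensorFlow reports no GPU devices at all (types seen: {seen})."
```

On a real machine you would pass the result of `device_lib.list_local_devices()` (imported via `from tensorflow.python.client import device_lib`) instead of a list of stand-ins.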


antcarryelephant commented 3 years ago

Hi :) I ran into a similar problem. How did you solve it?

johndpope commented 3 years ago

This project is abandoned; use https://github.com/NVlabs/stylegan2-ada-pytorch instead. You will also want a newer CUDA driver: Docker needs NVIDIA driver release r455.23 or later.
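The `nvidia-smi` output in the original report shows driver 396.51, which is below the r455.23 minimum quoted in this comment. A small helper can compare such versions numerically rather than as strings (as strings, "455.9" would sort after "455.23"). This is a sketch: the function names are hypothetical, and the 455.23 default is simply the figure from this comment.

```python
from typing import Tuple


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Turn '396.51', '455.23.04' or 'r455.23' into a tuple of ints."""
    cleaned = text.strip().lstrip("rR")
    return tuple(int(part) for part in cleaned.split("."))


def driver_is_new_enough(installed: str, minimum: str = "455.23") -> bool:
    """Compare component by component as integers rather than as strings."""
    return parse_driver_version(installed) >= parse_driver_version(minimum)
```

With the driver from the original report, `driver_is_new_enough("396.51")` is False.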

shahik commented 3 years ago

@antcarryelephant I solved it as follows.

Deploy the CUDA 10 deep learning notebook via Google Click to Deploy. To run JupyterLab in the cloud:

    gcloud compute instances describe --project [projectName] --zone [zonename] deeplearning-1-vm | grep googleusercontent.com | grep datalab

    export PROJECT_ID="project name"
    export ZONE="zonename"
    export INSTANCE_NAME="instancename"
    gcloud compute ssh --project $PROJECT_ID --zone $ZONE \
      $INSTANCE_NAME -- -L 8080:localhost:8080

Set the gcc version:

    sudo mkdir -p /usr/local/cuda/bin
    sudo apt-get install gcc-7 g++-7
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 10
    sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 10

as described here:
https://askubuntu.com/questions/26498/how-to-choose-the-default-gcc-and-g-version
https://stackoverflow.com/questions/6622454/cuda-incompatible-with-my-gcc-version
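The gcc-7 switch matters because nvcc refuses host compilers newer than the ones a given CUDA release supports, which is the subject of the linked Stack Overflow question. The sketch below parses `gcc --version` output and checks the major version against a ceiling. The default ceiling of 7 is an assumption mirroring this comment's choice of gcc-7, not a documented CUDA limit, and both helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def gcc_major_version(version_output: str) -> Optional[int]:
    """Return the major version from the first X.Y.Z number on the first line
    of `gcc --version` output, or None if no such number is found."""
    lines = version_output.splitlines()
    if not lines:
        return None
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", lines[0])
    return int(match.group(1)) if match else None


def gcc_within_limit(version_output: str, max_major: int = 7) -> bool:
    """True when gcc's major version is at most max_major.
    The default of 7 is an assumption mirroring the gcc-7 switch in this comment."""
    major = gcc_major_version(version_output)
    return major is not None and major <= max_major


if __name__ == "__main__":
    # Only shell out if gcc is actually on PATH.
    if shutil.which("gcc"):
        out = subprocess.run(["gcc", "--version"], capture_output=True, text=True).stdout
        print("gcc major version:", gcc_major_version(out), "within limit:", gcc_within_limit(out))
    else:
        print("gcc not found on PATH")
```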

SURABHI-GUPTA commented 3 years ago

@antcarryelephant Check whether 'tensorflow-gpu' is installed; you can install it with 'pip install tensorflow-gpu'.
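Before reinstalling, it can help to confirm which TensorFlow distributions pip actually sees. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+; the `importlib_metadata` backport offers the same calls on older interpreters). `installed_versions` is a hypothetical helper name. Note that the original report pinned `tensorflow-gpu==1.14.0`, whereas an unpinned `pip install tensorflow-gpu` resolves to the newest release compatible with the interpreter.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(dist_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    result: Dict[str, Optional[str]] = {}
    for name in dist_names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Both names are checked because the CPU-only and GPU builds were
    # published as separate pip packages.
    print(installed_versions(["tensorflow", "tensorflow-gpu"]))
```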

laurajul commented 3 years ago


Thanks, that solved my issue. I've sent a tip.

Pfed-prog commented 3 years ago

The error persists.

Topdog1221 commented 3 years ago

I'm still getting exactly the same error, with no fix.

ecielyang commented 3 years ago

I have installed tensorflow-gpu, but it still doesn't work.

gmign commented 2 years ago


I had the same issue and I solved it using conda: conda install tensorflow-gpu==1.14

ihyunmin commented 2 years ago

I fixed this error in /NVlabs/stylegan2/dnnlib by changing some code. I'm not sure my case is the same as this one, but I hope it helps.

In my case I changed the code below because I use a Tesla V100. When I printed device_lib.list_local_devices(), I found that the device_type was 'XLA_GPU', not 'GPU'.

Previous:

    gpus = [x for x in device_lib.list_local_devices() if x.device_type == 'GPU']

Now:

    gpus = [x for x in device_lib.list_local_devices() if x.device_type == 'XLA_GPU']
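The change above swaps the 'GPU' filter for an 'XLA_GPU' filter outright. A slightly more defensive variant keeps the original 'GPU' match and only falls back to 'XLA_GPU' entries when no 'GPU' entry exists, so machines where the original filter already works behave as before. This is a sketch, not code from either repository; `find_gpus` and `FakeDevice` are hypothetical names, and it assumes the surrounding code only needs a non-empty list of device entries.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FakeDevice:
    """Hypothetical stand-in for an entry of device_lib.list_local_devices()."""
    name: str
    device_type: str
    physical_device_desc: str = ""


def find_gpus(devices: Sequence[FakeDevice]) -> List[FakeDevice]:
    """Prefer devices typed 'GPU'; fall back to 'XLA_GPU' only when none are found."""
    gpus = [d for d in devices if d.device_type == "GPU"]
    if not gpus:
        gpus = [d for d in devices if d.device_type == "XLA_GPU"]
    return gpus
```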

liavke commented 2 years ago

@ihyunmin In which file(s) did you make that change?

ihyunmin commented 2 years ago

@liavke It is in /NVlabs/stylegan2/dnnlib. I don't know whether this repository has the same code.

Kelu007 commented 1 year ago

@ihyunmin Your solution helped me a lot, thank you!

wangalong-ahpu commented 1 month ago

In stylegan2/dnnlib/tflib/custom_ops.py, change line 50 to:

    gpus = [x for x in device_lib.list_local_devices() if x.device_type == 'XLA_GPU']
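Finding a device is only half of what the failing call needs: the traceback shows its result feeding nvcc through `compile_opts += f' --gpu-architecture={_get_cuda_gpu_arch_string()}'`. The sketch below assumes the architecture is derived from the "compute capability: X.Y" text in each device's `physical_device_desc`, producing a string such as `sm_60`; that parsing detail is an assumption, not quoted from the repository, and both function names are hypothetical. It also does not assume every description carries a compute capability, which matters if the filter is switched to 'XLA_GPU' entries whose descriptions may lack one.

```python
import re
from typing import Optional, Sequence, Tuple


def compute_capability(physical_device_desc: str) -> Optional[Tuple[int, int]]:
    """Parse 'compute capability: X.Y' out of a device description, if present."""
    m = re.search(r"compute capability: (\d+)\.(\d+)", physical_device_desc)
    return (int(m.group(1)), int(m.group(2))) if m else None


def gpu_arch_string(descriptions: Sequence[str]) -> str:
    """Return 'sm_XY' for the first description that carries a compute capability."""
    for desc in descriptions:
        cap = compute_capability(desc)
        if cap is not None:
            return "sm_%d%d" % cap
    raise RuntimeError("No device description contains a compute capability")
```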