NVlabs / stylegan2-ada-pytorch

StyleGAN2-ADA - Official PyTorch implementation
https://arxiv.org/abs/2006.06676

Failed to build CUDA kernels for upfirdn2d. UserWarning: Failed to build CUDA kernels for upfirdn2d. #155

Open Rewwolf opened 3 years ago

Rewwolf commented 3 years ago

Describe the bug: The error message "Failed to build CUDA kernels for upfirdn2d. UserWarning: Failed to build CUDA kernels for upfirdn2d." appears after executing the following command from the README:

python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

To Reproduce: Steps to reproduce the behavior:

  1. In 'stylegan2-ada-pytorch-main' directory, run command 'python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl'
  2. See error (copy&paste full log, including exceptions and stacktraces).

Traceback (most recent call last):
  File "C:\Users\matth\Desktop\stylegan2-ada-pytorch-main\torch_utils\ops\upfirdn2d.py", line 32, in _init
    _plugin = custom_ops.get_plugin('upfirdn2d_plugin', sources=sources, extra_cuda_cflags=['--use_fast_math'])
  File "C:\Users\matth\Desktop\stylegan2-ada-pytorch-main\torch_utils\custom_ops.py", line 110, in get_plugin
    torch.utils.cpp_extension.load(name=module_name, verbose=verbose_build, sources=sources, **build_kwargs)
  File "C:\Users\matth\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\torch\utils\cpp_extension.py", line 1092, in load
    keep_intermediates=keep_intermediates)
  File "C:\Users\matth\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\torch\utils\cpp_extension.py", line 1318, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "C:\Users\matth\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\torch\utils\cpp_extension.py", line 1701, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "", line 583, in module_from_spec
  File "", line 1043, in create_module
  File "", line 219, in _call_with_frames_removed
ImportError: DLL load failed: The specified module could not be found.

warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + traceback.format_exc())
Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!

Desktop (please complete the following information):

Additional context: ninja was also installed via pip.
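The "Desktop" section of the report was left empty, which makes the failure harder to narrow down. A minimal sketch of how those details could be collected is shown here; the helper name `environment_report` is made up for illustration, and the PyTorch part is skipped if PyTorch cannot be imported.

```python
import platform
import sys


def environment_report():
    """Collect the version details a bug report for this issue usually needs."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "executable": sys.executable,
    }
    try:
        import torch  # optional: the report still works without PyTorch
    except (ImportError, OSError):
        report["torch"] = None
    else:
        report["torch"] = torch.__version__
        report["torch_cuda_build"] = torch.version.cuda  # None on CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output into the report shows at a glance which Python installation is in use and which CUDA version the installed PyTorch wheel was built for.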

thusinh1969 commented 3 years ago

Solved it.

First of all, completely uninstall ALL previous NVIDIA CUDA versions. I mean completely.

1) Uninstall all previous NVIDIA CUDA versions the usual way.

2) Open your environment variable settings and cleanly remove ALL CUDA entries: PATH entries, CUDA_HOME, CUDA_PATH, etc.

3) Delete all files that the uninstaller left behind!

4) Do a clean install of NVIDIA CUDA 11.1.

5) Check that PATH, CUDA_HOME, and CUDA_PATH point to exactly the 11.1 path.

6) Install PyTorch (uninstall ALL previously installed versions first): pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

7) Install the relevant packages: pip install click requests tqdm pyspng ninja imageio-ffmpeg==0.4.3

8) Install Visual Studio 2019.

9) Add this to your PATH: C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsx86_amd64.bat

10) Check nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:12:04_Pacific_Daylight_Time_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.relgpu_drvr455TC455_06.29069683_0
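Steps 5 and 10 can also be checked from Python. The following is only a sketch: the function names are hypothetical, and it assumes that the `release X.Y` line in the `nvcc --version` output keeps the format shown in step 10.

```python
import os
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Return the toolkit release (e.g. '11.1') from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


def active_cuda_toolkit():
    """Report which nvcc is first on PATH and what CUDA_HOME / CUDA_PATH contain."""
    nvcc = shutil.which("nvcc")
    release = None
    if nvcc is not None:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        release = parse_nvcc_release(result.stdout)
    return {
        "nvcc": nvcc,
        "release": release,
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "CUDA_PATH": os.environ.get("CUDA_PATH"),
    }


if __name__ == "__main__":
    for key, value in active_cuda_toolkit().items():
        print(f"{key}: {value}")
```

If `release` is not 11.1, or the variables point at a different version's directory, an older toolkit is probably still being picked up first.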

I hope I did not miss any step; with all of them done, it should work. If it DOES NOT, press Ctrl-C immediately and check whether the path being used is 11.0, 11.1, or some earlier version.

It works like a charm!

Steve

P.S. On Windows you may hit an OpenMP (OMP) error (Initializing libiomp5.dylib, but found libiomp5.dylib already initialized). Simply ignore it by adding os.environ['KMP_DUPLICATE_LIB_OK']='True' to your train.py and training_loop.py.
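A short sketch of where that line would go. The placement is my assumption: the variable has to be set before any library that loads an OpenMP runtime (NumPy, PyTorch, and so on) is imported, so it belongs at the very top of the script. Note that the OpenMP error message itself describes this variable as an unsafe workaround that may cause crashes or incorrect results.

```python
import os

# Assumption: this runs before any import that loads an OpenMP runtime,
# so it sits above the heavy imports at the top of train.py.
os.environ["KMP_DUPLICATE_LIB_OK"] = "True"

# Heavy imports would follow only after the variable is set, for example:
# import torch
```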

MarioProjects commented 3 years ago

@thusinh1969 is there an option if we can't install CUDA 11? (RTX 2080)

thohag commented 3 years ago

Had a similar problem; solved it by copying "python38.lib" from "C:\Python38\libs" to "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64". I'm using Python 3.8 in a virtual env, PyTorch 1.7.1, and CUDA 11.3.
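For other Python versions the file name and location differ. A small sketch for finding the import library of the interpreter in use; `python_import_lib` is a hypothetical helper, and it assumes the standard Windows CPython layout with a `libs` folder under the base installation (`sys.base_prefix` points at the base installation even inside a virtual env).

```python
import sys
from pathlib import Path


def python_import_lib(base_prefix=None, version=None):
    """Return the expected path of pythonXY.lib for a Windows CPython install."""
    base = Path(base_prefix or sys.base_prefix)
    major, minor = version or sys.version_info[:2]
    return base / "libs" / f"python{major}{minor}.lib"


if __name__ == "__main__":
    lib = python_import_lib()
    print(lib, "(exists)" if lib.is_file() else "(not found)")
```

The printed path is the file to copy into the CUDA `lib\x64` directory; the sketch deliberately does not copy anything itself.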

ghost commented 2 years ago

@thusinh1969's solution above works. Please note that for step 8, Visual Studio 2022 doesn't work as of now. Once I rolled back to Visual Studio 2019, everything worked fine.

Constructing networks...
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.

Infinitay commented 2 years ago

@thohag's fix above (copying python38.lib into the CUDA lib\x64 directory) was all it took. I spent hours reinstalling and trying everything but this. Unfortunately I missed it the first time around, but after following those instructions everything loads properly.

FWIW, I had also installed ninja as people suggested, but that alone did not solve my issue. It was resolved only once I copied over the python.lib file.

Thanks.

Essamara commented 2 years ago

I created a fresh conda environment and set everything up again with all the right versions. It has to be CUDA 11.1. I also added the environment variable to PATH manually; I don't know whether that made a difference.

Now I'm learning some artwork.

jexz11 commented 2 years ago

I had the exact same problem, BUT none of the above solutions solved it T_T

jexz11 commented 2 years ago

I later fixed this problem. The cause was that I had installed Visual Studio on drive D instead of drive C.
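A possible explanation (an assumption, not confirmed in this thread) is that the build step only looks for the MSVC compiler in default locations on drive C, so an installation elsewhere is never found. The sketch below searches several candidate roots for a 64-bit cl.exe; `find_msvc_bin_dirs` and the root folders passed to it are hypothetical, and the directory layout is the one Visual Studio 2017 and later normally use.

```python
import glob
import os


def find_msvc_bin_dirs(roots):
    """Return sorted directories under `roots` that contain a 64-bit cl.exe."""
    found = []
    for root in roots:
        # <root>/Microsoft Visual Studio/<year>/<edition>/VC/Tools/MSVC/<ver>/bin/Hostx64/x64
        pattern = os.path.join(root, "Microsoft Visual Studio", "*", "*",
                               "VC", "Tools", "MSVC", "*", "bin", "Hostx64", "x64")
        found.extend(d for d in glob.glob(pattern)
                     if os.path.isfile(os.path.join(d, "cl.exe")))
    return sorted(found)


if __name__ == "__main__":
    candidates = [r"C:\Program Files (x86)", r"C:\Program Files",
                  r"D:\Program Files (x86)", r"D:\Program Files"]
    for bin_dir in find_msvc_bin_dirs(candidates):
        print(bin_dir)
```

If the only hit is outside `C:\Program Files (x86)`, adding that directory to PATH or reinstalling Visual Studio to the default location are the two obvious things to try.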

nadavpo commented 2 years ago

how can I fix this on google colab?

ikros98 commented 2 years ago

Have you managed to fix this problem on Colab?

nadavpo commented 2 years ago

Try this: !pip install torch==1.8.1 torchvision==0.9.1 ninja

ikros98 commented 2 years ago

!pip install torch==1.8.1 torchvision==0.9.1

Still the same problem.

styler00dollar commented 2 years ago

In my case I fixed it by installing ninja. It was missing.
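Since a missing tool produces the same generic warning, a quick check of what is visible on PATH can save some time. This is a sketch only: `missing_build_tools` is a hypothetical helper, and the default tool list (ninja, nvcc, plus the MSVC compiler cl on Windows) is my assumption about what the plugin build needs.

```python
import shutil
import sys


def missing_build_tools(tools=None):
    """Return the names of required build tools that are not found on PATH."""
    if tools is None:
        tools = ["ninja", "nvcc"]
        if sys.platform == "win32":
            tools.append("cl")  # MSVC compiler, used by nvcc on Windows
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

Run it with the same interpreter and environment that runs generate.py; a pip-installed ninja is only found when that environment's scripts directory is on PATH.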

neilthefrobot commented 2 years ago

This is still an issue... I have every requirement listed on the GitHub page. I have tried every single thing people have suggested. It still doesn't work.
I have tried CUDA 11, 11.1, 11.2, and 11.6. My path variables are all correct. I'm using Python 3.7 and PyTorch 1.7.1, as specified. I just don't get how this is so hard.

xingyouxin commented 1 year ago

Useful! Thanks for your efforts! I tested @thusinh1969's method above and it worked for me.

ChuaCheowHuan commented 1 year ago

I was attempting to run https://github.com/mit-han-lab/data-efficient-gans/tree/master/DiffAugment-stylegan2-pytorch on EC2 Linux and faced the same issue. The EC2 Linux instance comes preinstalled with newer versions of CUDA and PyTorch.

What I did to resolve this issue:

  1. Install CUDA 11.1 following the instructions here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html#gpu-instance-install-cuda
  2. Create a new virtual environment and (referencing @thusinh1969's solution above) install PyTorch 1.8.0+cu111 with the following: pip3 install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
  3. Note: I was initially looking for PyTorch 1.7.1+cu111, but that does not exist.
  4. Install all other required packages in the virtual environment: pip3 install pillow scipy psutil click requests tqdm pyspng ninja imageio-ffmpeg==0.4.3
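The `+cu111` suffix in the wheel version is what ties the PyTorch build to CUDA 11.1. As a rough sketch, the suffix can be decoded like this; `cuda_version_from_wheel` is a hypothetical helper, and it assumes the usual tag convention in which the last digit is the minor version.

```python
import re


def cuda_version_from_wheel(torch_version):
    """Map a wheel version such as '1.8.0+cu111' to its CUDA version '11.1'."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None  # no CUDA tag, e.g. '1.7.1' or '1.8.0+cpu'
    digits = match.group(1)
    return f"{digits[:-1]}.{digits[-1]}"  # last digit is the minor version


if __name__ == "__main__":
    print(cuda_version_from_wheel("1.8.0+cu111"))
```

Comparing this value with the release reported by `nvcc --version` is a quick way to see whether the wheel and the installed toolkit agree.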

Hope that helps.