Newbeeer / Poisson_flow

Code for NeurIPS 2022 Paper, "Poisson Flow Generative Models" (PFGM)
Apache License 2.0

CalledProcessError #2

Closed vagrant3427 closed 1 year ago

vagrant3427 commented 1 year ago

Hi Yilun, when I execute the following sampling command, a CalledProcessError is raised:

python3 main.py --config ./configs/poisson/cifar10_ddpmpp.py --mode eval --workdir poisson/cifar10_ddpmpp --config.eval.enable_sampling --config.eval.save_images --config.eval.batch_size 100

2022-10-27 16:39:03.643006: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 16:39:03.846988: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-27 16:39:04.484731: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2022-10-27 16:39:04.484847: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2022-10-27 16:39:04.484860: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:tensorflow:From /home/xjtu/anaconda3/lib/python3.9/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
Traceback (most recent call last):
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/home/xjtu/anaconda3/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xjtu/code/Poisson_flow/main.py", line 18, in <module>
    import run_lib
  File "/home/xjtu/code/Poisson_flow/run_lib.py", line 30, in <module>
    from models import ncsnv2, ncsnpp
  File "/home/xjtu/code/Poisson_flow/models/ncsnpp.py", line 18, in <module>
    from . import utils, layers, layerspp, normalization
  File "/home/xjtu/code/Poisson_flow/models/layerspp.py", line 20, in <module>
    from . import up_or_down_sampling
  File "/home/xjtu/code/Poisson_flow/models/up_or_down_sampling.py", line 10, in <module>
    from op import upfirdn2d
  File "/home/xjtu/code/Poisson_flow/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/xjtu/code/Poisson_flow/op/fused_act.py", line 11, in <module>
    fused = load(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/xjtu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/2] :/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc  -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/TH -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/THC -isystem :/usr/local/cuda-11.0:/usr/local/cuda-11.0/include -isystem /home/xjtu/anaconda3/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/xjtu/code/Poisson_flow/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o 
FAILED: fused_bias_act_kernel.cuda.o 
:/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc  -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/TH -isystem /home/xjtu/anaconda3/lib/python3.9/site-packages/torch/include/THC -isystem :/usr/local/cuda-11.0:/usr/local/cuda-11.0/include -isystem /home/xjtu/anaconda3/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/xjtu/code/Poisson_flow/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o 
/bin/sh: 1: :/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc: not found
ninja: build stopped: subcommand failed.

Do you have any idea?

Newbeeer commented 1 year ago

Hi,

The error log seems to show that the nvcc command cannot be found on your system:

/bin/sh: 1: :/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc: not found

I guess you can fix this by linking a correct nvcc to the path :/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin/nvcc.

Yilun
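For anyone hitting the same message: the failing path starts with a colon and repeats /usr/local/cuda-11.0, which suggests (an assumption, not something confirmed in this thread) that the CUDA_HOME environment variable holds a colon-joined list instead of a single directory. The sketch below is a small diagnostic under that assumption. The helper names split_cuda_home and nvcc_path are hypothetical and are not part of PyTorch or this repository.

```python
import os


def split_cuda_home(value):
    """Split a possibly colon-joined value into unique, non-empty directories."""
    dirs = []
    for part in value.split(":"):
        part = part.strip()
        if part and part not in dirs:
            dirs.append(part)
    return dirs


def nvcc_path(cuda_home):
    """Build <cuda_home>/bin/nvcc, the shape of the path seen in the build log."""
    return os.path.join(cuda_home, "bin", "nvcc")


if __name__ == "__main__":
    raw = os.environ.get("CUDA_HOME", "")
    print(f"CUDA_HOME = {raw!r}")
    if ":" in raw:
        print("CUDA_HOME looks like a list; it should be a single directory.")
    for candidate in split_cuda_home(raw):
        path = nvcc_path(candidate)
        status = "found" if os.path.isfile(path) else "missing"
        print(f"{path}: {status}")
```

If the script reports that CUDA_HOME looks like a list, setting it to one of the directories where nvcc is reported as found (for example in the shell profile that exports it) and rerunning the sampling command is a reasonable next step.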

vagrant3427 commented 1 year ago

Hi Yilun, it works for me. After fixing several errors and compatibility issues, the sampling code finally works. Thanks very much. I appreciate your brilliant work!

Newbeeer commented 1 year ago

Sounds good! Feel free to reach out if you run into further issues.