Open HzfFrank opened 1 year ago
I solved it after switching to CUDA 11.7. Maybe this project doesn't support the latest version of CUDA. If anyone manages to run it on the latest CUDA version, I'd appreciate it a lot if you could share how.
I have the same problem :( My GPU is an RTX 4090 with CUDA 12.1, and I could not solve it :(
Similar error when running python -c "import dflex" after installation. RTX 4090 with CUDA 11.6.
Btw, I also failed to build dflex on an A100.
After changing my cuda to 11.7, the problem still exists. RTX 3060 with cuda 11.7, pytorch 1.11.0
Same issue here: Windows 11, CUDA 12.2, Python 3.8, torch 2.2.0.
The issue is that this line assumes the minimum compute capability is 35: https://github.com/NVlabs/DiffRL/blob/a4c0dd1696d3c3b885ce85a3cb64370b580cb913/dflex/dflex/adjoint.py#L1860-L1861 However, starting with CUDA 12, the minimum supported compute capability is 50: https://forums.developer.nvidia.com/t/nvcc-fatal-unsupported-gpu-architecture-compute-35/247815
I solved the issue after changing this line to:
cuda_flags = ['-gencode=arch=compute_86,code=compute_86']
I'm using CUDA 12.2 and PyTorch 2.3.1 with an RTX 3060 on Ubuntu 20.04 LTS.
I also found this link helpful: https://stackoverflow.com/questions/68496906/pytorch-installation-for-different-cuda-architectures
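Rather than hard-coding compute_86, the flag could be derived from whichever GPU is present. The sketch below is not DiffRL code: gencode_flag and detect_cuda_flags are hypothetical helper names, and detect_cuda_flags assumes a CUDA-enabled PyTorch install so that torch.cuda.get_device_capability() can report the device's (major, minor) compute capability.

```python
def gencode_flag(major, minor):
    """Build an nvcc -gencode flag for one compute capability, e.g. (8, 6)."""
    arch = f"compute_{major}{minor}"
    # Same flag format as the fix in this thread: arch and code both name
    # the virtual architecture (compute_XY).
    return f"-gencode=arch={arch},code={arch}"


def detect_cuda_flags():
    """Return cuda_flags for the current GPU (hypothetical helper, needs PyTorch)."""
    import torch  # imported lazily so gencode_flag works without PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("no CUDA device visible to PyTorch")
    major, minor = torch.cuda.get_device_capability()
    return [gencode_flag(major, minor)]


# An RTX 3060 has compute capability 8.6.
print(gencode_flag(8, 6))
```

On an RTX 4090 (compute capability 8.9) this would produce a compute_89 flag, which only compiles if the installed nvcc supports that architecture.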
I installed PyTorch using
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
on a system with CUDA 12.2, an NVIDIA 4090, and Ubuntu 20.04.
Following @shizhec, I checked which architectures nvcc supports on my system:
(diff) bigeast@bigeast:~/DiffRL/examples$ nvcc --list-gpu-arch
compute_50
compute_52
compute_53
compute_60
compute_61
compute_62
compute_70
compute_72
compute_75
compute_80
compute_86
compute_87
compute_89
compute_90
So I changed cuda_flags to:
cuda_flags = ['-gencode=arch=compute_86,code=compute_86']
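Picking an architecture from the nvcc --list-gpu-arch output can also be scripted. This is a minimal sketch with hypothetical helper names (parse_arch_list, best_arch, nvcc_supported_archs). It assumes nvcc is on PATH and supports --list-gpu-arch, and it relies on the flag using code=compute_XY, so any listed architecture at or below the device's capability should be usable.

```python
import subprocess


def parse_arch_list(text):
    """Turn `nvcc --list-gpu-arch` output into sorted integers, e.g. [50, 52]."""
    prefix = "compute_"
    archs = []
    for line in text.splitlines():
        line = line.strip()
        # Keep purely numeric entries; anything else is skipped.
        if line.startswith(prefix) and line[len(prefix):].isdigit():
            archs.append(int(line[len(prefix):]))
    return sorted(archs)


def best_arch(supported, capability):
    """Highest supported arch not exceeding the device capability, or None."""
    candidates = [a for a in supported if a <= capability]
    return max(candidates) if candidates else None


def nvcc_supported_archs(nvcc="nvcc"):
    """Query the local nvcc (assumes it is on PATH)."""
    result = subprocess.run([nvcc, "--list-gpu-arch"], capture_output=True,
                            text=True, check=True)
    return parse_arch_list(result.stdout)


# Shortened sample of the listing above.
sample = "compute_50\ncompute_52\ncompute_86\ncompute_89\ncompute_90\n"
print(best_arch(parse_arch_list(sample), 89))  # capability 8.9 -> 89
print(best_arch(parse_arch_list(sample), 35))  # nothing at or below 35 -> None
```

With the listing shown in this thread, compute_35 is absent, which matches the explanation that the default flag no longer compiles.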
But I still got the following error:
(diff) bigeast@bigeast:~/DiffRL/examples$ python test_env.py --env AntEnv
Rebuilding kernels
Detected CUDA files, patching ldflags
Emitting ninja build file /home/bigeast/DiffRL/dflex/dflex/kernels/build.ninja...
Building extension module kernels...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /home/bigeast/anaconda3/envs/diff/bin/nvcc -DTORCH_EXTENSION_NAME=kernels -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/bigeast/DiffRL/dflex/dflex -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include/TH -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include/THC -isystem /home/bigeast/anaconda3/envs/diff/include -isystem /home/bigeast/anaconda3/envs/diff/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -gencode=arch=compute_86,code=sm_86 -std=c++14 -c /home/bigeast/DiffRL/dflex/dflex/kernels/cuda.cu -o cuda.cuda.o
FAILED: cuda.cuda.o
/home/bigeast/anaconda3/envs/diff/bin/nvcc -DTORCH_EXTENSION_NAME=kernels -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/bigeast/DiffRL/dflex/dflex -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include/TH -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include/THC -isystem /home/bigeast/anaconda3/envs/diff/include -isystem /home/bigeast/anaconda3/envs/diff/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -gencode=arch=compute_86,code=sm_86 -std=c++14 -c /home/bigeast/DiffRL/dflex/dflex/kernels/cuda.cu -o cuda.cuda.o
In file included from /usr/include/cuda_runtime.h:83,
from <command-line>:
/usr/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
| ^~~~~
[2/3] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=kernels -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/bigeast/DiffRL/dflex/dflex -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include/TH -isystem /home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/include/THC -isystem /home/bigeast/anaconda3/envs/diff/include -isystem /home/bigeast/anaconda3/envs/diff/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -Z -O2 -DNDEBUG -c /home/bigeast/DiffRL/dflex/dflex/kernels/main.cpp -o main.o
/home/bigeast/DiffRL/dflex/dflex/kernels/main.cpp: In function ‘df::float3 box_sdf_grad_cpu_func(df::float3, df::float3)’:
/home/bigeast/DiffRL/dflex/dflex/kernels/main.cpp:1051:47: warning: control reaches end of non-void function [-Wreturn-type]
1051 | var_58 = df::select(var_56, var_53, var_57);
| ^
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/home/bigeast/anaconda3/envs/diff/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "test_env.py", line 17, in <module>
import envs
File "/home/bigeast/DiffRL/envs/__init__.py", line 8, in <module>
from envs.dflex_env import DFlexEnv
File "/home/bigeast/DiffRL/envs/dflex_env.py", line 15, in <module>
import dflex as df
File "/home/bigeast/DiffRL/dflex/dflex/__init__.py", line 15, in <module>
kernel_init()
File "/home/bigeast/DiffRL/dflex/dflex/sim.py", line 67, in kernel_init
kernels = df.compile()
File "/home/bigeast/DiffRL/dflex/dflex/adjoint.py", line 1865, in compile
module = torch.utils.cpp_extension.load_inline('kernels',
File "/home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1433, in load_inline
return _jit_compile(
File "/home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'kernels'
Then I found that the error comes from the C++ compiler:
unsupported GNU version! gcc versions later than 8 are not supported!
This means that the gcc installed on the system is newer than what the CUDA headers used by the build accept: the check in /usr/include/crt/host_config.h rejects any gcc version later than 8. Check the gcc version: you can check the gcc version of the current system with the following command:
gcc --version
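The same check can be done from Python before triggering the kernel build. This is a rough sketch: gcc_major and installed_gcc_major are hypothetical helpers, the limit of 8 is taken from the error message above, and the sample version string is made up for illustration.

```python
import re
import subprocess


def gcc_major(version_output):
    """Extract the major version from the text printed by `gcc --version`."""
    match = re.search(r"(\d+)\.\d+(?:\.\d+)?", version_output)
    if match is None:
        raise ValueError("no version number found")
    return int(match.group(1))


def installed_gcc_major(gcc="gcc"):
    """Run `gcc --version` (assumes gcc is on PATH) and return its major version."""
    result = subprocess.run([gcc, "--version"], capture_output=True,
                            text=True, check=True)
    return gcc_major(result.stdout)


MAX_SUPPORTED = 8  # limit quoted in the host_config.h error message

example = "gcc (GCC) 9.4.0"  # made-up sample line, not real output
if gcc_major(example) > MAX_SUPPORTED:
    print("gcc is too new for these CUDA headers; use gcc-8 instead")
```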
Install gcc-8: first, you need to install gcc-8 and g++-8:
sudo apt install gcc-8 g++-8
Switch the gcc version: after installation, you can switch to gcc-8 using update-alternatives to ensure the correct gcc version is used during compilation.
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 8
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 8
Run the following commands to select the version of gcc you want to use:
sudo update-alternatives --config gcc
sudo update-alternatives --config g++
After selecting gcc-8, you can re-run your compilation commands.
If you don't want to change the system-wide default gcc, you can specify gcc-8 for the compilation process like this:
CC=/usr/bin/gcc-8 CXX=/usr/bin/g++-8 python test_env.py --env AntEnv
This ensures that gcc-8, which these CUDA headers accept, is used for the compilation.
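The per-command override can also be expressed in Python by passing a modified environment to a child process, which leaves the system default gcc untouched. This is only a sketch: gcc8_env and run_with_gcc8 are hypothetical helpers, and the /usr/bin paths are the ones used in the command above.

```python
import os
import subprocess
import sys


def gcc8_env(base_env=None, bin_dir="/usr/bin"):
    """Copy an environment and point CC/CXX at gcc-8 and g++-8."""
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = f"{bin_dir}/gcc-8"
    env["CXX"] = f"{bin_dir}/g++-8"
    return env


def run_with_gcc8(args):
    """Run a Python script with CC/CXX set only for that child process."""
    return subprocess.run([sys.executable, *args], env=gcc8_env(), check=True)


# Usage (not executed here), equivalent to the shell command above:
# run_with_gcc8(["test_env.py", "--env", "AntEnv"])
print(gcc8_env({})["CC"])
```

Because the variables are set only for the child process, other builds on the machine keep using the default compiler.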
Finally, it worked:
(diff) bigeast@bigeast:~/DiffRL/examples$ CC=/usr/bin/gcc-8 CXX=/usr/bin/g++-8 python test_env.py --env AntEnv
Using cached kernels
/home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/gym/envs/registration.py:307: DeprecationWarning: The package name gym_robotics has been deprecated in favor of gymnasium_robotics. Please uninstall gym_robotics and install gymnasium_robotics with `pip install gymnasium_robotics`. Future releases will be maintained under the new package name gymnasium_robotics.
fn()
Setting seed: 0
/home/bigeast/anaconda3/envs/diff/lib/python3.8/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
/home/bigeast/DiffRL/dflex/dflex/model.py:1687: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
m.shape_transform = torch.tensor(transform_flatten_list(self.shape_transform), dtype=torch.float32, device=adapter)
fps = 20417.947956930817
mean reward = 1281.8564453125
Finish Successfully
Excuse me, I ran into this problem when I ran the command
python test_env.py --env AntEnv
in the examples folder, as the guide describes. My PyTorch version is 1.11.0 and CUDA is 12.1. Is there anything wrong with my system? I'd appreciate it a lot if you could help me with this problem.