OpenBMB / BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models
Apache License 2.0
560 stars, 77 forks

Installation fails with cuda11.7, torch==1.13.1, ubuntu22.04? #110

Closed: xiaohaihui-smart closed this issue 1 year ago

xiaohaihui-smart commented 1 year ago

Installation fails with cuda11.7, torch==1.13.1, ubuntu22.04. How can this be fixed? Is it a version compatibility problem?

Collecting bmtrain
  Downloading bmtrain-0.2.2.tar.gz (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.7/58.7 kB 432.6 kB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in /home/jysm/chat/cpm_venv/lib/python3.10/site-packages (from bmtrain) (1.24.1)
Building wheels for collected packages: bmtrain
  Building wheel for bmtrain (setup.py) ... error
  error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [58 lines of output]
    running bdist_wheel
    /home/jysm/chat/cpm_venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
      warnings.warn(msg.format('we could not find ninja.'))
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-cpython-310
    creating build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/utils.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/block_layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/synchronize.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/wrapper.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/global_var.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/debug.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/parameter.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/param_init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/pipe_layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    copying bmtrain/store.py -> build/lib.linux-x86_64-cpython-310/bmtrain
    creating build/lib.linux-x86_64-cpython-310/bmtrain/inspect
    copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
    copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
    copying bmtrain/inspect/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
    copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
    creating build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
    creating build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
    copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
    copying bmtrain/benchmark/send_recv.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
    copying bmtrain/benchmark/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
    copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
    copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
    copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
    creating build/lib.linux-x86_64-cpython-310/bmtrain/nccl
    copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-cpython-310/bmtrain/nccl
    copying bmtrain/nccl/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain/nccl
    creating build/lib.linux-x86_64-cpython-310/bmtrain/loss
    copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/bmtrain/loss
    copying bmtrain/loss/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain/loss
    creating build/lib.linux-x86_64-cpython-310/bmtrain/distributed
    copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-cpython-310/bmtrain/distributed
    copying bmtrain/distributed/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain/distributed
    creating build/lib.linux-x86_64-cpython-310/bmtrain/optim
    copying bmtrain/optim/optim_manager.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
    copying bmtrain/optim/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
    copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
    copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
    running build_ext
    error: [Errno 2] No such file or directory: '/usr/local/cuda:/usr/local/cuda/bin/nvcc'
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for bmtrain
Running setup.py clean for bmtrain
Failed to build bmtrain
ERROR: Could not build wheels for bmtrain, which is required to install pyproject.toml-based projects
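
The path in the error above, '/usr/local/cuda:/usr/local/cuda/bin/nvcc', contains a colon. That suggests the CUDA_HOME (or CUDA_PATH) environment variable may have been set to a PATH-style list instead of a single directory, although the thread does not confirm this. A minimal sketch for checking the value before retrying the install; the function name `check_cuda_home` is hypothetical and is not part of BMTrain or PyTorch:

```python
import os
from pathlib import Path


def check_cuda_home(value):
    """Return a list of problems found in a CUDA_HOME-style value.

    An empty list means the value looks usable. (Hypothetical helper.)
    """
    if not value:
        return ["CUDA_HOME is not set"]
    problems = []
    # CUDA_HOME should name one directory; a path separator hints at a
    # PATH-style list such as "/usr/local/cuda:/usr/local/cuda".
    if os.pathsep in value:
        problems.append(
            f"CUDA_HOME contains {os.pathsep!r}; expected a single directory"
        )
    # The build looks for the compiler at <CUDA_HOME>/bin/nvcc.
    nvcc = Path(value) / "bin" / "nvcc"
    if not nvcc.is_file():
        problems.append(f"nvcc not found at {nvcc}")
    return problems


if __name__ == "__main__":
    for problem in check_cuda_home(os.environ.get("CUDA_HOME")):
        print(problem)
```

If the script reports a separator, setting CUDA_HOME to a single toolkit directory (for example the one that actually contains bin/nvcc) is worth trying before reinstalling.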

buzzf commented 1 year ago

Same problem here: cuda11.7, torch1.13.1, ubuntu22.04, python3.9

`Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting bmtrain
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/13/b3/f414fc642070bb5baddab996bc4667bc16ae5f329094bd87ba923a1e7028/bmtrain-0.2.2.tar.gz (58 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in /home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages (from bmtrain) (1.24.3)
Building wheels for collected packages: bmtrain
  Building wheel for bmtrain (setup.py) ... error
  error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [123 lines of output]
    running bdist_wheel
    /home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
      warnings.warn(msg.format('we could not find ninja.'))
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-cpython-39
    creating build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/parameter.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/param_init.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/debug.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/pipe_layer.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/wrapper.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/synchronize.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/utils.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/block_layer.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/layer.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/global_var.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    copying bmtrain/store.py -> build/lib.linux-x86_64-cpython-39/bmtrain
    creating build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
    copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
    copying bmtrain/benchmark/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
    copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
    copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
    copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
    copying bmtrain/benchmark/send_recv.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
    creating build/lib.linux-x86_64-cpython-39/bmtrain/nccl
    copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-cpython-39/bmtrain/nccl
    copying bmtrain/nccl/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain/nccl
    creating build/lib.linux-x86_64-cpython-39/bmtrain/distributed
    copying bmtrain/distributed/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain/distributed
    copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-cpython-39/bmtrain/distributed
    creating build/lib.linux-x86_64-cpython-39/bmtrain/inspect
    copying bmtrain/inspect/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain/inspect
    copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-cpython-39/bmtrain/inspect
    copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-cpython-39/bmtrain/inspect
    copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-cpython-39/bmtrain/inspect
    creating build/lib.linux-x86_64-cpython-39/bmtrain/loss
    copying bmtrain/loss/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain/loss
    copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-cpython-39/bmtrain/loss
    creating build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
    copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
    creating build/lib.linux-x86_64-cpython-39/bmtrain/optim
    copying bmtrain/optim/optim_manager.py -> build/lib.linux-x86_64-cpython-39/bmtrain/optim
    copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-cpython-39/bmtrain/optim
    copying bmtrain/optim/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain/optim
    copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-cpython-39/bmtrain/optim
    running build_ext
    /home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/utils/cpp_extension.py:387: UserWarning: The detected CUDA version (11.3) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
      warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
    building 'bmtrain.nccl._C' extension
    creating build/temp.linux-x86_64-cpython-39
    creating build/temp.linux-x86_64-cpython-39/csrc
    gcc -pthread -B /home/yjy/anaconda3/envs/cpm/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/yjy/anaconda3/envs/cpm/include -I/home/yjy/anaconda3/envs/cpm/include -fPIC -O2 -isystem /home/yjy/anaconda3/envs/cpm/include -fPIC -Icsrc/nccl/build/include -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/TH -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda-11.3/include -I/home/yjy/anaconda3/envs/cpm/include/python3.9 -c csrc/nccl.cpp -o build/temp.linux-x86_64-cpython-39/csrc/nccl.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
    g++ -pthread -B /home/yjy/anaconda3/envs/cpm/compiler_compat -shared -Wl,-rpath,/home/yjy/anaconda3/envs/cpm/lib -Wl,-rpath-link,/home/yjy/anaconda3/envs/cpm/lib -L/home/yjy/anaconda3/envs/cpm/lib -L/home/yjy/anaconda3/envs/cpm/lib -Wl,-rpath,/home/yjy/anaconda3/envs/cpm/lib -Wl,-rpath-link,/home/yjy/anaconda3/envs/cpm/lib -L/home/yjy/anaconda3/envs/cpm/lib build/temp.linux-x86_64-cpython-39/csrc/nccl.o -L/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/lib -L/usr/local/cuda-11.3/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-cpython-39/bmtrain/nccl/_C.cpython-39-x86_64-linux-gnu.so
    building 'bmtrain.optim._cuda' extension
    creating build/temp.linux-x86_64-cpython-39/csrc/cuda
    gcc -pthread -B /home/yjy/anaconda3/envs/cpm/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/yjy/anaconda3/envs/cpm/include -I/home/yjy/anaconda3/envs/cpm/include -fPIC -O2 -isystem /home/yjy/anaconda3/envs/cpm/include -fPIC -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/TH -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda-11.3/include -I/home/yjy/anaconda3/envs/cpm/include/python3.9 -c csrc/adam_cuda.cpp -o build/temp.linux-x86_64-cpython-39/csrc/adam_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<pip-setuptools-caller>", line 34, in <module>
      File "/tmp/pip-install-qkbd394p/bmtrain_1e5b558c47da47f8b09d7978555abe57/setup.py", line 74, in <module>
        setup(
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/__init__.py", line 107, in setup
        return distutils.core.setup(**attrs)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
        return run_commands(dist)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
        dist.run_commands()
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
        self.run_command(cmd)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/dist.py", line 1244, in run_command
        super().run_command(command)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/wheel/bdist_wheel.py", line 325, in run
        self.run_command("build")
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
        self.distribution.run_command(command)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/dist.py", line 1244, in run_command
        super().run_command(command)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/command/build.py", line 131, in run
        self.run_command(cmd_name)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
        self.distribution.run_command(command)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/dist.py", line 1244, in run_command
        super().run_command(command)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
        _build_ext.run(self)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
        self.build_extensions()
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
        build_ext.build_extensions(self)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
        self._build_extensions_serial()
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
        self.build_extension(ext)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
        _build_ext.build_extension(self, ext)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
        objects = self.compiler.compile(
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/setuptools/_distutils/ccompiler.py", line 600, in compile
        self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 581, in unix_wrap_single_compile
        cflags = unix_cuda_flags(cflags)
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 548, in unix_cuda_flags
        cflags + _get_cuda_arch_flags(cflags))
      File "/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1793, in _get_cuda_arch_flags
        raise ValueError(f"Unknown CUDA arch ({arch}) or GPU not supported")
    ValueError: Unknown CUDA arch (8.9) or GPU not supported
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for bmtrain
Running setup.py clean for bmtrain
Failed to build bmtrain
Installing collected packages: bmtrain
Running setup.py install for bmtrain ... error
error: subprocess-exited-with-error`
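
Note that this second log fails differently from the first: the build found a toolkit under /usr/local/cuda-11.3 (the warning reports CUDA 11.3 against a PyTorch built for 11.7) and then stopped with "Unknown CUDA arch (8.9)". As far as I know, compute capability 8.9 is only supported by CUDA toolkits from 11.8 onward, so an 11.3 toolkit may be part of the problem, but the thread does not confirm the cause. A small sketch for comparing the toolkit version with the one PyTorch reports; the function names are hypothetical, and the sample string assumes the usual `nvcc --version` wording:

```python
import re


def parse_nvcc_release(version_output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def compare_cuda_versions(toolkit, torch_cuda):
    """Classify a toolkit (major, minor) against a version string such as "11.7".

    Returns "match", "minor" (same major, different minor) or "major".
    """
    built = tuple(int(part) for part in torch_cuda.split(".")[:2])
    if toolkit == built:
        return "match"
    return "minor" if toolkit[0] == built[0] else "major"


if __name__ == "__main__":
    # Assumed sample text; paste the real output of `nvcc --version` here.
    sample = "Cuda compilation tools, release 11.3, V11.3.109"
    toolkit = parse_nvcc_release(sample)
    # "11.7" is the PyTorch CUDA version quoted in the warning above.
    print(toolkit, compare_cuda_versions(toolkit, "11.7"))
```

In a real environment the second argument would come from `torch.version.cuda`, which is the value the warning in the log compares against.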

menghuu commented 1 year ago

It looks like nvcc is missing. Try installing it: conda install cuda-nvcc -c nvidia
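
After installing, it may help to confirm that nvcc is actually visible from the environment used for the build. As far as I know, when CUDA_HOME is unset, PyTorch's extension builder falls back to locating nvcc on PATH. A minimal sketch using only the standard library; the function name `nvcc_version_text` is hypothetical:

```python
import shutil
import subprocess


def nvcc_version_text():
    """Return the output of `nvcc --version`, or None when nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    # check=False: report whatever nvcc prints instead of raising on failure.
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    text = nvcc_version_text()
    print(text if text is not None else "nvcc not found on PATH")
```

If this prints "nvcc not found on PATH" inside the activated conda environment or virtualenv, the install above did not take effect for that environment.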