Closed: xiaohaihui-smart closed this issue 1 year ago
Same problem here with CUDA 11.7, torch 1.13.1, Ubuntu 22.04, Python 3.9:
`Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting bmtrain
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/13/b3/f414fc642070bb5baddab996bc4667bc16ae5f329094bd87ba923a1e7028/bmtrain-0.2.2.tar.gz (58 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in /home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages (from bmtrain) (1.24.3)
Building wheels for collected packages: bmtrain
Building wheel for bmtrain (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [123 lines of output]
running bdist_wheel
/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-39
creating build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/parameter.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/param_init.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/debug.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/pipe_layer.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/wrapper.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/synchronize.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/utils.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/block_layer.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/layer.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/global_var.py -> build/lib.linux-x86_64-cpython-39/bmtrain
copying bmtrain/store.py -> build/lib.linux-x86_64-cpython-39/bmtrain
creating build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
copying bmtrain/benchmark/__init__.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
copying bmtrain/benchmark/send_recv.py -> build/lib.linux-x86_64-cpython-39/bmtrain/benchmark
creating build/lib.linux-x86_64-cpython-39/bmtrain/nccl
copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-cpython-39/bmtrain/nccl
copying bmtrain/nccl/__init__.py -> build/lib.linux-x86_64-cpython-39/bmtrain/nccl
creating build/lib.linux-x86_64-cpython-39/bmtrain/distributed
copying bmtrain/distributed/__init__.py -> build/lib.linux-x86_64-cpython-39/bmtrain/distributed
copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-cpython-39/bmtrain/distributed
creating build/lib.linux-x86_64-cpython-39/bmtrain/inspect
copying bmtrain/inspect/__init__.py -> build/lib.linux-x86_64-cpython-39/bmtrain/inspect
copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-cpython-39/bmtrain/inspect
copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-cpython-39/bmtrain/inspect
copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-cpython-39/bmtrain/inspect
creating build/lib.linux-x86_64-cpython-39/bmtrain/loss
copying bmtrain/loss/__init__.py -> build/lib.linux-x86_64-cpython-39/bmtrain/loss
copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-cpython-39/bmtrain/loss
creating build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/__init__.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-cpython-39/bmtrain/lr_scheduler
creating build/lib.linux-x86_64-cpython-39/bmtrain/optim
copying bmtrain/optim/optim_manager.py -> build/lib.linux-x86_64-cpython-39/bmtrain/optim
copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-cpython-39/bmtrain/optim
copying bmtrain/optim/__init__.py -> build/lib.linux-x86_64-cpython-39/bmtrain/optim
copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-cpython-39/bmtrain/optim
running build_ext
/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/utils/cpp_extension.py:387: UserWarning: The detected CUDA version (11.3) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'bmtrain.nccl._C' extension
creating build/temp.linux-x86_64-cpython-39
creating build/temp.linux-x86_64-cpython-39/csrc
gcc -pthread -B /home/yjy/anaconda3/envs/cpm/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/yjy/anaconda3/envs/cpm/include -I/home/yjy/anaconda3/envs/cpm/include -fPIC -O2 -isystem /home/yjy/anaconda3/envs/cpm/include -fPIC -Icsrc/nccl/build/include -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/TH -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda-11.3/include -I/home/yjy/anaconda3/envs/cpm/include/python3.9 -c csrc/nccl.cpp -o build/temp.linux-x86_64-cpython-39/csrc/nccl.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
g++ -pthread -B /home/yjy/anaconda3/envs/cpm/compiler_compat -shared -Wl,-rpath,/home/yjy/anaconda3/envs/cpm/lib -Wl,-rpath-link,/home/yjy/anaconda3/envs/cpm/lib -L/home/yjy/anaconda3/envs/cpm/lib -L/home/yjy/anaconda3/envs/cpm/lib -Wl,-rpath,/home/yjy/anaconda3/envs/cpm/lib -Wl,-rpath-link,/home/yjy/anaconda3/envs/cpm/lib -L/home/yjy/anaconda3/envs/cpm/lib build/temp.linux-x86_64-cpython-39/csrc/nccl.o -L/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/lib -L/usr/local/cuda-11.3/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-cpython-39/bmtrain/nccl/_C.cpython-39-x86_64-linux-gnu.so
building 'bmtrain.optim._cuda' extension
creating build/temp.linux-x86_64-cpython-39/csrc/cuda
gcc -pthread -B /home/yjy/anaconda3/envs/cpm/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/yjy/anaconda3/envs/cpm/include -I/home/yjy/anaconda3/envs/cpm/include -fPIC -O2 -isystem /home/yjy/anaconda3/envs/cpm/include -fPIC -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/TH -I/home/yjy/anaconda3/envs/cpm/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda-11.3/include -I/home/yjy/anaconda3/envs/cpm/include/python3.9 -c csrc/adam_cuda.cpp -o build/temp.linux-x86_64-cpython-39/csrc/adam_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
Traceback (most recent call last):
File "
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for bmtrain
Running setup.py clean for bmtrain
Failed to build bmtrain
Installing collected packages: bmtrain
Running setup.py install for bmtrain ... error
error: subprocess-exited-with-error`
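The log above warns that the detected CUDA toolkit (11.3, under /usr/local/cuda-11.3) differs from the version PyTorch was compiled with (11.7), and calls this a minor version mismatch. A minimal sketch of that distinction follows. The function names are made up for illustration, and this is not PyTorch's own check, only the major/minor comparison the warning text describes.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Split a 'major.minor' CUDA version string into integers."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def classify_cuda_mismatch(detected: str, built_with: str) -> str:
    """Compare the local toolkit version with the one PyTorch was built with."""
    d, b = parse_version(detected), parse_version(built_with)
    if d == b:
        return "match"
    if d[0] != b[0]:
        # Different major versions (e.g. 10.x vs 11.x).
        return "major mismatch"
    # Same major version, different minor version (the case in the log).
    return "minor mismatch"


if __name__ == "__main__":
    # Versions taken from the UserWarning in the log above.
    print(classify_cuda_mismatch("11.3", "11.7"))  # minor mismatch
```

The warning itself says a minor mismatch is most likely harmless, so it is probably not what made this build fail.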
It looks like nvcc is missing. Try installing it: conda install cuda-nvcc -c nvidia
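Before or after installing, it can help to confirm whether nvcc is visible in the environment. The sketch below is only a rough approximation of how a build might look for it: it tries an exported CUDA_HOME or CUDA_PATH first, then falls back to PATH. The helper name `find_nvcc` and the exact lookup order are assumptions for illustration, not bmtrain's or PyTorch's actual code.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_nvcc(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    is_file: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return a path to nvcc, or None if none can be found."""
    # 1. Prefer an explicitly exported toolkit root.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = env.get(var)
        if root:
            candidate = os.path.join(root, "bin", "nvcc")
            if is_file(candidate):
                return candidate
    # 2. Otherwise use whatever nvcc is on PATH, if any.
    return which("nvcc")


if __name__ == "__main__":
    path = find_nvcc()
    print(path if path else "nvcc not found; the CUDA extension build will likely fail")
```

If this prints a path, the compiler is present and the failure probably lies elsewhere, for example in how CUDA_HOME is set.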
Installation fails with CUDA 11.7, torch==1.13.1, Ubuntu 22.04. How can I fix this? Is it a version compatibility problem?
Collecting bmtrain
Downloading bmtrain-0.2.2.tar.gz (58 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.7/58.7 kB 432.6 kB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in /home/jysm/chat/cpm_venv/lib/python3.10/site-packages (from bmtrain) (1.24.1)
Building wheels for collected packages: bmtrain
Building wheel for bmtrain (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [58 lines of output]
running bdist_wheel
/home/jysm/chat/cpm_venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/utils.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/block_layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/synchronize.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/wrapper.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/global_var.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/debug.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/parameter.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/param_init.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/pipe_layer.py -> build/lib.linux-x86_64-cpython-310/bmtrain
copying bmtrain/store.py -> build/lib.linux-x86_64-cpython-310/bmtrain
creating build/lib.linux-x86_64-cpython-310/bmtrain/inspect
copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
copying bmtrain/inspect/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-cpython-310/bmtrain/inspect
creating build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-cpython-310/bmtrain/lr_scheduler
creating build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/send_recv.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-cpython-310/bmtrain/benchmark
creating build/lib.linux-x86_64-cpython-310/bmtrain/nccl
copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-cpython-310/bmtrain/nccl
copying bmtrain/nccl/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/nccl
creating build/lib.linux-x86_64-cpython-310/bmtrain/loss
copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/bmtrain/loss
copying bmtrain/loss/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/loss
creating build/lib.linux-x86_64-cpython-310/bmtrain/distributed
copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-cpython-310/bmtrain/distributed
copying bmtrain/distributed/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/distributed
creating build/lib.linux-x86_64-cpython-310/bmtrain/optim
copying bmtrain/optim/optim_manager.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
copying bmtrain/optim/__init__.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-cpython-310/bmtrain/optim
running build_ext
error: [Errno 2] No such file or directory: '/usr/local/cuda:/usr/local/cuda/bin/nvcc'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for bmtrain
Running setup.py clean for bmtrain
Failed to build bmtrain
ERROR: Could not build wheels for bmtrain, which is required to install pyproject.toml-based projects
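The path in the final error, '/usr/local/cuda:/usr/local/cuda/bin/nvcc', looks like a toolkit root of '/usr/local/cuda:/usr/local/cuda' joined with bin/nvcc. That suggests the CUDA_HOME environment variable (an inference from the log, not something the poster confirmed) holds a colon-separated, PATH-style value instead of a single directory. A small sketch of a sanity check follows; `diagnose_cuda_home` is a hypothetical helper, not part of bmtrain or PyTorch.

```python
import os
from typing import List


def diagnose_cuda_home(value: str, sep: str = os.pathsep) -> List[str]:
    """Return a list of human-readable problems with a CUDA_HOME value."""
    problems: List[str] = []
    if not value:
        problems.append("CUDA_HOME is empty or unset")
        return problems
    if sep in value:
        # A PATH-style list cannot be joined with 'bin/nvcc' meaningfully.
        problems.append(
            f"CUDA_HOME contains {sep!r}; it must be a single directory, "
            f"not a PATH-style list: {value!r}"
        )
        return problems
    nvcc = os.path.join(value, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        problems.append(f"nvcc not found at {nvcc}")
    return problems


if __name__ == "__main__":
    # Check the value exported in the current shell, if any.
    for problem in diagnose_cuda_home(os.environ.get("CUDA_HOME", "")):
        print(problem)
```

If the check reports a separator, re-exporting CUDA_HOME as one directory (for example /usr/local/cuda, assuming the toolkit is installed there) and retrying the install would be a reasonable next step.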