HBX-hbx closed this issue 1 year ago.
It seems that your PyTorch installation cannot find the CUDA header files. Please check your environment variable settings. BMTrain 0.2.3 no longer depends on torch to compile its .so file, so this problem will not happen anymore.
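One way to act on this advice is to check where the CUDA toolkit headers actually live. The sketch below is a hypothetical diagnostic, not part of BMTrain or PyTorch. It assumes the toolkit root is given by `CUDA_HOME` or `CUDA_PATH`, or sits at `/usr/local/cuda`, and it looks for `include/cusolverDn.h` under each candidate. If it finds the header, it prints an `export CPATH=...` line. `CPATH` is a standard GCC environment variable for extra include directories, but using it here is an assumed workaround: the failing `gcc` command in the build log receives no CUDA `-I` flag, so pointing GCC at the toolkit's `include` directory may let `CUDAContext.h` find `cusolverDn.h`.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_cuda_roots(env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return possible CUDA toolkit roots in the order they are checked.

    Assumption: CUDA_HOME, CUDA_PATH, then /usr/local/cuda. This mirrors
    common conventions and is not PyTorch's exact lookup logic.
    """
    env = os.environ if env is None else env
    roots = [Path(env[var]) for var in ("CUDA_HOME", "CUDA_PATH") if env.get(var)]
    roots.append(Path("/usr/local/cuda"))
    return roots


def find_header(name: str, roots: List[Path]) -> Optional[Path]:
    """Return the first <root>/include/<name> that exists, or None."""
    for root in roots:
        candidate = root / "include" / name
        if candidate.is_file():
            return candidate
    return None


def suggest_export(header: Path) -> str:
    """Build a bash line that prepends the header's directory to CPATH.

    Hypothetical workaround: GCC searches CPATH directories as if they
    were passed with -I (after any explicit -I paths).
    """
    include_dir = header.parent
    return f'export CPATH="{include_dir}${{CPATH:+:$CPATH}}"'


if __name__ == "__main__":
    header = find_header("cusolverDn.h", candidate_cuda_roots())
    if header is None:
        print("cusolverDn.h not found; point CUDA_HOME at your CUDA toolkit root")
    else:
        print(f"found {header}")
        print(suggest_export(header))
```

Run it in the same shell you use for `pip install`; if it prints an `export` line, apply it and retry the build. If the header is missing from every candidate root, the CUDA toolkit (including its cuSOLVER headers) may not be installed at all, and no environment variable will fix that.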
But 0.2.3 has not been released.
Environment: PyTorch 1.13.1, CUDA 11.2
Building wheel for bmtrain (setup.py) ... error
ERROR: Command errored out with exit status 1:
 command: /data/private/hebingxiang/miniconda3/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-hcfrmsk4/bmtrain_7495d4a1219f45dc8e9bca0dade5da43/setup.py'"'"'; __file__='"'"'/tmp/pip-install-hcfrmsk4/bmtrain_7495d4a1219f45dc8e9bca0dade5da43/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-fof5ziro
 cwd: /tmp/pip-install-hcfrmsk4/bmtrain_7495d4a1219f45dc8e9bca0dade5da43/
Complete output (67 lines):
running bdist_wheel
/data/private/hebingxiang/miniconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.9
creating build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/debug.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/param_init.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/global_var.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/synchronize.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/pipe_layer.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/parameter.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/utils.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/wrapper.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/block_layer.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/layer.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-3.9/bmtrain
copying bmtrain/store.py -> build/lib.linux-x86_64-3.9/bmtrain
creating build/lib.linux-x86_64-3.9/bmtrain/nccl
copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-3.9/bmtrain/nccl
copying bmtrain/nccl/__init__.py -> build/lib.linux-x86_64-3.9/bmtrain/nccl
creating build/lib.linux-x86_64-3.9/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-3.9/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-3.9/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/__init__.py -> build/lib.linux-x86_64-3.9/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-3.9/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-3.9/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-3.9/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-3.9/bmtrain/lr_scheduler
creating build/lib.linux-x86_64-3.9/bmtrain/benchmark
copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-3.9/bmtrain/benchmark
copying bmtrain/benchmark/send_recv.py -> build/lib.linux-x86_64-3.9/bmtrain/benchmark
copying bmtrain/benchmark/__init__.py -> build/lib.linux-x86_64-3.9/bmtrain/benchmark
copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-3.9/bmtrain/benchmark
copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-3.9/bmtrain/benchmark
copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-3.9/bmtrain/benchmark
creating build/lib.linux-x86_64-3.9/bmtrain/optim
copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-3.9/bmtrain/optim
copying bmtrain/optim/__init__.py -> build/lib.linux-x86_64-3.9/bmtrain/optim
copying bmtrain/optim/optim_manager.py -> build/lib.linux-x86_64-3.9/bmtrain/optim
copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-3.9/bmtrain/optim
creating build/lib.linux-x86_64-3.9/bmtrain/distributed
copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-3.9/bmtrain/distributed
copying bmtrain/distributed/__init__.py -> build/lib.linux-x86_64-3.9/bmtrain/distributed
creating build/lib.linux-x86_64-3.9/bmtrain/loss
copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-3.9/bmtrain/loss
copying bmtrain/loss/__init__.py -> build/lib.linux-x86_64-3.9/bmtrain/loss
creating build/lib.linux-x86_64-3.9/bmtrain/inspect
copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-3.9/bmtrain/inspect
copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-3.9/bmtrain/inspect
copying bmtrain/inspect/__init__.py -> build/lib.linux-x86_64-3.9/bmtrain/inspect
copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-3.9/bmtrain/inspect
running build_ext
building 'bmtrain.nccl._C' extension
creating build/temp.linux-x86_64-3.9
creating build/temp.linux-x86_64-3.9/csrc
gcc -pthread -B /data/private/hebingxiang/miniconda3/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /data/private/hebingxiang/miniconda3/include -I/data/private/hebingxiang/miniconda3/include -fPIC -O2 -isystem /data/private/hebingxiang/miniconda3/include -fPIC -Icsrc/nccl/build/include -I/data/private/hebingxiang/miniconda3/lib/python3.9/site-packages/torch/include -I/data/private/hebingxiang/miniconda3/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/data/private/hebingxiang/miniconda3/lib/python3.9/site-packages/torch/include/TH -I/data/private/hebingxiang/miniconda3/lib/python3.9/site-packages/torch/include/THC -I/data/private/hebingxiang/miniconda3/include -I/data/private/hebingxiang/miniconda3/include/python3.9 -c csrc/nccl.cpp -o build/temp.linux-x86_64-3.9/csrc/nccl.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from csrc/nccl.cpp:4:
/data/private/hebingxiang/miniconda3/lib/python3.9/site-packages/torch/include/ATen/cuda/CUDAContext.h:10:10: fatal error: cusolverDn.h: No such file or directory
   10 | #include <cusolverDn.h>
      |          ^~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
ERROR: Failed building wheel for bmtrain