OpenBMB / BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models
Apache License 2.0
560 stars 77 forks

Failed to install BMTrain: ~/has_inf_nan.cu(11): error: identifier "__heq" is undefined #80

lindylin1817 closed this issue 1 year ago

lindylin1817 commented 1 year ago

When I run the following command to install BMTrain:

python setup.py install

I get this error:


running install
/home/yhlin/torch_env/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/home/yhlin/torch_env/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing bmtrain.egg-info/PKG-INFO
writing dependency_links to bmtrain.egg-info/dependency_links.txt
writing requirements to bmtrain.egg-info/requires.txt
writing top-level names to bmtrain.egg-info/top_level.txt
reading manifest file 'bmtrain.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'bmtrain.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/layer.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/block_layer.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/synchronize.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/parameter.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/param_init.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/global_var.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/wrapper.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/utils.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/store.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/pipe_layer.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/init.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/debug.py -> build/lib.linux-x86_64-cpython-38/bmtrain
copying bmtrain/checkpointing.py -> build/lib.linux-x86_64-cpython-38/bmtrain
creating build/lib.linux-x86_64-cpython-38/bmtrain/loss
copying bmtrain/loss/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/loss
copying bmtrain/loss/cross_entropy.py -> build/lib.linux-x86_64-cpython-38/bmtrain/loss
creating build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/all_gather.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/reduce_scatter.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/send_recv.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/shape.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/utils.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
copying bmtrain/benchmark/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/benchmark
creating build/lib.linux-x86_64-cpython-38/bmtrain/distributed
copying bmtrain/distributed/ops.py -> build/lib.linux-x86_64-cpython-38/bmtrain/distributed
copying bmtrain/distributed/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/distributed
creating build/lib.linux-x86_64-cpython-38/bmtrain/optim
copying bmtrain/optim/adam.py -> build/lib.linux-x86_64-cpython-38/bmtrain/optim
copying bmtrain/optim/adam_offload.py -> build/lib.linux-x86_64-cpython-38/bmtrain/optim
copying bmtrain/optim/optim_manager.py -> build/lib.linux-x86_64-cpython-38/bmtrain/optim
copying bmtrain/optim/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/optim
creating build/lib.linux-x86_64-cpython-38/bmtrain/nccl
copying bmtrain/nccl/enums.py -> build/lib.linux-x86_64-cpython-38/bmtrain/nccl
copying bmtrain/nccl/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/nccl
creating build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/no_decay.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/linear.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/noam.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/exponential.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/warmup.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
copying bmtrain/lr_scheduler/cosine.py -> build/lib.linux-x86_64-cpython-38/bmtrain/lr_scheduler
creating build/lib.linux-x86_64-cpython-38/bmtrain/inspect
copying bmtrain/inspect/model.py -> build/lib.linux-x86_64-cpython-38/bmtrain/inspect
copying bmtrain/inspect/format.py -> build/lib.linux-x86_64-cpython-38/bmtrain/inspect
copying bmtrain/inspect/__init__.py -> build/lib.linux-x86_64-cpython-38/bmtrain/inspect
copying bmtrain/inspect/tensor.py -> build/lib.linux-x86_64-cpython-38/bmtrain/inspect
running build_ext
building 'bmtrain.nccl._C' extension
creating /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38
creating /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc
Emitting ninja build file /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/nccl.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/nccl/build/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/TH -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/yhlin/torch_env/include -I/opt/conda/include/python3.8 -c -c /home/yhlin/BMTrain/csrc/nccl.cpp -o /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/nccl.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
g++ -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -pthread -shared -B /opt/conda/compiler_compat -L/opt/conda/lib -Wl,-rpath=/opt/conda/lib -Wl,--no-as-needed -Wl,--sysroot=/ /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/nccl.o -L/home/yhlin/torch_env/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-cpython-38/bmtrain/nccl/_C.cpython-38-x86_64-linux-gnu.so
building 'bmtrain.optim._cuda' extension
creating /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/cuda
Emitting ninja build file /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/adam_cuda.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/TH -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/yhlin/torch_env/include -I/opt/conda/include/python3.8 -c -c /home/yhlin/BMTrain/csrc/adam_cuda.cpp -o /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/adam_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
[2/3] /usr/local/cuda/bin/nvcc  -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/TH -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/yhlin/torch_env/include -I/opt/conda/include/python3.8 -c -c /home/yhlin/BMTrain/csrc/cuda/has_inf_nan.cu -o /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/cuda/has_inf_nan.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/cuda/has_inf_nan.o
/usr/local/cuda/bin/nvcc  -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/TH -I/home/yhlin/torch_env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/yhlin/torch_env/include -I/opt/conda/include/python3.8 -c -c /home/yhlin/BMTrain/csrc/cuda/has_inf_nan.cu -o /home/yhlin/BMTrain/build/temp.linux-x86_64-cpython-38/csrc/cuda/has_inf_nan.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/home/yhlin/BMTrain/csrc/cuda/has_inf_nan.cu(11): error: identifier "__heq" is undefined

1 error detected in the compilation of "/home/yhlin/BMTrain/csrc/cuda/has_inf_nan.cu".

I looked through has_inf_nan.cu and could not find a definition of "__heq" anywhere in it. Can you help me solve this? Thanks!

a710128 commented 1 year ago

Did you set the TORCH_CUDA_ARCH_LIST environment variable?
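For anyone unsure how to answer that question, here is a rough sketch that reads the variable and lists any entries below compute capability 5.3. It assumes the usual numeric form of the list (entries such as 6.0 or 8.6+PTX, separated by spaces or semicolons) and skips named entries. The 5.3 threshold is my understanding of where the half-precision comparison intrinsics such as __heq become available. The helper names parse_arch_list and report are made up for illustration.

```python
import os
import re

# Assumption: __heq and related half intrinsics need compute capability 5.3+.
MIN_HALF_CC = (5, 3)


def parse_arch_list(value):
    """Parse a TORCH_CUDA_ARCH_LIST-style string into (major, minor) tuples.

    Entries may be separated by spaces or semicolons. A trailing "+PTX" is
    ignored, and non-numeric (named) entries are skipped in this sketch.
    """
    caps = []
    for entry in re.split(r"[;\s]+", value.strip()):
        match = re.fullmatch(r"(\d+)\.(\d+)(\+PTX)?", entry)
        if match:
            caps.append((int(match.group(1)), int(match.group(2))))
    return caps


def report(value):
    """Return a one-line summary of the given arch list value (or None)."""
    if not value:
        return "TORCH_CUDA_ARCH_LIST is not set"
    too_old = [cc for cc in parse_arch_list(value) if cc < MIN_HALF_CC]
    if too_old:
        listed = ", ".join(f"{major}.{minor}" for major, minor in too_old)
        return f"below 5.3: {listed}"
    return "all listed architectures are 5.3 or newer"


if __name__ == "__main__":
    print(report(os.environ.get("TORCH_CUDA_ARCH_LIST")))
```

Running it in the same shell that will run the install shows what the build would inherit from the environment.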

OldSixOne commented 1 year ago

TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0 8.6+PTX" pip3 install bmtrain
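A note on why this likely helps: the command above leaves 5.2 out of the list, whereas the original build log targeted compute_52. As far as I know, __heq and the other half-precision comparison intrinsics in cuda_fp16.h are only available to device code compiled for compute capability 5.3 or higher, which would explain the "undefined" error. The sketch below builds such a list from a set of capabilities by dropping anything older than 5.3. The function name build_arch_list and the choice to put +PTX on the newest entry are illustrative assumptions, not something BMTrain provides.

```python
# Assumption: half-precision intrinsics such as __heq need compute capability 5.3+.
MIN_HALF_CC = (5, 3)


def build_arch_list(capabilities, min_cc=MIN_HALF_CC):
    """Format (major, minor) pairs as a TORCH_CUDA_ARCH_LIST value.

    Capabilities older than min_cc are dropped, duplicates are removed, and
    "+PTX" is appended to the newest remaining entry.
    """
    kept = sorted({cc for cc in capabilities if cc >= min_cc})
    if not kept:
        raise ValueError("no architecture at or above the minimum capability")
    entries = [f"{major}.{minor}" for major, minor in kept]
    entries[-1] += "+PTX"
    return " ".join(entries)


if __name__ == "__main__":
    # The architectures from the failing build log, including 5.2.
    from_log = [(5, 2), (6, 0), (6, 1), (7, 0), (7, 5), (8, 0), (8, 6)]
    print(build_arch_list(from_log))
```

The printed value can then be placed in front of the install command the same way as in the comment above.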

ithyl commented 1 year ago

Same issue here. Did you resolve it?

CH_EXTENSION_NAME=_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_50,code=sm_50 -std=c++17
csrc/cuda/has_inf_nan.cu(11): error: identifier "__heq" is undefined

  1 error detected in the compilation of "csrc/cuda/has_inf_nan.cu".
  error: command '/usr/local/cuda-11.7/bin/nvcc' failed with exit code 1
  [end of output]
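This output shows the build targeting compute_50, which is below the 5.3 capability that, to my knowledge, the half-precision intrinsic __heq requires, so it looks like the same cause as the original report. A small sketch for spotting this in a pasted nvcc command line follows. It only looks at arch=compute_NN inside -gencode flags, and the helper names are hypothetical.

```python
import re

# Assumption: __heq is only available when compiling for capability 5.3+.
MIN_HALF_CC = (5, 3)


def gencode_capabilities(command):
    """Extract (major, minor) pairs from arch=compute_NN flags in a command."""
    caps = set()
    for digits in re.findall(r"arch=compute_(\d+)", command):
        number = int(digits)
        # The last digit is the minor version, the rest is the major version.
        caps.add((number // 10, number % 10))
    return sorted(caps)


def unsupported_targets(command, min_cc=MIN_HALF_CC):
    """Return the targeted capabilities that are older than min_cc."""
    return [cc for cc in gencode_capabilities(command) if cc < min_cc]


if __name__ == "__main__":
    fragment = "-gencode=arch=compute_50,code=sm_50 -std=c++17"
    print(unsupported_targets(fragment))
```

If the list comes back non-empty, removing those architectures from TORCH_CUDA_ARCH_LIST before reinstalling is the first thing to try.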