BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0
12.05k stars 827 forks

Training fails at this step with an error: build.ninja... #148

Open hopeforus opened 1 year ago

hopeforus commented 1 year ago

Emitting ninja build file /home/hope/.cache/torch_extensions/py310_cu117/wkv_1024/build.ninja...
Building extension module wkv_1024...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=wkv_1024 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/include -isystem /home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/include/TH -isystem /home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/include/THC -isystem /home/hope/miniconda3/envs/rwkv/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -res-usage --maxrregcount 60 --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DTmax=1024 -std=c++14 -c /home/hope/work/RWKV-LM/RWKV-v4neo/cuda/wkv_cuda.cu -o wkv_cuda.cuda.o
FAILED: wkv_cuda.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=wkv_1024 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/include -isystem /home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/include/TH -isystem /home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/include/THC -isystem /home/hope/miniconda3/envs/rwkv/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_BFLOAT16_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -res-usage --maxrregcount 60 --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DTmax=1024 -std=c++14 -c /home/hope/work/RWKV-LM/RWKV-v4neo/cuda/wkv_cuda.cu -o wkv_cuda.cuda.o
In file included from /usr/include/cuda_runtime.h:83,
                 from <command-line>:
/usr/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
  138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
      |  ^~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
  File "/home/hope/miniconda3/envs/rwkv/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hope/work/RWKV-LM/RWKV-v4neo/train.py", line 307, in <module>
    from src.model import RWKV
  File "/home/hope/work/RWKV-LM/RWKV-v4neo/src/model.py", line 80, in <module>
    wkv_cuda = load(name=f"wkv_{T_MAX}", sources=["cuda/wkv_op.cpp", "cuda/wkv_cuda.cu"], verbose=True, extra_cuda_cflags=["-res-usage", "--maxrregcount 60", "--use_fast_math", "-O3", "-Xptxas -O3", "--extra-device-vectorization", f"-DTmax={T_MAX}"])
  File "/home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/hope/miniconda3/envs/rwkv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'wkv_1024'

gg22mm commented 1 year ago

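This is an environment problem. If you can't work out how to fix it, I'd suggest downloading a Docker image with the environment already set up and testing with that: https://zhuanlan.zhihu.com/p/616986651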
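
For anyone who wants to confirm the environment mismatch before switching to Docker, here is a minimal sketch that compares the installed gcc against the limit quoted in the error message ("gcc versions later than 8 are not supported"). The helper names (`parse_gcc_major`, `gcc_is_too_new`, `installed_gcc_version`) are made up for illustration, and the limit of 8 comes only from this particular log; other CUDA toolkits may accept other gcc versions.

```python
import re
import shutil
import subprocess

# Assumption: limit copied from the error text in this thread, not a general rule.
MAX_SUPPORTED_GCC_MAJOR = 8


def parse_gcc_major(version_text):
    """Return the major version from `gcc -dumpversion` style output, e.g. '11.4.0' or '9'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognized gcc version string: {version_text!r}")
    return int(match.group(1))


def gcc_is_too_new(version_text, max_major=MAX_SUPPORTED_GCC_MAJOR):
    """True if the given gcc version exceeds the major version this nvcc accepts."""
    return parse_gcc_major(version_text) > max_major


def installed_gcc_version(gcc="gcc"):
    """Return the output of `gcc -dumpversion`, or None if gcc is not on PATH."""
    if shutil.which(gcc) is None:
        return None
    result = subprocess.run([gcc, "-dumpversion"], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling `gcc_is_too_new(installed_gcc_version())` (after checking the version is not `None`) should return `True` on a machine that hits this error. If it does, the usual options are an older gcc, a newer CUDA toolkit, or a prebuilt environment such as the Docker route suggested above.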

hopeforus commented 1 year ago

Thanks a lot!

HuXinjing commented 8 months ago

I'm running into the same problem. Did you manage to solve it by fixing your environment setup?

Lixuanhe commented 2 months ago

I removed "-Xptxas -O3" from wkv6_cuda and that solved the problem.
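
A rough sketch of what dropping that flag could look like, using the `extra_cuda_cflags` list from the `load(...)` call in the traceback earlier in this thread. Note the assumptions: that list is from RWKV-v4neo, while the comment above refers to `wkv6_cuda`, whose flags may differ; `without_flag` is a hypothetical helper; and nothing in this thread establishes that removing the flag resolves the gcc version error specifically.

```python
# Flags copied from the load(...) call shown in the traceback (T_MAX was 1024 there).
T_MAX = 1024
extra_cuda_cflags = [
    "-res-usage",
    "--maxrregcount 60",
    "--use_fast_math",
    "-O3",
    "-Xptxas -O3",
    "--extra-device-vectorization",
    f"-DTmax={T_MAX}",
]


def without_flag(flags, unwanted):
    """Return a copy of the flag list with every exact match of `unwanted` removed."""
    return [flag for flag in flags if flag != unwanted]


# Hypothetical edit: the list that would be passed as extra_cuda_cflags instead.
trimmed_cflags = without_flag(extra_cuda_cflags, "-Xptxas -O3")
```

The plain `-O3` entry stays in the list because only the exact string `"-Xptxas -O3"` is filtered out.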