BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

RuntimeError: Error building extension 'wkv_1024' #129

Status: Closed (sanwei111 closed this issue 1 year ago)

sanwei111 commented 1 year ago

On Linux. The CUDA version in the container is 11.6, and the one in the virtual environment is 11.7.

[1/1] c++ wkv_op.o wkv_cuda.cuda.o -shared -L/opt/conda/envs/rwkv/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/opt/conda/lib64 -lcudart -o wkv_1024.so
FAILED: wkv_1024.so
c++ wkv_op.o wkv_cuda.cuda.o -shared -L/opt/conda/envs/rwkv/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/opt/conda/lib64 -lcudart -o wkv_1024.so
/usr/bin/ld: cannot find -lcudart
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/opt/conda/envs/rwkv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/envs/rwkv/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/data/jarvis/code/RWKV-LM/RWKV-v4neo/train.py", line 293, in <module>
    from src.model import RWKV
  File "/workspace/data/jarvis/code/RWKV-LM/RWKV-v4neo/src/model.py", line 80, in <module>
    wkv_cuda = load(name=f"wkv_{T_MAX}", sources=["cuda/wkv_op.cpp", "cuda/wkv_cuda.cu"], verbose=True, extra_cuda_cflags=["-res-usage", "--maxrregcount 60", "--use_fast_math", "-O3", "-Xptxas -O3", "--extra-device-vectorization", f"-DTmax={T_MAX}"])
  File "/opt/conda/envs/rwkv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/opt/conda/envs/rwkv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/envs/rwkv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/envs/rwkv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'wkv_1024'

BlinkDL commented 1 year ago

Your error is:
/usr/bin/ld: cannot find -lcudart

Solution:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
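
A note on why the PATH change likely helps: the link command in the log searches `-L/opt/conda/lib64`, which suggests the build picked `/opt/conda` as its CUDA root, and that directory apparently has no `libcudart`. As far as I know, PyTorch's `cpp_extension` takes the CUDA root from `CUDA_HOME` if it is set and otherwise infers it from where `nvcc` sits on `PATH`, so putting `/usr/local/cuda/bin` first should point the linker at `/usr/local/cuda/lib64`. Below is a minimal Python sketch for checking which candidate roots actually contain the library before rebuilding. The helper name `find_cudart_dirs` and the candidate list are my own assumptions for illustration, not part of RWKV-LM or PyTorch.

```python
import glob
import os


def find_cudart_dirs(cuda_homes):
    """Return the lib directories under each candidate CUDA root that hold libcudart.

    `cuda_homes` is a list of candidate root directories (hypothetical input,
    e.g. "/usr/local/cuda"); both "lib64" and "lib" are checked under each.
    """
    hits = []
    for home in cuda_homes:
        for sub in ("lib64", "lib"):
            lib_dir = os.path.join(home, sub)
            # Match libcudart.so as well as versioned names like libcudart.so.11.0
            if glob.glob(os.path.join(lib_dir, "libcudart.so*")):
                hits.append(lib_dir)
    return hits


if __name__ == "__main__":
    # Candidate roots taken from the paths in this thread; adjust for your system.
    candidates = ["/opt/conda", "/usr/local/cuda"]
    env_home = os.environ.get("CUDA_HOME")
    if env_home:
        candidates.insert(0, env_home)

    found = find_cudart_dirs(candidates)
    if found:
        for lib_dir in found:
            print("libcudart found in", lib_dir)
    else:
        print("libcudart not found under:", ", ".join(candidates))
```

If the only hit is under `/usr/local/cuda`, the two `export` lines above (or setting `CUDA_HOME=/usr/local/cuda`) should be enough. Remember to clear the cached build directory or restart the shell so the extension is rebuilt with the new environment.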