Open whk6688 opened 10 months ago
It looks like an environment issue with CUDA. The error happens at
`TLLM_CUDA_CHECK(cublasCreate(handle.get()));`
which initializes the cuBLAS handle. How did you build the Docker image? Could you try running other GPU programs?
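Since the failure is inside `cublasCreate` itself, one way to narrow it down is to check whether cuBLAS can initialize at all, outside of TensorRT-LLM. Below is a minimal ctypes sketch (an assumption on my part, not part of the original report: the library name `libcublas.so` and the `cublasCreate_v2` symbol may differ on your JetPack install):

```python
import ctypes

def check_cublas(lib_name="libcublas.so"):
    """Try to create a cuBLAS handle directly and return the status code.

    Status 0 (CUBLAS_STATUS_SUCCESS) means the handle was created;
    3 (CUBLAS_STATUS_ALLOC_FAILED) matches the error in the log above.
    Returns None if the library cannot be loaded at all.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None  # cuBLAS not installed or not on the loader path
    handle = ctypes.c_void_p()
    # cublasCreate is exported as cublasCreate_v2 in modern CUDA toolkits
    status = lib.cublasCreate_v2(ctypes.byref(handle))
    if status == 0:
        lib.cublasDestroy_v2(handle)
    return status
```

If this returns a non-zero status (or the same `ALLOC_FAILED`), the problem is in the CUDA/cuBLAS setup or available GPU memory on the device, not in the build script.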
I built the Docker image with `make -C docker release_build`. NOTE: my platform is Orin.
@whk6688 Can you please look at https://github.com/NVIDIA/TensorRT-LLM/issues/488#issuecomment-1848697981? As of right now, Orin is not formally supported.
Hi @whk6688, do you still have any further issues or questions? If not, we'll close this soon.
When running the build script:

```shell
nohup python build.py --hf_model_dir /code/tensorrt_llm/tmp/Qwen/models--Qwen--Qwen-7B-Chat/snapshots/8d24619bab456ea5abe2823c1d05fc5edec19174/ \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --output_dir ./tmp/Qwen/7B/trt_engines/fp16/1-gpu/ \
    --parallel_build > t.log 2>&1 &
```
I get this error:

```
[TensorRT-LLM][ERROR] tensorrt_llm::common::TllmException: [TensorRT-LLM][ERROR] CUDA runtime error in cublasCreate(handle.get()): CUBLAS_STATUS_ALLOC_FAILED (/src/tensorrt_llm/cpp/tensorrt_llm/plugins/common/plugin.cpp:181)
55 0xaaaab7da80f4 PyEval_EvalCode + 116
56 0xaaaab7ddc83c python(+0x21c83c) [0xaaaab7ddc83c]
57 0xaaaab7dd3f48 python(+0x213f48) [0xaaaab7dd3f48]
58 0xaaaab7ddc4ec python(+0x21c4ec) [0xaaaab7ddc4ec]
59 0xaaaab7ddb654 _PyRun_SimpleFileObject + 388
60 0xaaaab7ddb220 _PyRun_AnyFileObject + 80
61 0xaaaab7dcab00 Py_RunMain + 512
62 0xaaaab7d99208 Py_BytesMain + 36
63 0xffff8a2273fc /lib/aarch64-linux-gnu/libc.so.6(+0x273fc) [0xffff8a2273fc]
64 0xffff8a2274cc __libc_start_main + 152
65 0xaaaab7d990f0 _start + 48
Traceback (most recent call last):
  File "/code/tensorrt_llm/examples/qwen/build.py", line 655, in <module>
    build(0, args)
  File "/code/tensorrt_llm/examples/qwen/build.py", line 625, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/code/tensorrt_llm/examples/qwen/build.py", line 554, in build_rank_engine
    tensorrt_llm_qwen(inputs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
    output = self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 547, in forward
    hidden_states = super().forward(input_ids, position_ids, use_cache,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 428, in forward
    hidden_states = layer(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
    output = self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 314, in forward
    attention_output = self.attention(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
    output = self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 179, in forward
    qkv = self.qkv(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
    output = self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 137, in forward
    return self.multiply_gather(x,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 113, in multiply_gather
    x = _gemm_plugin(x,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 59, in _gemm_plugin
    layer = default_trtnet().add_plugin_v2(plug_inputs, gemm_plug)
TypeError: add_plugin_v2(): incompatible function arguments. The following argument types are supported:
Invoked with: <tensorrt.tensorrt.INetworkDefinition object at 0xfffeb8229ab0>, [<tensorrt.tensorrt.ITensor object at 0xfffeb8751430>, <tensorrt.tensorrt.ITensor object at 0xfffeb3f63b30>], None
```
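For what it's worth, the final `TypeError` looks like a downstream symptom rather than the root cause: the last argument passed to `add_plugin_v2` is `None`, i.e. the GEMM plugin object was never created because `cublasCreate` had already failed. A hypothetical guard sketched below (`add_plugin_checked` is my own name, not a TensorRT-LLM API) would surface the earlier failure instead of this confusing type error:

```python
def add_plugin_checked(network, plug_inputs, plugin, name="gemm"):
    """Add a plugin layer, but fail with a descriptive error if plugin
    creation silently returned None (as happens here after the cuBLAS
    initialization failure)."""
    if plugin is None:
        raise RuntimeError(
            f"Plugin '{name}' was not created; check the preceding "
            "CUDA/cuBLAS errors in the log (e.g. CUBLAS_STATUS_ALLOC_FAILED)")
    return network.add_plugin_v2(plug_inputs, plugin)
```

In other words, fixing the cuBLAS/Orin environment issue should make this `TypeError` disappear as well.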