NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Can branch dev-sm87-trt101 work on Orin now? #2029

Closed tuanhe closed 1 month ago

tuanhe commented 3 months ago

System Info

I just tried to compile TensorRT-LLM on an AGX Orin devkit, but hit some compile errors:

/media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/cubin/xqa_kernel_cubin.h:935:1: error: expected ‘}’ before ‘{’ token
  935 | { DATA_TYPE_FP16, DATA_TYPE_FP16, 128, 1, 8, 8, 0, false, false, kSM_90, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin_len, "kernel_mha"},
      | ^
In file included from /media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/decoderXQAImplJIT.cpp:20:
/media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/cubin/xqa_kernel_cubin.h:693:26: note: to match this ‘{’
  693 | } sXqaKernelMetaInfo[] = {
      |                          ^
In file included from /media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/decoderXQAImplJIT.cpp:20:
/media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/cubin/xqa_kernel_cubin.h:935:1: error: expected ‘,’ or ‘;’ before ‘{’ token
  935 | { DATA_TYPE_FP16, DATA_TYPE_FP16, 128, 1, 8, 8, 0, false, false, kSM_90, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin_len, "kernel_mha"},
      | ^
/media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/cubin/xqa_kernel_cubin.h:935:223: error: expected unqualified-id before ‘,’ token
  935 | _sm_90_cubin, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin_len, "kernel_mha"},
      |                                                                                                  ^

/media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/cubin/xqa_kernel_cubin.h:936:1: error: expected unqualified-id before ‘{’ token
  936 | { DATA_TYPE_FP16, DATA_TYPE_FP16, 128, 1, 8, 8, 64, true, false, kSM_90, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_pagedKV_64_nqpkv_8_m_8_sm_90_cubin, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_pagedKV_64_nqpkv_8_m_8_sm_90_cubin_len, "kernel_mha"},

I used the command: python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt. Can anyone give me any tips?

Who can help?

No response

Information

Tasks

Reproduction

branch: dev-sm87-trt101; command: python3 ./scripts/build_wheel.py --trt_root $(pwd)/TensorRT

Expected behavior

Compiles without errors.

Actual behavior

TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/cubin/xqa_kernel_cubin.h:935:1: error: expected ‘}’ before ‘{’ token
  935 | { DATA_TYPE_FP16, DATA_TYPE_FP16, 128, 1, 8, 8, 0, false, false, kSM_90, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin_len, "kernel_mha"},
      | ^
In file included from /media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/decoderXQAImplJIT.cpp:20:
/media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/cubin/xqa_kernel_cubin.h:693:26: note: to match this ‘{’
  693 | } sXqaKernelMetaInfo[] = {
      |                          ^
In file included from /media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/decoderXQAImplJIT.cpp:20:
/media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/cubin/xqa_kernel_cubin.h:935:1: error: expected ‘,’ or ‘;’ before ‘{’ token
  935 | { DATA_TYPE_FP16, DATA_TYPE_FP16, 128, 1, 8, 8, 0, false, false, kSM_90, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin_len, "kernel_mha"},
      | ^
/media/x/SSD/Documents/trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/cubin/xqa_kernel_cubin.h:935:223: error: expected unqualified-id before ‘,’ token
  935 | { DATA_TYPE_FP16, DATA_TYPE_FP16, 128, 1, 8, 8, 0, false, false, kSM_90, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin, xqa_kernel_dt_fp16_d_128_beam_1_kvt_fp16_nqpkv_8_m_8_sm_90_cubin_len, "kernel_mha"},
      |                                                                                                                                                                                                                               ^

Additional notes

None

QiJune commented 3 months ago

@sunnyqgg Could you please take a look? Thanks

sunnyqgg commented 3 months ago

@tuanhe Which TRT version are you using? This is for TRT 10.1

tuanhe commented 3 months ago

> @tuanhe Which TRT version are you using? This is for TRT 10.1

Sure, I have updated TensorRT to 10.1.

tuanhe commented 3 months ago

@sunnyqgg Could you kindly share with me the steps to follow your work?

tuanhe commented 3 months ago

@sunnyqgg Hi there, any updates?

sunnyqgg commented 3 months ago

Hi @tuanhe, actually this branch is for other uses, and we only did a few verifications. If you want to use it:
Step 1: install TRT 10.1 and CUDA 12.x.
Step 2: python3 scripts/build_wheel.py --cuda_architectures "87-real" --trt_root your_trt_path
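The two steps above can be sketched as a shell session. This is a minimal sketch, not an official recipe: `/path/to/TensorRT-10.1` is a placeholder for wherever your TRT 10.1 installation lives, and exporting `LD_LIBRARY_PATH` is an assumption for making the TRT libraries visible at build time.

```shell
# Step 1: point the build at a TensorRT 10.1 install (CUDA 12.x assumed
# to already be on the system). Placeholder path, adjust to your setup.
export TRT_ROOT=/path/to/TensorRT-10.1
export LD_LIBRARY_PATH="$TRT_ROOT/lib:$LD_LIBRARY_PATH"

# Step 2: build the wheel for Orin only (SM 8.7), using that TRT.
python3 scripts/build_wheel.py \
    --cuda_architectures "87-real" \
    --trt_root "$TRT_ROOT"
```

Restricting `--cuda_architectures` to "87-real" avoids compiling kernels for other SM versions, which shortens the build on the devkit.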

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 15 days with no activity.