NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Fail to build inference trt_llm image: make: *** [Makefile:64: release_build] Error 1 #1904

Open dadaguai-jiangjun opened 1 month ago

dadaguai-jiangjun commented 1 month ago

System Info

CPU: x86
Memory size: 2 TB
GPU name: H20
TensorRT-LLM: 0.10.0
OS: Alibaba Cloud Linux release 3 (Soaring Falcon)
GPU driver: 550.54.15
CUDA: cuda_12.4.r12.4/compiler.33961263_0
Docker: 26.1.3

Who can help?

No response

Information

Tasks

Reproduction

1. cd nvtest-20240218
2. Install nvtest
3. pip3 install paramiko
4. nvtest image make benchmarks/gpu/inference/trt_llm/
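For reference, the build that step 4 ultimately drives appears to be TensorRT-LLM's own Docker release build (the log below fails in docker/Makefile target release_build). A minimal sketch of invoking it directly, assuming the standard TensorRT-LLM docker Makefile; the CUDA_ARCHS override is only an assumption for an H20 (SM90) target, not something from this report:

# Hypothetical direct invocation of the build that the nvtest wrapper drives (step 4 above).
cd /home/gpu_mode/nvtest-20240218/image/TensorRT-LLM    # checkout path taken from the log below
make -C docker release_build                            # same target that fails below (Makefile:64)
# Optionally restrict the compiled architectures, e.g. for H20 (SM90) only:
# make -C docker release_build CUDA_ARCHS="90-real"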

Expected behavior

The trt_llm image builds successfully.

Actual behavior

nvtest - INFO - #34 415.7 nvtest - INFO - #34 415.7 [ 81%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_40_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.7 4 errors detected in the compilation of "/src/tensorrt_llm/cpp/build/tensorrt_llm/kernels/cutlass_kernels/cutlass_instantiations/gemm_grouped/fused_moe_sm80_16_256_64_4_bf16_gelu.generated.cu". nvtest - INFO - #34 415.7 [ 81%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_64_alibi_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.7 [ 81%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_64_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_32_alibi_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_32_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_40_alibi_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_40_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_64_alibi_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_pagedKV_64_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.7 gmake[3]: [tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/build.make:636: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/cutlass_instantiations/gemm_grouped/fused_moe_sm80_16_256_64_3_f16_silu.generated.cu.o] Error 2 nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_104_sm89.cubin.cpp.o nvtest - INFO - #34 415.7 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_128_qk_tanh_sm89.cubin.cpp.o nvtest - INFO - #34 415.8 4 errors detected in the compilation of "/src/tensorrt_llm/cpp/build/tensorrt_llm/kernels/cutlass_kernels/cutlass_instantiations/gemm_grouped/fused_moe_sm80_128_128_64_2_f16_silu.generated.cu". 
nvtest - INFO - #34 415.8 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_40_sm89.cubin.cpp.o nvtest - INFO - #34 415.8 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_128_sm89.cubin.cpp.o nvtest - INFO - #34 415.8 gmake[3]: [tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/build.make:650: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/cutlass_instantiations/gemm_grouped/fused_moe_sm80_16_256_64_4_bf16_gelu.generated.cu.o] Error 2 nvtest - INFO - #34 415.8 [ 83%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_64_sm89.cubin.cpp.o nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_80_sm89.cubin.cpp.o nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_96_sm89.cubin.cpp.o nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_104_sm89.cubin.cpp.o nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_104_sm90.cubin.cpp.o nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_128_qk_tanh_sm89.cubin.cpp.o nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_128_sm89.cubin.cpp.o nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_128_qk_tanh_sm90.cubin.cpp.o nvtest - INFO - #34 415.8 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_128_sm90.cubin.cpp.o nvtest - INFO - #34 415.8 gmake[3]: * [tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/build.make:244: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/cutlass_instantiations/gemm_grouped/fused_moe_sm80_128_128_64_2_f16_silu.generated.cu.o] Error 2 nvtest - INFO - #34 415.9 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_40_sm90.cubin.cpp.o nvtest - INFO - #34 415.9 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_40_sm89.cubin.cpp.o nvtest - INFO - #34 415.9 [ 84%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_80_sm89.cubin.cpp.o nvtest - INFO - #34 415.9 [ 85%] Building CXX object 
tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_64_sm89.cubin.cpp.o nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_64_sm90.cubin.cpp.o nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_80_sm90.cubin.cpp.o nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_96_sm89.cubin.cpp.o nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_pagedKV_96_sm90.cubin.cpp.o nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_160_alibi_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_160_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_16_sm89.cubin.cpp.o nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_192_alibi_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 415.9 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_192_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 416.0 [ 85%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_256_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_256_alibi_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_32_sm89.cubin.cpp.o nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_160_alibi_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_16_sm89.cubin.cpp.o nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_160_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_192_alibi_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 416.0 [ 86%] 
Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_192_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_256_alibi_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 416.0 [ 86%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_16_sm90.cubin.cpp.o nvtest - INFO - #34 416.0 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_256_tma_ws_sm90.cubin.cpp.o nvtest - INFO - #34 416.0 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_32_sm89.cubin.cpp.o nvtest - INFO - #34 416.0 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_pagedKV_32_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_128_32_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_128_64_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_256_64_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_256_32_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_384_64_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_384_32_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_512_32_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 87%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_512_64_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_64_32_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_64_64_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_128_64_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_128_32_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object 
tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_256_64_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_256_32_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_384_32_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_384_64_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_512_32_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_512_64_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 89%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_64_32_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 90%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_64_64_ldgsts_sm90.cubin.cpp.o nvtest - INFO - #34 416.1 [ 90%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/fmhaRunner.cpp.o nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/banBadWords.cu.o nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels.cu.o nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/banRepeatNgram.cu.o nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels/beamSearchKernels16.cu.o nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels/beamSearchKernels32.cu.o nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels/beamSearchKernels4.cu.o nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels/beamSearchKernels64.cu.o nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchKernels/beamSearchKernels8.cu.o nvtest - INFO - #34 416.2 [ 90%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cumsumLastDim.cu.o nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/buildRelativeAttentionBiasKernel.cu.o nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/customAllReduceKernels.cu.o nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention.cu.o nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decodingCommon.cu.o nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decodingKernels.cu.o nvtest - INFO - #34 416.2 4 errors detected in the compilation of 
"/src/tensorrt_llm/cpp/build/tensorrt_llm/kernels/cutlass_kernels/cutlass_instantiations/gemm_grouped/fused_moe_sm80_16_256_64_3_bf16_gelu.generated.cu". nvtest - INFO - #34 416.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/gptKernels.cu.o nvtest - INFO - #34 416.3 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/layernormKernels.cu.o nvtest - INFO - #34 416.3 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/groupGemm.cu.o nvtest - INFO - #34 416.3 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/lookupKernels.cu.o nvtest - INFO - #34 416.3 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/lruKernel.cu.o nvtest - INFO - #34 416.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/mambaConv1dKernels.cu.o nvtest - INFO - #34 416.3 gmake[3]: * [tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/build.make:594: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/cutlass_instantiations/gemm_grouped/fused_moe_sm80_16_256_64_3_bf16_gelu.generated.cu.o] Error 2 nvtest - INFO - #34 416.3 gmake[2]: [CMakeFiles/Makefile2:1092: tensorrt_llm/kernels/cutlass_kernels/CMakeFiles/moe_gemm_src.dir/all] Error 2 nvtest - INFO - #34 416.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/penaltyKernels.cu.o nvtest - INFO - #34 416.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/mixtureOfExperts/moe_kernels.cu.o nvtest - INFO - #34 416.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/quantization.cu.o nvtest - INFO - #34 416.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/preQuantScaleKernel.cu.o nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/rmsnormKernels.cu.o nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/samplingAirTopPKernels.cu.o nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/samplingTopPKernels.cu.o nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/samplingTopKKernels.cu.o nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/selectiveScan.cu.o nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/speculativeDecoding/explicitDraftTokensKernels.cu.o nvtest - INFO - #34 416.4 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/speculativeDecoding/externalDraftTokensKernels.cu.o nvtest - INFO - #34 416.4 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/speculativeDecoding/common.cu.o nvtest - INFO - #34 416.4 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/speculativeDecoding/medusaDecodingKernels.cu.o nvtest - INFO - #34 416.4 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/speculativeDecoding/kvCacheUpdateKernels.cu.o nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/splitkGroupGemm.cu.o nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/stopCriteriaKernels.cu.o nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels.cu.o 
nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_bf16_bf16.cu.o nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_bf16_fp8.cu.o nvtest - INFO - #34 416.5 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_bf16_int8.cu.o nvtest - INFO - #34 416.5 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_float_float.cu.o nvtest - INFO - #34 416.5 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_float_fp8.cu.o nvtest - INFO - #34 416.5 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_float_int8.cu.o nvtest - INFO - #34 416.5 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_half_fp8.cu.o nvtest - INFO - #34 416.5 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_half_half.cu.o nvtest - INFO - #34 416.6 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels/unfusedAttentionKernels_2_half_int8.cu.o nvtest - INFO - #34 416.6 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/fp8Gemm.cu.o nvtest - INFO - #34 416.6 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/int8SQ.cu.o nvtest - INFO - #34 416.6 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int4GroupwiseColumnMajorFalse.cu.o nvtest - INFO - #34 416.6 [ 95%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int4GroupwiseColumnMajorInterleavedTrue.cu.o nvtest - INFO - #34 416.6 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int4PerChannelColumnMajorFalse.cu.o nvtest - INFO - #34 416.6 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int4PerChannelColumnMajorInterleavedTrue.cu.o nvtest - INFO - #34 416.7 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int8PerChannelColumnMajorFalse.cu.o nvtest - INFO - #34 416.7 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherBf16Int8PerChannelColumnMajorInterleavedTrue.cu.o nvtest - INFO - #34 416.8 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int4GroupwiseColumnMajorFalse.cu.o nvtest - INFO - #34 417.1 In file included from /src/tensorrt_llm/cpp/tensorrt_llm/kernels/mixtureOfExperts/moe_kernels.h:22, nvtest - INFO - #34 417.1 from /src/tensorrt_llm/cpp/tensorrt_llm/kernels/mixtureOfExperts/moe_kernels.cu:44: nvtest - INFO - #34 417.1 /src/tensorrt_llm/cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels.h:26:10: fatal error: cutlass/gemm/group_array_problem_shape.hpp: No such file or directory nvtest - INFO - #34 417.1 26 | #include 
<cutlass/gemm/group_array_problem_shape.hpp> nvtest - INFO - #34 417.1 | ^~~~~~~~~~~~ nvtest - INFO - #34 417.1 compilation terminated. nvtest - INFO - #34 417.1 gmake[3]: [tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/build.make:6278: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/mixtureOfExperts/moe_kernels.cu.o] Error 1 nvtest - INFO - #34 417.1 gmake[3]: Waiting for unfinished jobs.... nvtest - INFO - #34 422.9 [ 96%] Built target common_src nvtest - INFO - #34 423.7 [ 96%] Built target layers_src nvtest - INFO - #34 426.5 [ 96%] Built target runtime_src nvtest - INFO - #34 482.4 [ 96%] Linking CUDA device code CMakeFiles/cutlass_src.dir/cmake_device_link.o nvtest - INFO - #34 482.5 [ 96%] Linking CXX static library libcutlass_src.a nvtest - INFO - #34 482.7 [ 96%] Built target cutlass_src nvtest - INFO - #34 521.1 gmake[2]: [CMakeFiles/Makefile2:1014: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/all] Error 2 nvtest - INFO - #34 868.5 [ 96%] Built target decoder_attention_src nvtest - INFO - #34 868.5 gmake[1]: [CMakeFiles/Makefile2:969: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/rule] Error 2 nvtest - INFO - #34 868.5 gmake: [Makefile:205: tensorrt_llm] Error 2 nvtest - INFO - #34 868.5 Traceback (most recent call last): nvtest - INFO - #34 868.5 File "/src/tensorrt_llm/scripts/build_wheel.py", line 389, in nvtest - INFO - #34 868.5 main(vars(args)) nvtest - INFO - #34 868.5 File "/src/tensorrt_llm/scripts/build_wheel.py", line 187, in main nvtest - INFO - #34 868.5 build_run( nvtest - INFO - #34 868.5 File "/usr/lib/python3.10/subprocess.py", line 526, in run nvtest - INFO - #34 868.5 raise CalledProcessError(retcode, process.args, nvtest - INFO - #34 868.5 subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 120 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings benchmarks executorWorker ' returned non-zero exit status 2. nvtest - INFO - #34 ERROR: process "/bin/bash -c python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}" did not complete successfully: exit code: 1 nvtest - INFO - ------ nvtest - INFO - > [wheel 10/10] RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks --cuda_architectures 89-real;90-real:nvtest - INFO - 868.5 gmake[1]: ** [CMakeFiles/Makefile2:969: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/rule] Error 2 nvtest - INFO - 868.5 gmake: [Makefile:205: tensorrt_llm] Error 2 nvtest - INFO - 868.5 Traceback (most recent call last): nvtest - INFO - 868.5 File "/src/tensorrt_llm/scripts/build_wheel.py", line 389, in nvtest - INFO - 868.5 main(vars(args)) nvtest - INFO - 868.5 File "/src/tensorrt_llm/scripts/build_wheel.py", line 187, in main nvtest - INFO - 868.5 build_run( nvtest - INFO - 868.5 File "/usr/lib/python3.10/subprocess.py", line 526, in run nvtest - INFO - 868.5 raise CalledProcessError(retcode, process.args, nvtest - INFO - 868.5 subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 120 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings benchmarks executorWorker ' returned non-zero exit status 2. 
nvtest - INFO - ------
nvtest - INFO - Dockerfile.multi:72
nvtest - INFO - --------------------
nvtest - INFO - 71 | ARG BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks"
nvtest - INFO - 72 | >>> RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache \
nvtest - INFO - 73 | >>>     python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}
nvtest - INFO - 74 |
nvtest - INFO - --------------------
nvtest - INFO - ERROR: failed to solve: process "/bin/bash -c python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}" did not complete successfully: exit code: 1
nvtest - INFO - make: *** [Makefile:64: release_build] Error 1
nvtest - INFO - make: Leaving directory '/home/gpu_mode/nvtest-20240218/image/TensorRT-LLM/docker'
Traceback (most recent call last):
  File "/usr/bin/nvtest", line 11, in <module>
    load_entry_point('nvtest==22.12.1', 'console_scripts', 'nvtest')()
  File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/nvtest.py", line 445, in run
    Command(sys.argv[1:])
  File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/nvtest.py", line 52, in __init__
    getattr(self, args.command)()
  File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/nvtest.py", line 344, in image
    args.func(args)
  File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/nvtest.py", line 242, in _image
    ret = self.host.run(final_command)
  File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/common/host.py", line 70, in run
    return self.backend.run(command, *args, **kwargs)
  File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/common/backend/local.py", line 18, in run
    return self.run_local(self.get_command(command, *args))
  File "/usr/local/python3.8/lib/python3.8/site-packages/nvtest-22.12.1-py3.8.egg/nvtest/common/backend/base.py", line 216, in run_local
    stderr = p.stderr.read()
AttributeError: 'NoneType' object has no attribute 'read'

Additional notes

① I found that the folders (cutlass, cxxopts, json, NVTX) in TensorRT-LLM/3rdparty are empty, so I replaced them with copies downloaded from GitHub. ② I have tried changing the --parallel count in TensorRT-LLM/scripts/build_wheel.py, but it didn't work. There is enough memory on my system (2 TB).
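Empty 3rdparty folders usually mean the git submodules were never initialized, and replacing them with separately downloaded copies can easily pull in the wrong cutlass revision, which would explain the missing cutlass/gemm/group_array_problem_shape.hpp header in the log above. A minimal sketch of re-syncing the pinned submodules instead; the checkout path is taken from the log and the submodule layout is assumed from the TensorRT-LLM repository:

# Re-fetch the third-party dependencies at the commits pinned by this TensorRT-LLM revision.
cd /home/gpu_mode/nvtest-20240218/image/TensorRT-LLM
git submodule sync --recursive
git submodule update --init --recursive
ls 3rdparty/cutlass 3rdparty/cxxopts 3rdparty/json      # should no longer be empty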

dadaguai-jiangjun commented 1 month ago

I found the following error while building the image:

nvtest - INFO - #34 237.6 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nvtest - INFO - #34 237.6 dask-cuda 24.4.0 requires pynvml<11.5,>=11.0.0, but you have pynvml 11.5.0 which is incompatible.
nvtest - INFO - #34 237.6 torch-tensorrt 2.4.0a0 requires tensorrt==10.0.1, but you have tensorrt 10.1.0 which is incompatible.
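For anyone checking whether their environment hits the same conflict, standard pip commands inside the build container surface it; this is only a diagnostic suggestion, not part of the original report:

# List the versions involved in the conflict reported above.
pip3 show tensorrt torch-tensorrt dask-cuda pynvml | grep -E '^(Name|Version)'
# Re-run pip's consistency check; it prints the same incompatibilities as the resolver warning.
pip3 check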

So I changed the version of tensorrt (from 10.1.0 to 10.0.1) in TensorRT-LLM/requirements.txt. Then it worked: I successfully built the image nvtest-inference-trt_llm. But I wonder why the build fails with the default version specified in TensorRT-LLM/requirements.txt. Is it a compatibility issue in the latest version of TensorRT-LLM?
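A minimal sketch of the workaround described above; the requirement line and version numbers come from this report and may differ in other TensorRT-LLM revisions:

# Pin tensorrt to the version torch-tensorrt 2.4.0a0 expects, then rebuild the image.
sed -i 's/^tensorrt==10\.1\.0/tensorrt==10.0.1/' TensorRT-LLM/requirements.txt
grep '^tensorrt' TensorRT-LLM/requirements.txt          # confirm the pin before re-running the build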