undefined reference to `__libc_single_threaded'

hoangvictor commented 1 day ago

System Info

System:

CPU Architecture: x86_64
GPU: NVIDIA H100 - 80GB - CUDA 12.4
TensorRT-LLM: main branch, commit 535c9cc6730f5ac999e4b1cb621402b58138f819
Operating System: Ubuntu 22.04

Dependencies (all sourced from NVIDIA-provided tar files):

NCCL: 2.23.4-1
TensorRT: 10.6.0.26

Additional Environment Details:

ldd (glibc): 2.35
gcc: 11.4.0

Who can help?

No response

Information

[ ] The official example scripts
[x] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

I built TensorRT-LLM inside a Conda environment with the following command:

python scripts/build_wheel.py --clean --trt_root /data0/tien/TensorRT-10.6.0.26 --nccl_root /data0/tien/nccl-2.23.4-1/build

Expected behavior

The build process should complete without errors.

Actual behavior

The build fails at the final link step for libnvinfer_plugin_tensorrt_llm.so with the following error:

...
[100%] Building CXX object tensorrt_llm/plugins/CMakeFiles/nvinfer_plugin_tensorrt_llm.dir/ncclPlugin/recvPlugin.cpp.o
[100%] Building CXX object tensorrt_llm/plugins/CMakeFiles/nvinfer_plugin_tensorrt_llm.dir/common/gemmPluginProfiler.cpp.o
[100%] Building CXX object tensorrt_llm/plugins/CMakeFiles/nvinfer_plugin_tensorrt_llm.dir/common/plugin.cpp.o
[100%] Building CXX object tensorrt_llm/plugins/CMakeFiles/nvinfer_plugin_tensorrt_llm.dir/api/tllmPlugin.cpp.o
[100%] Linking CXX shared library libnvinfer_plugin_tensorrt_llm.so
/data0/tien/anaconda3/envs/trt-llm/bin/../lib/gcc/x86_64-conda-linux-gnu/11.4.0/../../../../x86_64-conda-linux-gnu/bin/ld: ../kernels/cutlass_kernels/libfpA_intB_gemm_src.a(bf16_int4_gemm_fg_scalebias.cu.o): in function `std::string::_Rep::_M_dispose(std::allocator<char> const&) [clone .part.0]':
tmpxft_0022667f_00000000-6_bf16_int4_gemm_fg_scalebias.compute_90a.cudafe1.cpp:(.text+0x13): undefined reference to `__libc_single_threaded'
/data0/tien/anaconda3/envs/trt-llm/bin/../lib/gcc/x86_64-conda-linux-gnu/11.4.0/../../../../x86_64-conda-linux-gnu/bin/ld: ../kernels/cutlass_kernels/libfpA_intB_gemm_src.a(bf16_int4_gemm_fg_scalebias.cu.o): in function `void tensorrt_llm::common::Logger::log<>(tensorrt_llm::common::Logger::Level, char const*) [clone .constprop.0]':
tmpxft_0022667f_00000000-6_bf16_int4_gemm_fg_scalebias.compute_90a.cudafe1.cpp:(.text+0x1766): undefined reference to `__libc_single_threaded'
/data0/tien/anaconda3/envs/trt-llm/bin/../lib/gcc/x86_64-conda-linux-gnu/11.4.0/../../../../x86_64-conda-linux-gnu/bin/ld: tmpxft_0022667f_00000000-6_bf16_int4_gemm_fg_scalebias.compute_90a.cudafe1.cpp:(.text+0x17db): undefined reference to `__libc_single_threaded'
/data0/tien/anaconda3/envs/trt-llm/bin/../lib/gcc/x86_64-conda-linux-gnu/11.4.0/../../../../x86_64-conda-linux-gnu/bin/ld: ../kernels/cutlass_kernels/libfpA_intB_gemm_src.a(bf16_int4_gemm_fg_scalebias.cu.o): in function `virtual thunk to tensorrt_llm::kernels::cutlass_kernels::CutlassFpAIntBGemmRunner<__nv_bfloat16, cutlass::integer_subbyte<4, false>, (cutlass::WeightOnlyQuantOp)3, __nv_bfloat16, __nv_bfloat16, __nv_bfloat16>::~CutlassFpAIntBGemmRunner()':
tmpxft_0022667f_00000000-6_bf16_int4_gemm_fg_scalebias.compute_90a.cudafe1.cpp:(.text._ZN12tensorrt_llm7kernels15cutlass_kernels24CutlassFpAIntBGemmRunnerI13__nv_bfloat16N7cutlass15integer_subbyteILi4ELb0EEELNS4_17WeightOnlyQuantOpE3ES3_S3_S3_ED1Ev[_ZN12tensorrt_llm7kernels15cutlass_kernels24CutlassFpAIntBGemmRunnerI13__nv_bfloat16N7cutlass15integer_subbyteILi4ELb0EEELNS4_17WeightOnlyQuantOpE3ES3_S3_S3_ED1Ev]+0x17d): undefined reference to `__libc_single_threaded'
/data0/tien/anaconda3/envs/trt-llm/bin/../lib/gcc/x86_64-conda-linux-gnu/11.4.0/../../../../x86_64-conda-linux-gnu/bin/ld: tmpxft_0022667f_00000000-6_bf16_int4_gemm_fg_scalebias.compute_90a.cudafe1.cpp:(.text._ZN12tensorrt_llm7kernels15cutlass_kernels24CutlassFpAIntBGemmRunnerI13__nv_bfloat16N7cutlass15integer_subbyteILi4ELb0EEELNS4_17WeightOnlyQuantOpE3ES3_S3_S3_ED1Ev[_ZN12tensorrt_llm7kernels15cutlass_kernels24CutlassFpAIntBGemmRunnerI13__nv_bfloat16N7cutlass15integer_subbyteILi4ELb0EEELNS4_17WeightOnlyQuantOpE3ES3_S3_S3_ED1Ev]+0x1f3): undefined reference to `__libc_single_threaded'
/data0/tien/anaconda3/envs/trt-llm/bin/../lib/gcc/x86_64-conda-linux-gnu/11.4.0/../../../../x86_64-conda-linux-gnu/bin/ld: ../kernels/cutlass_kernels/libfpA_intB_gemm_src.a(bf16_int4_gemm_fg_scalebias.cu.o):tmpxft_0022667f_00000000-6_bf16_int4_gemm_fg_scalebias.compute_90a.cudafe1.cpp:(.text._ZN12tensorrt_llm7kernels15cutlass_kernels24CutlassFpAIntBGemmRunnerI13__nv_bfloat16N7cutlass15integer_subbyteILi4ELb0EEELNS4_17WeightOnlyQuantOpE3ES3_S3_S3_E16getWorkspaceSizeEiii[_ZN12tensorrt_llm7kernels15cutlass_kernels24CutlassFpAIntBGemmRunnerI13__nv_bfloat16N7cutlass15integer_subbyteILi4ELb0EEELNS4_17WeightOnlyQuantOpE3ES3_S3_S3_E16getWorkspaceSizeEiii]+0x208): more undefined references to `__libc_single_threaded' follow
collect2: error: ld returned 1 exit status
gmake[3]: *** [tensorrt_llm/plugins/CMakeFiles/nvinfer_plugin_tensorrt_llm.dir/build.make:730: tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:2167: tensorrt_llm/plugins/CMakeFiles/nvinfer_plugin_tensorrt_llm.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:2174: tensorrt_llm/plugins/CMakeFiles/nvinfer_plugin_tensorrt_llm.dir/rule] Error 2
gmake: *** [Makefile:452: nvinfer_plugin_tensorrt_llm] Error 2
Traceback (most recent call last):
  File "/data0/tien/TensorRT-LLM/scripts/build_wheel.py", line 434, in <module>
    main(**vars(args))
  File "/data0/tien/TensorRT-LLM/scripts/build_wheel.py", line 208, in main
    build_run(
  File "/data0/tien/anaconda3/envs/trt-llm/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 384 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings   executorWorker  ' returned non-zero exit status 2.
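The linker output above is long and repetitive, so a short summary may be easier to triage. The following is a minimal sketch, not part of TensorRT-LLM, that counts the undefined symbols and lists the archive members named in a GNU ld log. The function name `summarize_link_errors` and both regular expressions are illustrative assumptions based on the message format shown in this log.

```python
import re
from collections import Counter

# GNU ld quotes symbols as `name' (backtick, then apostrophe).
UNDEFINED_RE = re.compile(r"undefined references? to `([^']+)'")
# Archive members appear as libfoo.a(member.o).
MEMBER_RE = re.compile(r"([\w.+-]+\.a)\(([^)]+)\)")


def summarize_link_errors(log_text):
    """Return (Counter of undefined symbols, set of (archive, member) pairs)."""
    symbols = Counter()
    members = set()
    for line in log_text.splitlines():
        symbols.update(UNDEFINED_RE.findall(line))
        members.update(MEMBER_RE.findall(line))
    return symbols, members


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pipe a saved build log into the script on stdin.
    found_symbols, found_members = summarize_link_errors(sys.stdin.read())
    for name, count in found_symbols.most_common():
        print(f"{count:4d}  {name}")
    for archive, member in sorted(found_members):
        print(f"      {archive}({member})")
```

Applied to the log above, this should show that every failure refers to the same symbol, `__libc_single_threaded`, and to the same archive member, which suggests a C library mismatch rather than a missing TensorRT-LLM source file.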

Additional notes

I tried setting the LD_LIBRARY_PATH environment variable to closely match the setup in the Docker container, but this did not resolve the issue.

(trt-llm) tien@h100:/data0/tien/TensorRT-LLM$ echo $LD_LIBRARY_PATH
/data0/tien/anaconda3/envs/trt-llm/lib/python3.10/site-packages/torch/lib:/data0/tien/anaconda3/envs/trt-llm/lib/python3.10/site-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/data0/tien/TensorRT-10.6.0.26/lib
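
For context, `__libc_single_threaded` is declared in `<sys/single_threaded.h>`, which glibc added in version 2.32. The system glibc reported above (2.35) provides it, so one possible explanation is that the Conda-provided `ld` resolves symbols against an older C library in its own sysroot. That is an assumption, not something this log confirms. The sketch below only compares version strings; the helper names and the example version "2.17" are hypothetical.

```python
import platform

# glibc introduced __libc_single_threaded in release 2.32.
MIN_GLIBC = (2, 32)


def parse_glibc_version(text):
    """Turn a string such as '2.35' into (2, 35)."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def exports_single_threaded(version_text):
    """True if a glibc of this version is new enough to define the symbol."""
    return parse_glibc_version(version_text) >= MIN_GLIBC


if __name__ == "__main__":
    # Reports the C library seen by the running interpreter, which may
    # differ from the one the Conda linker uses at link time.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(f"runtime glibc {version}: "
              f"new enough = {exports_single_threaded(version)}")
    else:
        print("could not determine a glibc version for this interpreter")
```

If the toolchain's sysroot turns out to carry a glibc older than 2.32, `exports_single_threaded` would return False for it (for example with "2.17"), which would be consistent with the link error. Changing LD_LIBRARY_PATH mainly affects the run-time loader, so it would not be expected to change what the linker resolves against in that case.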

hello-11 commented 1 day ago

@hoangvictor Could you use the Docker image to install TensorRT-LLM?

hoangvictor commented 1 day ago

@hello-11 Unfortunately, I cannot use Docker for this setup at the moment because of permission restrictions in my team's environment. I would prefer to install TensorRT-LLM in a Conda environment.
