NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Is MPI required even when multi-device is disabled? #1959

Open jlewi opened 1 month ago

jlewi commented 1 month ago

Reproduction

I'm trying to build the wheel as follows:

python3 ../tensorrt_llm/scripts/build_wheel.py --trt_root ${TRT_ROOT} -D "CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.3/" -D "ENABLE_MULTI_DEVICE=0"

I end up with a linking error because MPI is missing.

[100%] Building CXX object tensorrt_llm/executor_worker/CMakeFiles/executorWorker.dir/executorWorker.cpp.o
[100%] Linking CXX executable executorWorker
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `ompi_mpi_char'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Wait'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Mrecv'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `ompi_mpi_uint64_t'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Comm_spawn'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Get_count'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `tensorrt_llm::mpi::MpiComm::MpiComm(ompi_communicator_t*, bool)'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `ompi_mpi_comm_self'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Info_set'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `ompi_mpi_comm_world'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `tensorrt_llm::mpi::MpiComm::mprobe(int, int, ompi_message_t**, ompi_status_public_t*) const'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Info_create'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Barrier'
collect2: error: ld returned 1 exit status
make[3]: *** [tensorrt_llm/executor_worker/CMakeFiles/executorWorker.dir/build.make:112: tensorrt_llm/executor_worker/executorWorker] Error 1
make[2]: *** [CMakeFiles/Makefile2:1192: tensorrt_llm/executor_worker/CMakeFiles/executorWorker.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:1199: tensorrt_llm/executor_worker/CMakeFiles/executorWorker.dir/rule] Error 2
make: *** [Makefile:335: executorWorker] Error 2
Traceback (most recent call last):
  File "/home/build/backend/build/../tensorrt_llm/scripts/build_wheel.py", line 352, in <module>
    main(**vars(args))
  File "/home/build/backend/build/../tensorrt_llm/scripts/build_wheel.py", line 166, in main
    build_run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,

I don't have MPI, which is why I disabled multi-device.
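
Before rebuilding, it may help to confirm what MPI pieces, if any, are visible on the build machine. The following is a minimal sketch, not part of TensorRT-LLM: the helper name `find_mpi` is hypothetical, and it assumes an MPI installation would expose `mpicc` and `mpirun` on PATH and a shared library that the system linker can locate as `mpi`.

```python
import ctypes.util
import shutil


def find_mpi():
    """Report which common MPI components are discoverable (hypothetical helper)."""
    return {
        # Compiler wrapper and launcher that MPI installations usually put on PATH.
        "mpicc": shutil.which("mpicc"),
        "mpirun": shutil.which("mpirun"),
        # Shared library lookup, e.g. libmpi.so on Linux; None if not found.
        "libmpi": ctypes.util.find_library("mpi"),
    }


if __name__ == "__main__":
    for name, location in find_mpi().items():
        print(f"{name}: {location or 'not found'}")
```

If all three entries report "not found", the undefined `MPI_*` and `ompi_*` references in the log above are consistent with the linker having no MPI library to resolve them against.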

Expected behavior

I expect this to compile without needing MPI. My assumption was that MPI is only required for multi-device support, but that assumption could be incorrect. I was hoping to be able to compile for a single device without MPI. Is MPI needed even for a single device?

Actual behavior

I got a linking error because MPI is missing

Additional notes

I also had to remove mpi4py from requirements.txt to try to get it to build without multi-device support.

# N.B. Hack: We remove mpi4py from the requirements because we don't have mpi libraries.
# Hopefully that should only be needed for multi device support
sed '/mpi4py/d' -i ../tensorrt_llm/requirements.txt

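The sed command above edits requirements.txt in place and deletes any line that merely contains the string mpi4py. As a sketch of a less destructive variant, the snippet below matches on the package name only and writes the result to a separate file. The function names and the output file name `requirements-no-mpi.txt` are hypothetical, and nothing in this thread establishes that the resulting install works at runtime without mpi4py.

```python
import re
from pathlib import Path


def drop_requirement(lines, package):
    """Return the requirement lines whose package name is not `package`."""
    kept = []
    for line in lines:
        # Take the text before any version specifier, extras, marker, or space.
        name = re.split(r"[<>=!~\[;\s]", line.strip(), maxsplit=1)[0]
        if name.lower() != package.lower():
            kept.append(line)
    return kept


def write_filtered(src, dst, package="mpi4py"):
    """Copy `src` to `dst`, leaving out the line that requires `package`."""
    lines = Path(src).read_text().splitlines(keepends=True)
    Path(dst).write_text("".join(drop_requirement(lines, package)))


if __name__ == "__main__":
    # Paths are illustrative; the original requirements.txt is left untouched.
    write_filtered("../tensorrt_llm/requirements.txt", "requirements-no-mpi.txt")
```

Unlike the sed pattern, this keeps comment lines that happen to mention mpi4py, since only the leading package name is compared.
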
QiJune commented 1 month ago

@Funatiq Could you please have a look? Thanks

achartier commented 1 month ago

Could you try passing the following option to build_wheel.py?

--extra-cmake-vars ENABLE_MULTI_DEVICE=0

jlewi commented 1 month ago

I'm trying to build it now with OpenMPI. The build takes so long that if I succeed with OpenMPI, I may not bother rerunning the experiment.

achartier commented 1 month ago

Fair enough. If you are building for a specific target architecture, passing -a native can significantly reduce the build time.
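
For reference, the two suggestions in this thread can be combined into a single invocation. The sketch below only assembles the argument list from flags that appear above (`--trt_root`, `--extra-cmake-vars ENABLE_MULTI_DEVICE=0`, `-a native`). The helper name `build_wheel_command` and the TensorRT path are placeholders, and the thread does not confirm that this combination builds without MPI.

```python
import shlex


def build_wheel_command(trt_root,
                        script="../tensorrt_llm/scripts/build_wheel.py",
                        native_arch=True):
    """Assemble a build_wheel.py command line from the options in this thread."""
    cmd = [
        "python3", script,
        "--trt_root", trt_root,
        # Option suggested above for disabling multi-device support.
        "--extra-cmake-vars", "ENABLE_MULTI_DEVICE=0",
    ]
    if native_arch:
        # Suggested above as a way to reduce build time.
        cmd += ["-a", "native"]
    return cmd


if __name__ == "__main__":
    # "/path/to/TensorRT" is a placeholder for the real TRT_ROOT.
    print(shlex.join(build_wheel_command("/path/to/TensorRT")))
```

The printed line can be pasted into a shell, or the list can be passed directly to `subprocess.run`.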