deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0

Add tensorrt_llm libs to LD_LIBRARY_PATH #1988

Closed nikhil-sk closed 3 months ago

nikhil-sk commented 3 months ago

Description

  1. Currently, in the container, libtriton_tensorrtllm_common.so cannot resolve its dependency on libtensorrt_llm.so:
    root@1255f90326fc:/opt/tritonserver/backends/tensorrtllm# ldd libtriton_tensorrtllm_common.so
        linux-vdso.so.1 (0x00007fff9a34c000)
        libtensorrt_llm.so => not found
        libtritonserver.so => /opt/tritonserver/lib/libtritonserver.so (0x00007f14ad69f000)
        libmpi_cxx.so.40 => /usr/lib/x86_64-linux-gnu/libmpi_cxx.so.40 (0x00007f14ad685000)
    ....
  2. This PR adds the tensorrt_llm libs directory from dist-packages to LD_LIBRARY_PATH to fix this. After the fix:
    root@1255f90326fc:/opt/tritonserver/backends/tensorrtllm# ldd libtriton_tensorrtllm_common.so
        linux-vdso.so.1 (0x00007ffc8d976000)
        libtensorrt_llm.so => /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so (0x00007ff0f7764000)
        libtritonserver.so => /opt/tritonserver/lib/libtritonserver.so (0x00007ff0f724c000)
        libmpi_cxx.so.40 => /usr/lib/x86_64-linux-gnu/libmpi_cxx.so.40 (0x00007ff0f7229000)
  3. This could also be related to the flaky segfault, since certain *.so files could not be linked at runtime...
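
For anyone who wants to check a container for the same problem, here is a minimal sketch of the check and the fix described above. It is not the code changed in this PR. The helper names `missing_libs` and `with_lib_dir` are hypothetical, and the sample text is copied from the "before" ldd output in item 1.

```python
def missing_libs(ldd_output: str) -> list[str]:
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Dependency lines look like "libfoo.so => /path/libfoo.so (0x...)".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip().startswith("not found"):
            missing.append(name)
    return missing


def with_lib_dir(env: dict, lib_dir: str) -> dict:
    """Return a copy of env with lib_dir prepended to LD_LIBRARY_PATH."""
    updated = dict(env)
    parts = [p for p in updated.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in parts:
        parts.insert(0, lib_dir)
    updated["LD_LIBRARY_PATH"] = ":".join(parts)
    return updated


# Sample copied from the "before" ldd output in this description.
BEFORE = """\
    linux-vdso.so.1 (0x00007fff9a34c000)
    libtensorrt_llm.so => not found
    libtritonserver.so => /opt/tritonserver/lib/libtritonserver.so (0x00007f14ad69f000)
"""

# Directory taken from the "after" ldd output in this description.
TRTLLM_LIBS = "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs"

if __name__ == "__main__":
    print(missing_libs(BEFORE))
    print(with_lib_dir({"LD_LIBRARY_PATH": "/opt/tritonserver/lib"}, TRTLLM_LIBS))
```

In a real container, the text passed to `missing_libs` would come from running `ldd` on libtriton_tensorrtllm_common.so, and the updated environment would be passed to whatever process loads the backend.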