NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

built tensorrt_llm-0.14.0.dev2024092401-cp310-cp310-linux_aarch64.whl on Jetson AGX Orin Developer Kit 32gb #2266

Open whitesscott opened 14 hours ago

whitesscott commented 14 hours ago

System Info

Here is the environment in which tensorrt_llm-0.14.0.dev2024092401-cp310-cp310-linux_aarch64.whl was built.

Now I've got a problem: pynvml does not work properly on Jetson. If anyone has a tensorrt_llm profiler.py that works on Jetson, or is familiar enough with it to modify it, I'll be able to test this build. If not, I'll try to make a version of profiler.py that works without pynvml.
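In case it helps, here is a minimal sketch of the pynvml-free direction I have in mind. Jetson's GPU shares system RAM, so the host's /proc/meminfo is a reasonable stand-in for device memory when NVML is unavailable. `parse_meminfo` and `jetson_mem_info` are hypothetical helper names for illustration, not part of the tensorrt_llm API.

```python
import re
from collections import namedtuple

# Mirrors the (total, free, used) shape of pynvml's nvmlDeviceGetMemoryInfo result.
MemInfo = namedtuple("MemInfo", "total free used")  # all values in bytes

def parse_meminfo(text: str) -> MemInfo:
    """Parse /proc/meminfo-style text into a pynvml-like memory-info tuple.

    On Jetson the GPU has no dedicated VRAM, so host MemTotal/MemAvailable
    approximates device memory.
    """
    fields = dict(re.findall(r"^(\w+):\s+(\d+) kB", text, re.MULTILINE))
    total = int(fields["MemTotal"]) * 1024
    free = int(fields["MemAvailable"]) * 1024
    return MemInfo(total=total, free=free, used=total - free)

def jetson_mem_info() -> MemInfo:
    """Read the live values from /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as f:
        return parse_meminfo(f.read())
```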

If you would like a copy of the wheel let me know where to upload.

It would not build when I set --extra-cmake-vars "ENABLE_MULTI_DEVICE=0" because of undefined references to ompi_mpi_* and MPI_* symbols when linking libtensorrt_llm.so.

I then tried with the following command line and OS configuration.

export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/openmpi/lib:$LD_LIBRARY_PATH

python3 ./scripts/build_wheel.py --cuda_architectures "native" \
    --nccl_root "/usr/lib/aarch64-linux-gnu" \
    --extra-cmake-vars "USE_CUDNN=1;USE_CUSPARSELT=1" \
    --python_bindings

tensorrt-llm-compilation.txt python_packages.txt

Who can help?

No response

Information

Tasks

Reproduction

Successful build_wheel.py run on the NVIDIA Jetson AGX Orin Developer Kit.

pynvml doesn't work on NVIDIA Jetson.

Expected behavior

profiler.py will need to be modified to get tensorrt_llm operational on Jetson.
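One possible shape for that modification (a sketch, not the actual tensorrt_llm code): guard the pynvml import so the module degrades gracefully instead of failing at import time on platforms where NVML is unavailable. The function name `device_memory_info` is hypothetical.

```python
# Guard pattern: tolerate a missing or non-functional pynvml (as on Jetson).
try:
    import pynvml
    _NVML_AVAILABLE = True
except (ImportError, OSError):
    pynvml = None
    _NVML_AVAILABLE = False

def device_memory_info(index: int = 0):
    """Return (total, free, used) bytes for GPU `index`, or None if NVML fails."""
    if not _NVML_AVAILABLE:
        return None
    try:
        pynvml.nvmlInit()
    except Exception:
        # NVML library present but not usable on this platform.
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return (info.total, info.free, info.used)
    except Exception:
        return None
    finally:
        pynvml.nvmlShutdown()
```

Callers then check for None rather than assuming NVML always works.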

actual behavior

import tensorrt_llm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/__init__.py", line 35, in <module>
    import tensorrt_llm.runtime as runtime
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/runtime/__init__.py", line 22, in <module>
    from .model_runner import ModelRunner
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner.py", line 30, in <module>
    from ..builder import Engine, EngineConfig, get_engine_version
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 30, in <module>
    from .auto_parallel import auto_parallel
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/auto_parallel/__init__.py", line 1, in <module>
    from .auto_parallel import auto_parallel
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/auto_parallel/auto_parallel.py", line 14, in <module>
    from .config import AutoParallelConfig
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/auto_parallel/config.py", line 9, in <module>
    from .cluster_info import ClusterInfo, cluster_infos
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 12, in <module>
    from tensorrt_llm.profiler import PyNVMLContext, _device_get_memory_info_fn
ImportError: cannot import name '_device_get_memory_info_fn' from 'tensorrt_llm.profiler' (/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/profiler.py)

additional notes

import pynvml

print(pynvml.__version__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pynvml' has no attribute '__version__'

whitesscott commented 12 hours ago

There are two pynvml-based packages:

  1. pip install pynvml — the one I uninstalled.
  2. pip install nvidia-ml-py — the one I kept; it installs pynvml.py and is the correct package.
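Since both distributions install a module named pynvml, the module name alone doesn't tell you which one is present. A quick way to check is to query the package metadata; `installed_nvml_distributions` is a hypothetical helper name for illustration.

```python
import importlib.metadata as md

def installed_nvml_distributions() -> dict:
    """Return {distribution_name: version} for whichever of the two
    conflicting pynvml distributions is installed."""
    found = {}
    for dist in ("pynvml", "nvidia-ml-py"):
        try:
            found[dist] = md.version(dist)
        except md.PackageNotFoundError:
            pass
    return found

print(installed_nvml_distributions())
```

If both names show up, uninstall the plain pynvml distribution and keep nvidia-ml-py.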

I'll work with TensorRT-LLM more tomorrow, but with this slightly modified profiler.py I got rid of the immediate import errors. profiler.py.txt