NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

built tensorrt_llm-0.14.0.dev2024092401-cp310-cp310-linux_aarch64.whl on Jetson AGX Orin Developer Kit 32gb #2266

Open whitesscott opened 14 hours ago

whitesscott commented 14 hours ago

System Info

Here is the environment in which tensorrt_llm-0.14.0.dev2024092401-cp310-cp310-linux_aarch64.whl was built.

Now I've got a problem: pynvml does not work properly on Jetson. If anyone has a tensorrt_llm profiler.py that works on Jetson, or is familiar enough with it to modify it, I'll be able to test this build. If not, I'll try to make a version of profiler.py that works without pynvml.
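In case it helps, here is a minimal sketch of the pynvml-free direction I have in mind. Jetson's GPU shares system RAM, so the host's /proc/meminfo is a reasonable stand-in for device memory when NVML is unavailable. `parse_meminfo` and `jetson_mem_info` are hypothetical helper names for illustration, not part of the tensorrt_llm API.

```python
import re
from collections import namedtuple

# Mirrors the (total, free, used) shape of pynvml's nvmlDeviceGetMemoryInfo result.
MemInfo = namedtuple("MemInfo", "total free used")  # all values in bytes

def parse_meminfo(text: str) -> MemInfo:
    """Parse /proc/meminfo-style text into a pynvml-like memory-info tuple.

    On Jetson the GPU has no dedicated VRAM, so host MemTotal/MemAvailable
    approximates device memory.
    """
    fields = dict(re.findall(r"^(\w+):\s+(\d+) kB", text, re.MULTILINE))
    total = int(fields["MemTotal"]) * 1024
    free = int(fields["MemAvailable"]) * 1024
    return MemInfo(total=total, free=free, used=total - free)

def jetson_mem_info() -> MemInfo:
    """Read the live values from /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as f:
        return parse_meminfo(f.read())
```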

If you would like a copy of the wheel let me know where to upload.

It would not build when I set --extra-cmake-vars "ENABLE_MULTI_DEVICE=0" because of undefined references to ompi_mpi_* and MPI_* symbols when linking libtensorrt_llm.so.

I then tried with the following command line and OS configuration.

export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/openmpi/lib:$LD_LIBRARY_PATH

python3 ./scripts/build_wheel.py --cuda_architectures "native" \
    --nccl_root "/usr/lib/aarch64-linux-gnu" \
    --extra-cmake-vars "USE_CUDNN=1;USE_CUSPARSELT=1" \
    --python_bindings

tensorrt-llm-compilation.txt python_packages.txt

Who can help?

No response

Information

Tasks

Reproduction

Successful build_wheel.py run on the NVIDIA Jetson AGX Orin Developer Kit.

pynvml doesn't work on NVIDIA Jetson.

Expected behavior

profiler.py will need to be modified to get tensorrt_llm operational on Jetson.
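One possible shape for that modification (a sketch, not the actual tensorrt_llm code): guard the pynvml import so the module degrades gracefully instead of failing at import time on platforms where NVML is unavailable. The function name `device_memory_info` is hypothetical.

```python
# Guard pattern: tolerate a missing or non-functional pynvml (as on Jetson).
try:
    import pynvml
    _NVML_AVAILABLE = True
except (ImportError, OSError):
    pynvml = None
    _NVML_AVAILABLE = False

def device_memory_info(index: int = 0):
    """Return (total, free, used) bytes for GPU `index`, or None if NVML fails."""
    if not _NVML_AVAILABLE:
        return None
    try:
        pynvml.nvmlInit()
    except Exception:
        # NVML library present but not usable on this platform.
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return (info.total, info.free, info.used)
    except Exception:
        return None
    finally:
        pynvml.nvmlShutdown()
```

Callers then check for None rather than assuming NVML always works.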

actual behavior

import tensorrt_llm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/__init__.py", line 35, in <module>
    import tensorrt_llm.runtime as runtime
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/runtime/__init__.py", line 22, in <module>
    from .model_runner import ModelRunner
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner.py", line 30, in <module>
    from ..builder import Engine, EngineConfig, get_engine_version
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 30, in <module>
    from .auto_parallel import auto_parallel
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/auto_parallel/__init__.py", line 1, in <module>
    from .auto_parallel import auto_parallel
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/auto_parallel/auto_parallel.py", line 14, in <module>
    from .config import AutoParallelConfig
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/auto_parallel/config.py", line 9, in <module>
    from .cluster_info import ClusterInfo, cluster_infos
  File "/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 12, in <module>
    from tensorrt_llm.profiler import PyNVMLContext, _device_get_memory_info_fn
ImportError: cannot import name '_device_get_memory_info_fn' from 'tensorrt_llm.profiler' (/home/scott/.local/lib/python3.10/site-packages/tensorrt_llm/profiler.py)

additional notes

import pynvml

print(pynvml.__version__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pynvml' has no attribute '__version__'

whitesscott commented 12 hours ago

There are two pynvml-based packages:

  1. pip install pynvml — the one I uninstalled.
  2. pip install nvidia-ml-py — the one I kept; it installs pynvml.py and is the correct package.
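Since both distributions install a module named pynvml, the module name alone doesn't tell you which one is present. A quick way to check is to query the package metadata; `installed_nvml_distributions` is a hypothetical helper name for illustration.

```python
import importlib.metadata as md

def installed_nvml_distributions() -> dict:
    """Return {distribution_name: version} for whichever of the two
    conflicting pynvml distributions is installed."""
    found = {}
    for dist in ("pynvml", "nvidia-ml-py"):
        try:
            found[dist] = md.version(dist)
        except md.PackageNotFoundError:
            pass
    return found

print(installed_nvml_distributions())
```

If both names show up, uninstall the plain pynvml distribution and keep nvidia-ml-py.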

I'll work with TensorRT-LLM more tomorrow, but with this slightly modified profiler.py I got rid of the immediate import errors. profiler.py.txt