dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Request for Docker images of tensorrt_llm #605

Open nmq45698 opened 2 months ago

nmq45698 commented 2 months ago

Hi there! I'm currently trying to deploy some small models (say, GPT-2) with SmoothQuant on a Jetson AGX Orin. However, searching Docker Hub for the image `tensorrt_llm:35.2.1` returns no results. I noticed your comment on #564 that TensorRT-LLM support would be available soon. So, is there an updated Docker image yet that supports `docker run` for tensorrt_llm on Orin? thx :P
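(For reference, this is the usual way to launch one of these containers with this repo's tooling, assuming a `tensorrt_llm` image were published; at the time of writing no such tag exists on Docker Hub:)

```bash
# autotag resolves the image tag matching the local L4T/JetPack version;
# this will fail until a tensorrt_llm image is actually published:
jetson-containers run $(autotag tensorrt_llm)

# or pull a specific tag directly (this one does not exist yet):
docker pull dustynv/tensorrt_llm:r35.2.1
```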

dusty-nv commented 2 months ago

@nmq45698 Still coming soon, but getting closer! For now I would just run SmoothQuant inference through PyTorch like their repo shows, or use AWQ TinyChat, or MLC/TVM.
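(Both alternatives are available as prebuilt containers in this repo; a sketch assuming the usual `autotag` workflow from the README and that the `mlc` and `nano_llm` package names apply:)

```bash
# Run the MLC/TVM LLM runtime container (tag resolved for your L4T version):
jetson-containers run $(autotag mlc)

# nano_llm bundles AWQ/TinyChat-style quantized inference:
jetson-containers run $(autotag nano_llm)
```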

johnnynunez commented 2 months ago

> @nmq45698 Still coming soon, but getting closer! For now I would just run SmoothQuant inference through PyTorch like their repo shows, or use AWQ TinyChat, or MLC/TVM.

I've updated to the latest version of tensorrt_llm: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.12.0
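(As a quick sanity check, one can confirm which version a container or wheel actually ships, assuming the `tensorrt_llm` Python package is installed:)

```bash
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```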

nmq45698 commented 2 months ago

> > @nmq45698 Still coming soon, but getting closer! For now I would just run SmoothQuant inference through PyTorch like their repo shows, or use AWQ TinyChat, or MLC/TVM.
>
> I've updated to the latest version of tensorrt_llm: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.12.0

Thanks for the reminder! :P Actually, I had already noticed that TensorRT-LLM repo, and successfully built engines on an amd64 GPU machine.

For the Jetson AGX Orin, though, I first tried some Docker images pulled from @dusty-nv's dustynv/... r35.x.x tags, which don't seem to support TensorRT 10.3 because of the older GLIBC on Ubuntu 20.04: the kINT64 and kBF16 types used by TensorRT-LLM could not be compiled.
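(A quick way to confirm the GLIBC constraint, sketched below; Ubuntu 20.04 ships GLIBC 2.31 while Ubuntu 22.04 ships 2.35, and the library path is an assumption for a typical aarch64 image:)

```bash
# GLIBC version provided by the base image:
ldd --version | head -n1

# Highest GLIBC symbol versions a TensorRT library requires; if any exceed
# the base image's GLIBC, the library cannot be loaded there:
objdump -T /lib/aarch64-linux-gnu/libnvinfer.so.10 \
  | grep -o 'GLIBC_[0-9.]*' | sort -uV | tail -n3
```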

Then I tried Docker images pulled from the dustynv/... r36.x.x tags (e.g. nano_llm), which are based on Ubuntu 22.04, and TensorRT 10.3 installed successfully.
However, another error about libnvinfer.so occurred when running /scripts/build_wheels.py, and likewise when importing tensorrt in Python:

```
(Pdb) import tensorrt
*** ImportError: /lib/aarch64-linux-gnu/libnvinfer.so.10: undefined symbol: _ZN5nvdla8IProfile36setCanGenerateDetailedLayerwiseStatsEb
```
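(One way to narrow this down, as a diagnostic sketch: the `nvdla` namespace in the missing symbol points at the DLA libraries, so the usual suspect is a libnvinfer/libnvdla pair mixed from different L4T releases on the image:)

```bash
# Demangle the missing symbol to see what it refers to:
echo _ZN5nvdla8IProfile36setCanGenerateDetailedLayerwiseStatsEb | c++filt
#  -> nvdla::IProfile::setCanGenerateDetailedLayerwiseStats(bool)

# List every copy of the TensorRT/DLA libraries the loader can see,
# to spot versions mixed from different releases:
ldconfig -p | grep -E 'libnvinfer|nvdla'
```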

Have you ever encountered this when building TensorRT-LLM on Orin? ^v^

johnnynunez commented 2 months ago

> Then I tried Docker images pulled from the dustynv/... r36.x.x tags (e.g. nano_llm), which are based on Ubuntu 22.04, and TensorRT 10.3 installed successfully. However, another error about libnvinfer.so occurred when running /scripts/build_wheels.py [...] Have you ever encountered this when building TensorRT-LLM on Orin? ^v^

Do you use JetPack 5?
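(For reference, the L4T release, and hence the JetPack generation, can be read off the device itself; JetPack 5 corresponds to L4T r35.x on Ubuntu 20.04, JetPack 6 to r36.x on Ubuntu 22.04:)

```bash
cat /etc/nv_tegra_release
# e.g. "# R36 (release) ..." -> JetPack 6
```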