NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, and distillation. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

ARM64 compatibility #86

Open felixkarevo opened 1 month ago

felixkarevo commented 1 month ago

Any chance you will provide a Docker container for ARM64 compatibility? The main issue here is that TensorRT-LLM is not compatible with my ARM64 (aarch64) architecture.

When building the Model Optimizer example Docker container, I get this error: `ERROR: failed to solve: process "/bin/sh -c pip install \"tensorrt-llm~=$TRT_LLM_VERSION\" -U" did not complete successfully: exit code: 1`.

kevalmorabia97 commented 1 month ago

If you don't need tensorrt-llm, you can remove the steps related to it in docker/Dockerfile (see the sketch below). But note that Model Optimizer is not yet officially supported on Arm. We will officially support it in the next release later this month. Until then, you are free to use the previous version on Arm (not Jetson Orin); most features should still work.
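
A minimal sketch of that workaround, assuming the TensorRT-LLM install appears in docker/Dockerfile as the single RUN line quoted in the error above (the layout in your checkout may differ, and the image tag here is just an example):

```sh
# Comment out the TensorRT-LLM install step (the line that fails on aarch64).
# This matches any line mentioning tensorrt-llm, so review the edit before building.
sed -i '/tensorrt-llm/s/^/# /' docker/Dockerfile

# Rebuild the example image without that step.
docker build -f docker/Dockerfile -t modelopt-arm64 .
```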

kevalmorabia97 commented 3 weeks ago

Note that as of our latest release, 0.19.0, we officially support SBSA Arm (not Jetson Orin).
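
For example, on an SBSA Arm host you could pick up that release directly from PyPI (a sketch; `nvidia-modelopt` is the project's PyPI distribution name, and any extras you need depend on your workflow):

```sh
# Install the first release with official SBSA Arm support, then verify the import.
pip install -U "nvidia-modelopt==0.19.0"
python -c "import modelopt; print(modelopt.__version__)"
```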