NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Can TensorRT-LLM support facebook/nllb-200-3.3B? #970

Open micronetboy opened 10 months ago

micronetboy commented 10 months ago

System Info

Nvidia A100 PCIe 80G

Who can help?

none

Information

Tasks

Reproduction

Can TensorRT-LLM support facebook/nllb-200-3.3B?

Expected behavior

TensorRT-LLM supports facebook/nllb-200-3.3B.

actual behavior

none

additional notes

none

StephennFernandes commented 7 months ago

@ncomly-nvidia Hey, any update on this? I am also looking to port the seamlessM4T_v2 models to TensorRT-LLM. I would be willing to write all the code, make the port, and submit a PR, but I need some direction and rough guidance.

nv-guomingz commented 1 week ago

Re-assigning to @laikhtewari.