NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0
8.3k stars, 927 forks

Update TensorRT-LLM #1763

Closed kaiyux closed 3 months ago

kaiyux commented 3 months ago
pfk-beta commented 3 months ago

Hi, thanks for your hard work. By the way, I spotted a large removal in examples/run.py: https://github.com/NVIDIA/TensorRT-LLM/commit/db4edea1e1359bcfcac7bbb87c1b639b5611c721#diff-299cb0140ad8f9d286c86ecc32b793b048531e27570675b94e54b57b66b3d7d5. Is it intended?

pfk-beta commented 3 months ago

Sorry for the false alarm, these arguments were moved to utils. I didn't spot it.