NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Support for Python 3.11 (+ windows) #1706

Open Sharrnah opened 1 month ago

Sharrnah commented 1 month ago

System Info

Python Version: CPython 3.11.6
Operating System: Windows 11
CPU Architecture: AMD64
Driver Version: 555.85
CUDA Version: 12.5

Who can help?

@ncomly-nvidia

Reproduction

I tried to use it in my own script, but wasn't successful: I just ran into one error after another.

For example:

File "E:\Python\Python311\Lib\dataclasses.py", line 815, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'tensorrt_llm.lora_manager.LoraBuildConfig'> for field lora_config is not allowed: use default_factory
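
As far as I can tell, Python 3.11 tightened this check: dataclasses now rejects any field default whose type is unhashable (3.10 only rejected list, dict, and set), and instances of ordinary non-frozen dataclasses are unhashable. A minimal sketch of the failure and the default_factory fix; Inner and Outer are illustrative names, not TensorRT-LLM classes:

from dataclasses import dataclass, field

@dataclass
class Inner:        # non-frozen dataclass, so Inner() is unhashable
    rank: int = 0

@dataclass
class Outer:
    # inner: Inner = Inner()  # ValueError on Python 3.11+: mutable default
    inner: Inner = field(default_factory=Inner)  # fresh Inner per Outer()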

After changing the mentioned lines like this:

from dataclasses import dataclass, field

@dataclass
class BuildConfig:
    lora_config: LoraBuildConfig = field(default_factory=lambda: LoraBuildConfig())
    auto_parallel_config: AutoParallelConfig = field(default_factory=lambda: AutoParallelConfig())
    plugin_config: PluginConfig = field(default_factory=lambda: PluginConfig())
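
(Side note: the lambdas aren't strictly necessary; field(default_factory=LoraBuildConfig) behaves the same, since the class itself is already a zero-argument callable.)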

I am getting:

  File "E:\Python\Python311\Lib\site-packages\tensorrt_llm\hlapi\llm.py", line 16, in <module>
    from tensorrt_llm.bindings import KvCacheConfig, SchedulerPolicy
ImportError: cannot import name 'KvCacheConfig' from 'tensorrt_llm.bindings' (unknown location)

I wasn't able to fix that one yet, because my changes to the code directly in site-packages are ignored (probably the modules are prebuilt? No idea).
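
My guess is that tensorrt_llm.bindings is a compiled extension module, which would explain both symptoms: edits to the .py files in site-packages change nothing, and a binary built for CPython 3.10 won't load under 3.11, hence the "unknown location". A quick standard-library check of what the interpreter actually resolves:

import importlib.util

spec = importlib.util.find_spec("tensorrt_llm.bindings")
print(spec.origin if spec else "not found")
# a path ending in .pyd (or .so) means a compiled extension, tied to the
# CPython version it was built for; editing Python sources cannot change it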

Trying the example only gives me:

line 114, in download_manual
          raise RuntimeError(f"Didn't find wheel for {distribution} {version}")
      RuntimeError: Didn't find wheel for tensorrt-llm 0.11.0.dev2024052800

#...

The installation of tensorrt-llm for version 0.11.0.dev2024052800 failed.

      This is a special placeholder package which downloads a real wheel package
      from https://pypi.nvidia.com. If https://pypi.nvidia.com is not reachable, we
      cannot download the real wheel file to install.

Looking at https://pypi.nvidia.com, I only see version 0.9.0 for Python 3.10 on Windows. Even the newer Linux versions seem to be available only for Python 3.10.
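
That matches how the placeholder describes itself: it looks on https://pypi.nvidia.com for a real wheel whose tag matches the running interpreter, and no cp311 Windows wheel is published there. The tags a given interpreter accepts can be listed with the packaging library (pip install packaging); a small sketch:

from packaging.tags import sys_tags

# pip only installs wheels whose tag appears in this sequence; on CPython 3.11
# for Windows, a published cp310-cp310-win_amd64 wheel never matches
for tag in list(sys_tags())[:3]:
    print(tag)  # e.g. cp311-cp311-win_amd64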

Expected behavior

Being able to install on Windows with Python 3.11 (or newer).

Actual behavior

Multiple errors when using the older version, and failed installs when trying newer versions.

Additional notes

-

Shixiaowei02 commented 1 month ago

Please use Python 3.10, or build from source for compatibility with other Python versions. Thank you!
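
For the prebuilt route, installing from a Python 3.10 environment should let the placeholder resolve a real wheel; roughly (check the installation docs for the exact flags for your platform):

pip install tensorrt_llm --extra-index-url https://pypi.nvidia.com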

Sharrnah commented 1 month ago

I tried that, but the documentation is very lackluster in this regard.

tp5uiuc commented 2 weeks ago

Hi @Sharrnah,

You wrote:

    https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-windows.html#acquire-an-image
    It's not very clear where to get that Docker container from. That folder in this repository doesn't exist. Same with the TensorRT 10.0.1.6 zip file you are supposed to download.

If you hop on to this link, you should find step-by-step instructions for building trt-llm on bare-metal Windows (without any Docker containers), including links to the zip files, etc.

The reason you don't see TensorRT 10.0.1.6 is that it exists only inside the Docker container. The corresponding folder in the repository (which you mention doesn't exist; apologies if that was unclear) is here: https://github.com/NVIDIA/TensorRT-LLM/tree/main/windows.

Suggestions to improve the documentation are always welcome 🙏

You wrote:

    Python 3.10 is now only receiving security maintenance, so this should eventually support Python 3.11 and 3.12 by default.

Agreed; let me check with the team whether there are plans to ship wheels for Python 3.11 and 3.12. But I wouldn't hold my breath for updates on this, as it may not be a critical issue (Python 3.10 is still supported by all major vendors/package providers).

The reason I suspect you hit those errors is that you are on latest main, which is not rigorously tested (it mostly exists to track up-to-date models). It's better to be on a release branch or the v0.10.0 release, as noted at the top of https://nvidia.github.io/TensorRT-LLM/installation/windows.html:

    The Windows release of TensorRT-LLM is currently in beta. We recommend checking out the v0.10.0 tag for the most stable experience.

Can you try that out as well, in addition to switching to 3.10? If you want to continue using 3.11, the best way forward is to build tensorrt-llm yourself on bare-metal Windows using the guide posted above. Thanks!
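
For reference, a rough outline of the from-source route (the prerequisites, such as MSVC, CUDA, and the TensorRT zip, and the exact scripts/build_wheel.py flags are covered in the guide; treat the commands below as a sketch, not the definitive invocation):

git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git checkout v0.10.0                      # the tag recommended above
git submodule update --init --recursive
python scripts/build_wheel.py             # the guide lists the required flags (TensorRT paths etc.)
pip install build/tensorrt_llm-*.whl      # assumption: the built wheel lands in build/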