NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

pynvml version issue #2524

Open apbose opened 2 days ago

apbose commented 2 days ago

System Info

TensorRT-LLM v0.14.0

Who can help?

No response

Reproduction

It looks like the pynvml version needs to be pinned in the latest release. Running pip install tensorrt-llm installs pynvml 12.0.0, but import tensorrt_llm then fails with:

  File "/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/tensorrt_llm/__init__.py", line 35, in <module>
    import tensorrt_llm.runtime as runtime
  File "/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/tensorrt_llm/runtime/__init__.py", line 22, in <module>
    from .model_runner import ModelRunner
  File "/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/tensorrt_llm/runtime/model_runner.py", line 26, in <module>
    from .. import profiler
  File "/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/tensorrt_llm/profiler.py", line 121, in <module>
    if pynvml.__version__ < '11.5.0' or driver_version < '526':
AttributeError: module 'pynvml' has no attribute '__version__'

Downgrading pynvml to 11.5.0 makes the error go away.

Expected behavior

import tensorrt_llm should work without error.

actual behavior

The import fails with the AttributeError shown above.

additional notes

Maybe the code should handle different pynvml versions more defensively.
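
As a rough illustration of what such handling could look like, here is a minimal sketch that reads the version from the installed package metadata when the module has no __version__ attribute, and compares versions as integer tuples rather than strings (as strings, '9.0.0' < '11.5.0' is False). The helper names parse_version and installed_version are hypothetical and are not part of TensorRT-LLM; this is not the project's actual fix.

```python
from importlib import metadata


def parse_version(text):
    """Turn '11.5.0' into (11, 5, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(dist_name, module=None):
    """Best-effort lookup that does not rely on module.__version__ existing.

    Hypothetical helper: prefers module.__version__ when present, otherwise
    falls back to the installed distribution metadata. Returns None if the
    distribution cannot be found.
    """
    version = getattr(module, "__version__", None)
    if version is None:
        try:
            version = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None
    return parse_version(version)
```

A check like the one in profiler.py could then compare installed_version("pynvml", pynvml) against (11, 5, 0), treating None as "unknown" instead of raising at import time.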

nv-guomingz commented 1 day ago

Hi @apbose, we've root-caused this issue: it was triggered by the pynvml update released on Dec. 2. The upcoming 0.15 release will fix it. If you want to fix it yourself in the meantime, just change this line https://github.com/NVIDIA/TensorRT-LLM/blob/main/requirements.txt#L14 to pynvml~=11.5.0
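
For readers unfamiliar with the ~= operator: under PEP 440, pynvml~=11.5.0 is a "compatible release" pin equivalent to >=11.5.0, ==11.5.*, so it accepts 11.5.x releases and rejects 12.0.0. The sketch below illustrates that rule for plain numeric versions only; the function name satisfies_compatible_release is hypothetical, and pip's real resolver handles many more cases (pre-releases, epochs, and so on).

```python
def satisfies_compatible_release(version, spec):
    """Check a plain numeric version against a '~=' spec such as '11.5.0'.

    Illustrative only: assumes dot-separated integers and a spec with at
    least two components.
    """
    v = tuple(int(p) for p in version.split("."))
    s = tuple(int(p) for p in spec.split("."))
    # Pad the candidate with zeros so '11.5' is treated like '11.5.0'.
    v = v + (0,) * (len(s) - len(v))
    # All but the last spec component must match exactly, and the
    # candidate must not be older than the spec.
    return v[: len(s) - 1] == s[:-1] and v >= s


for candidate in ("11.4.9", "11.5.0", "11.5.3", "12.0.0"):
    print(candidate, satisfies_compatible_release(candidate, "11.5.0"))
```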