NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0
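For context, the high-level Python API this issue exercises looks like the following minimal sketch (assembled from the reproduction and working snippets later in this thread; the model name and sampling values are just the ones used there):

from tensorrt_llm import LLM, SamplingParams

# Load a model through the high-level API and run inference.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(["Explain quantum mechanics."], sampling_params)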

AttributeError: '_SyncQueue' object has no attribute 'get' #2323


imadoualid commented 1 month ago

System Info

System Information:

CPU architecture: x86_64
CPU/Host memory size: 2.0 TiB

GPU Properties:

GPU name: NVIDIA H100 80GB HBM3
GPU memory size: 80 GB (75016 MiB / 81559 MiB)
CUDA Version: 12.4
NVIDIA Driver Version: 550.90.07

Libraries:

TensorRT-LLM branch or tag: v0.14.0.dev2024100800
TensorRT-LLM Location: /root/miniconda3/envs/trt_llm_env/lib/python3.10/site-packages
TensorRT-LLM Dependencies:
    accelerate
    aenum
    build
    click
    cuda-python
    diffusers
    onnx
    torch
    transformers
    (and others as listed in the pip show output)

Container Information:

Not using a container.

Operating System (OS):

OS version: Ubuntu 22.04


Reproduction

I've installed TensorRT-LLM in a conda env by following the installation docs:

# Install dependencies; TensorRT-LLM requires Python 3.10.
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git git-lfs

# Install the latest preview version (corresponding to the main branch) of TensorRT-LLM.
# If you want to install the stable version (corresponding to the release branch),
# remove the --pre option.
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com

# Check installation
python3 -c "import tensorrt_llm"
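If the import succeeds, printing the resolved version is a quick way to confirm which wheel was picked up (a small sketch; tensorrt_llm exposes __version__, e.g. v0.14.0.dev2024100800 in the report above):

# Sanity check: confirm which TensorRT-LLM build is installed.
import tensorrt_llm
print(tensorrt_llm.__version__)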

from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
prompts = ["Explain quantum mechanics."]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(prompts, sampling_params)

I'm getting this error:

Processed requests:   0%|          | 0/4 [00:00<?, ?it/s]

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[11], line 1
----> 1 outputs = llm.generate(prompts, sampling_params)

File ~/miniconda3/envs/trt_llm_env/lib/python3.10/site-packages/tensorrt_llm/hlapi/llm.py:211, in LLM.generate(self, inputs, sampling_params, use_tqdm, lora_request)
    205     futures.append(future)
    207 for future in tqdm(futures,
    208                    desc="Processed requests",
    209                    dynamic_ncols=True,
    210                    disable=not use_tqdm):
--> 211     future.result()
    213 if unbatched:
    214     futures = futures[0]

File ~/miniconda3/envs/trt_llm_env/lib/python3.10/site-packages/tensorrt_llm/executor.py:328, in GenerationResult.result(self, timeout)
    326 def result(self, timeout: Optional[float] = None) -> "GenerationResult":
    327     while not self._done:
--> 328         self.result_step(timeout)
    329     return self

File ~/miniconda3/envs/trt_llm_env/lib/python3.10/site-packages/tensorrt_llm/executor.py:318, in GenerationResult.result_step(self, timeout)
    317 def result_step(self, timeout: Optional[float] = None):
--> 318     response = self.queue.get(timeout=timeout)
    319     self.handle_response(response)

AttributeError: '_SyncQueue' object has no attribute 'get'
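For what it's worth, the traceback shows result_step() calling self.queue.get(timeout=...), i.e. it expects a queue.Queue-style object, while the object actually stored is a _SyncQueue without a get method. A self-contained sketch of that mismatch (the _SyncQueueSketch class below is a hypothetical stand-in, not the library's real implementation):

import queue

class _SyncQueueSketch:
    """Hypothetical stand-in for tensorrt_llm's _SyncQueue: it wraps a
    queue internally but does not expose a get() method itself."""
    def __init__(self):
        self._q = queue.Queue()

sq = _SyncQueueSketch()
try:
    # Mirrors executor.py: response = self.queue.get(timeout=timeout)
    sq.get(timeout=None)
except AttributeError as err:
    print(err)  # '_SyncQueueSketch' object has no attribute 'get'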

Expected behavior

The example should run and produce generations.

actual behavior

AttributeError: '_SyncQueue' object has no attribute 'get'

Processed requests:   0%|          | 0/4 [00:00<?, ?it/s]

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 11
      7 sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
      9 llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
---> 11 outputs = llm.generate(prompts, sampling_params)

File ~/miniconda3/envs/trt_llm_env/lib/python3.10/site-packages/tensorrt_llm/hlapi/llm.py:211, in LLM.generate(self, inputs, sampling_params, use_tqdm, lora_request)
    205     futures.append(future)
    207 for future in tqdm(futures,
    208                    desc="Processed requests",
    209                    dynamic_ncols=True,
    210                    disable=not use_tqdm):
--> 211     future.result()
    213 if unbatched:
    214     futures = futures[0]

File ~/miniconda3/envs/trt_llm_env/lib/python3.10/site-packages/tensorrt_llm/executor.py:316, in GenerationResult.result(self, timeout)
    314 def result(self, timeout: Optional[float] = None) -> "GenerationResult":
    315     while not self._done:
--> 316         self.result_step(timeout)
    317     return self

File ~/miniconda3/envs/trt_llm_env/lib/python3.10/site-packages/tensorrt_llm/executor.py:306, in GenerationResult.result_step(self, timeout)
    305 def result_step(self, timeout: Optional[float] = None):
--> 306     response = self.queue.get(timeout=timeout)
    307     self.handle_response(response)

AttributeError: '_SyncQueue' object has no attribute 'get'

additional notes

I also tried Gemma and got the same error.

Superjomn commented 1 month ago

Can you retry with the latest version, following the Linux installation guide? I tried it and it works fine.

imadoualid commented 1 month ago

@Superjomn Without Docker? What torch version are you using?

Superjomn commented 1 month ago

Here are the latest installation instructions; they should work without Docker on Ubuntu. Please give them a try. @imadoualid

Superjomn commented 1 month ago

I successfully reinstalled using Docker according to the install instructions, and the following code runs without issues:

from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
prompts = ["Explain quantum mechanics."]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(prompts, sampling_params)

If you still encounter problems, the issue might be related to the installation process.
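As a usage note: to inspect the generations, the results can be printed like this (a sketch assuming the RequestOutput interface from the LLM-API examples, where each result exposes .prompt and .outputs[0].text):

# Print each prompt with its generated continuation.
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")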

Lix1993 commented 1 month ago

Same error in a non-Docker env, installed with the following commands:

mamba create -p conda_env python=3.10
mamba activate ./conda_env

pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install tensorrt_llm --extra-index-url https://pypi.nvidia.com

With or without --pre, I get the same error.

Edit (2024-10-31): don't install torch manually, just install tensorrt_llm; it works.
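If it helps others verify the fix, a quick check that the torch build pip resolved alongside tensorrt_llm is the one actually in use (a small sketch; torch.version.cuda reports the CUDA build):

# Confirm the torch and tensorrt_llm versions that pip resolved together.
import torch
import tensorrt_llm

print(torch.__version__, torch.version.cuda)
print(tensorrt_llm.__version__)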

esnvidia commented 2 days ago

@laikhtewari The same issue occurs in the nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3 container.