NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0
8.47k stars · 957 forks

_SyncQueue class AttributeError #2346

Open vonodiripsa opened 5 days ago

vonodiripsa commented 5 days ago

The latest TRT-LLM has a bug in `tensorrt_llm/executor.py` (TensorRT-LLM/tree/main/tensorrt_llm/executor.py).

Issue:

    File /databricks/python/lib/python3.10/site-packages/tensorrt_llm/hlapi/llm.py:211, in LLM.generate(self, inputs, sampling_params, use_tqdm, lora_request)
        205     futures.append(future)
        207 for future in tqdm(futures,
        208                    desc="Processed requests",
        209                    dynamic_ncols=True,
        210                    disable=not use_tqdm):
    --> 211     future.result()
        213 if unbatched:
        214     futures = futures[0]

    File /databricks/python/lib/python3.10/site-packages/tensorrt_llm/executor.py:328, in GenerationResult.result(self, timeout)
        326 def result(self, timeout: Optional[float] = None) -> "GenerationResult":
        327     while not self._done:
    --> 328         self.result_step(timeout)
        329     return self

    File /databricks/python/lib/python3.10/site-packages/tensorrt_llm/executor.py:318, in GenerationResult.result_step(self, timeout)
        317 def result_step(self, timeout: Optional[float] = None):
    --> 318     response = self.queue.get(timeout=timeout)
        319     self.handle_response(response)

    AttributeError: '_SyncQueue' object has no attribute 'get'

Could you please add a `get` method to `_SyncQueue`, something like:

    # Requires: from queue import Empty  and  from typing import Optional
    def get(self, timeout: Optional[float] = None):
        '''
        Wait until something is put into the queue, then retrieve the item.
        Supports a `timeout` for waiting.
        '''
        try:
            # Attempt to get an item from the queue, respecting the timeout
            item = self._q.get(timeout=timeout)
            # If the queue is now empty, clear the event
            if self._q.empty():
                self._event.clear()
            return item
        except Empty:
            # Translate the queue's timeout signal into a TimeoutError
            raise TimeoutError(f"Queue get operation timed out after {timeout} seconds.")

This is critical for us because our customer-facing LLM demo is failing.
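For reference, the proposed pattern can be exercised standalone. Below is a minimal sketch of an event-backed queue with a timeout-aware `get`; the internals (`_q` as a `queue.Queue`, `_event` as a `threading.Event`) are assumptions for illustration, not the actual `_SyncQueue` implementation in TensorRT-LLM.

```python
import queue
import threading
from typing import Optional


class SyncQueueSketch:
    """Hypothetical stand-in for _SyncQueue: a queue paired with an event flag."""

    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()
        self._event = threading.Event()

    def put(self, item) -> None:
        # Enqueue the item and signal any waiting consumers.
        self._q.put(item)
        self._event.set()

    def get(self, timeout: Optional[float] = None):
        # Block until an item is available, respecting the timeout.
        try:
            item = self._q.get(timeout=timeout)
            # Clear the "data available" flag once the queue drains.
            if self._q.empty():
                self._event.clear()
            return item
        except queue.Empty:
            # Translate the queue's timeout signal into a TimeoutError.
            raise TimeoutError(
                f"Queue get operation timed out after {timeout} seconds."
            )
```

With this shape, `get()` returns items in FIFO order and raises `TimeoutError` when the queue stays empty past the timeout, which matches what `GenerationResult.result_step` expects from `self.queue.get(timeout=timeout)`.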

Superjomn commented 5 days ago

What version are you using? And can you share some usage code for us to reproduce it? @vonodiripsa

Superjomn commented 5 days ago

Could be a similar issue: https://github.com/NVIDIA/TensorRT-LLM/issues/2323. Are you using a Docker or non-Docker environment?