NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

ChatGLM3 6B Multi-batch Failed with Error #1775

Open RobinJYM opened 5 months ago

RobinJYM commented 5 months ago

System Info

Who can help?

@hijkzzz

Information

Tasks

Reproduction

python3 benchmarks/python/benchmark.py --engine_dir /trt_engines/chatglm3_6b/float16/1-gpu/ --dtype float16 --batch_size 16 --input_output_len "1024,512"

Expected behavior

The benchmark should run to completion and print its results.

Actual behavior

[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
Allocated 770.13 MiB for execution context memory.
/usr/local/lib/python3.10/dist-packages/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:178.)
  return _nested.nested_tensor(
[06/13/2024-03:26:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2068] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2068, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[06/13/2024-03:26:19] [TRT] [E] 3: [executionContext.cpp::setInputShape::2068] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2068, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[06/13/2024-03:26:19] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2842] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2842, condition: allInputDimensionsSpecified(routine) )
Traceback (most recent call last):
  File "/TensorRT-LLM/benchmarks/python/benchmark.py", line 416, in main
    benchmarker.run(inputs, config)
  File "/TensorRT-LLM/benchmarks/python/gpt_benchmark.py", line 254, in run
    self.decoder.decode_batch(inputs[0],
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 3240, in decode_batch
    return self.decode(input_ids,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 947, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 3463, in decode
    return self.decode_regular(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 3073, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, context_logits, generation_logits, encoder_input_lengths = self.handle_per_step(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2732, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/TensorRT-LLM/benchmarks/python/benchmark.py", line 515, in <module>
    main(args)
  File "/TensorRT-LLM/benchmarks/python/benchmark.py", line 441, in main
    e.with_traceback())
TypeError: BaseException.with_traceback() takes exactly one argument (0 given)
[06/13/2024-03:26:25] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
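
Note on the second traceback above: `BaseException.with_traceback()` requires a traceback object as its argument, so calling it with none raises a TypeError that hides the original RuntimeError. Below is a minimal sketch of an error handler that prints the original traceback instead, using only the standard `traceback` module. The names `run_benchmark`, `format_failure`, and `main` are hypothetical stand-ins for illustration, not the actual code in benchmark.py.

```python
import traceback


def run_benchmark():
    # Hypothetical stand-in for the failing benchmark call.
    raise RuntimeError("Executing TRT engine failed step=0!")


def format_failure(exc: BaseException) -> str:
    # format_exception needs the type, the value, and the traceback object;
    # exc.__traceback__ is the object with_traceback() expects as its argument.
    return "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__))


def main() -> str:
    try:
        run_benchmark()
    except RuntimeError as e:
        # Report the original failure rather than raising a new one.
        return format_failure(e)
    return ""


if __name__ == "__main__":
    print(main())
```

With a handler along these lines, the log would end at the RuntimeError from the engine rather than at an unrelated TypeError.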

Additional notes

If I run without --input_output_len, it works.

hijkzzz commented 5 months ago

Confirmed that this is a bug. We are investigating internally.

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

AnnaYue commented 3 months ago

I have hit the same error. It happens when I set batch_size > 8.
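
For context, the TensorRT message "Runtime dimension does not satisfy any optimization profile" means that an input shape passed at run time falls outside the [min, max] range the engine was built with. The sketch below shows that kind of range check in plain Python. It rests on assumptions: the profile bounds and the tensor name are made-up illustration values, not read from the engine in this issue, and it is unconfirmed whether batch size is the dimension at fault here.

```python
from typing import Dict, List, Sequence, Tuple

# (min_shape, max_shape) per input tensor name. Illustrative values only.
Profile = Dict[str, Tuple[Sequence[int], Sequence[int]]]


def violations(profile: Profile, shapes: Dict[str, Sequence[int]]) -> List[str]:
    """Return a message for every dimension outside the profile's range."""
    problems = []
    for name, shape in shapes.items():
        lo, hi = profile[name]
        for axis, (dim, dmin, dmax) in enumerate(zip(shape, lo, hi)):
            if not dmin <= dim <= dmax:
                problems.append(
                    f"{name}[{axis}]={dim} not in [{dmin}, {dmax}]")
    return problems


# Hypothetical profile: batch of at most 8, sequences of at most 2048 tokens.
example_profile: Profile = {"input_ids": ((1, 1), (8, 2048))}

print(violations(example_profile, {"input_ids": (8, 1024)}))   # []
print(violations(example_profile, {"input_ids": (16, 1024)}))
# ['input_ids[0]=16 not in [1, 8]']
```

A check like this only helps locate which dimension is out of range. It does not explain why the benchmark requests that shape, which is the part the maintainers are investigating.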

nv-guomingz commented 2 weeks ago

Hi @RobinJYM, could you please try the latest code base to see whether the issue still exists?