-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related iss…
-
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
Feature request: BF16 support for inte…
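For context on the format being requested: BF16 keeps float32's sign bit and 8 exponent bits but truncates the mantissa to 7 bits. As a hedged illustration of the format itself (not of any particular backend's implementation), here is the standard round-to-nearest-even float32→bfloat16 bit conversion in pure Python:

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for a float32 value
    (round-to-nearest-even; NaN handling omitted for brevity)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round the low 16 bits up when they exceed half, or tie to even.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by zero-padding the mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x
```

Values whose mantissa already fits in 7 bits (e.g. 1.0, 3.140625) round-trip exactly.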
-
**Description**
We're seeing significant latency, on the order of 300-600 milliseconds, between COMPUTE_END and REQUEST_END on a TensorRT-LLM model. See the OTEL trace image below.
![image](http…
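For anyone reproducing the measurement, the gap can be computed directly from the exported span timestamps. A minimal sketch assuming ISO-8601 timestamps (the values below are hypothetical; OTEL exporters often emit epoch nanoseconds instead, in which case subtract and divide by 1e6):

```python
from datetime import datetime

def span_gap_ms(compute_end: str, request_end: str) -> float:
    """Milliseconds between two ISO-8601 span event timestamps."""
    t0 = datetime.fromisoformat(compute_end)
    t1 = datetime.fromisoformat(request_end)
    return (t1 - t0).total_seconds() * 1000.0

# Hypothetical timestamps; real values come from the exported trace.
gap = span_gap_ms("2024-01-01T00:00:01.000+00:00",
                  "2024-01-01T00:00:01.450+00:00")
```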
-
Inference failed when running ab with a concurrency higher than 3, but was fine with a concurrency of 1 or 2. Using an A10G GPU, with Driver Version: 545.23.06, CUDA Version: 12.3, TRT version: 9.1, vicuna 13b-1.5-…
-
trtllm crashes when I send long-context requests that are within the `max-input-length` limit.
I believe it happens when the total number of pending request tokens reaches the `max-num-tokens` limit. But why is it not queuing re…
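On the queuing question: one would expect an admission controller to hold requests whose tokens would overflow the budget rather than crash. A toy sketch of that expected behavior (illustrative only, not TensorRT-LLM's actual scheduler; `max_num_tokens` here is a plain constructor argument):

```python
from collections import deque

class TokenBudgetQueue:
    """Admit requests while in-flight tokens fit under max_num_tokens;
    queue the rest instead of rejecting them."""

    def __init__(self, max_num_tokens: int):
        self.max_num_tokens = max_num_tokens
        self.in_flight = 0          # tokens currently being processed
        self.waiting = deque()      # FIFO of (req_id, num_tokens)
        self.running = []           # admitted (req_id, num_tokens)

    def submit(self, req_id: str, num_tokens: int) -> None:
        if num_tokens > self.max_num_tokens:
            raise ValueError(f"{req_id} exceeds max_num_tokens on its own")
        self.waiting.append((req_id, num_tokens))
        self._schedule()

    def finish(self, req_id: str) -> None:
        for i, (rid, tok) in enumerate(self.running):
            if rid == req_id:
                self.in_flight -= tok
                del self.running[i]
                break
        self._schedule()  # freed budget may admit queued requests

    def _schedule(self) -> None:
        while (self.waiting
               and self.in_flight + self.waiting[0][1] <= self.max_num_tokens):
            rid, tok = self.waiting.popleft()
            self.in_flight += tok
            self.running.append((rid, tok))
```

With a budget of 100 tokens, a second 60-token request waits until the first finishes instead of failing.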
-
I followed the tutorial provided [here](https://github.com/triton-inference-server/fastertransformer_backend/blob/22dba92dc1cbd367d119520013ec365b313a63ba/docs/gptj_guide.md). I am able to run GPTJ-B …
-
General catch-all ticket for installation issues in this early stage of development.
-
I am trying to deploy a custom model on tritonserver (23.08) with the onnxruntime_backend (onnxruntime version 1.15.1), but while doing so, we are facing this issue:
```
onnx runtime error 6: …
```
-
### Describe the question.
I want to crop each frame of a video. I currently have a pipeline that takes JPEG images of each frame and crops them, and I am trying to convert it to take the whole video…
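Independent of the video reader used, the crop itself is the same slice applied to every frame. A minimal pure-Python sketch with frames as nested lists (no DALI or Triton API implied; just the per-frame logic being converted):

```python
def crop_frame(frame, top, left, height, width):
    """Crop one frame (a list of pixel rows) to the given region."""
    return [row[left:left + width] for row in frame[top:top + height]]

def crop_video(frames, top, left, height, width):
    """Apply the same crop to every frame, mirroring the per-image pipeline."""
    return [crop_frame(f, top, left, height, width) for f in frames]
```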
-
**Description**
We use tritonserver with python backend to deploy a customized stable-diffusion model running on pytorch.
We have discovered that our tritonserver has a non-determinism bug:
1. …
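When chasing this kind of non-determinism, a common first step is pinning every RNG seed and re-running. A minimal sketch of the pattern using Python's `random` module (in a PyTorch backend the analogous calls are `torch.manual_seed` and `torch.cuda.manual_seed_all`):

```python
import random

def generate(seed: int, n: int = 5):
    """Produce n pseudo-random values from an explicitly seeded,
    isolated RNG, so repeated runs are reproducible."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# With the seed pinned, two runs produce identical outputs;
# if outputs still diverge, the non-determinism lies elsewhere.
assert generate(42) == generate(42)
```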