-
**Description**
I used the latest image version, 24.06, because the corresponding latest version of TensorRT supports BF16. But when I deployed the model with the TensorRT backend, I used perf_analyzer to pressu…
-
To begin, I would like to thank the Triton Inference Server team!
You provide us with a very convenient tool for deploying deep learning models :)
**Is your feature request related to a problem? Plea…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues.
### Is your feature request related to a problem? Please describe.
If I use moganet to train a model, and then …
-
I am trying to profile our decoupled models (Python backend) with perf_analyzer, and I'm curious how the following latency metrics are calculated:
Client Send, Network+Server Send/Recv, Server Queu…
-
**Is your feature request related to a problem? Please describe.**
* Normally, we would like to set log verbose=1 to print the request logs to stdout, as in the following image:
![image](https://…
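For reference, verbose logging can currently only be enabled globally when the server starts; a minimal launch command (the model repository path is a placeholder):

```shell
# Launch Triton with verbose request logging enabled.
# /models is a placeholder -- point it at your own model repository.
tritonserver --model-repository=/models --log-verbose=1
```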
-
In order to serve with TF Serving, the model needs to be converted into a SavedModel. How can I convert the ckpt model into a SavedModel?
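A minimal sketch of one way to do this for a TF1-style checkpoint: restore the graph from the checkpoint's `.meta` file and re-export it with `tf.compat.v1.saved_model.simple_save`. The tensor names (`input:0`, `output:0`) are assumptions and must be replaced with the model's actual input/output tensor names.

```python
import tensorflow as tf

def convert_ckpt_to_savedmodel(ckpt_path, export_dir,
                               input_name="input:0", output_name="output:0"):
    """Restore a TF1-style checkpoint and re-export it as a SavedModel.

    input_name/output_name are assumptions -- replace them with your
    model's real input and output tensor names.
    """
    with tf.compat.v1.Session(graph=tf.Graph()) as sess:
        # Rebuild the graph from the checkpoint's .meta file, then load weights.
        saver = tf.compat.v1.train.import_meta_graph(ckpt_path + ".meta")
        saver.restore(sess, ckpt_path)
        g = sess.graph
        # Export with an explicit serving signature.
        tf.compat.v1.saved_model.simple_save(
            sess, export_dir,
            inputs={"input": g.get_tensor_by_name(input_name)},
            outputs={"output": g.get_tensor_by_name(output_name)},
        )
```

The resulting `export_dir` (containing `saved_model.pb` and a `variables/` folder) is what TF Serving loads.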
-
model: baichuan1 13b
enable inflight_fused_batching
**good case post:**
`curl -X POST 10.60.133.200:8030/v2/models/ensemble/generate -d '{"max_tokens": 90, "bad_words": "", "stop_words": "", "t…
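For comparison, the request body from the curl command above can be built programmatically; a small sketch (the prompt field name `text_input` and the endpoint are assumptions, since the original command is truncated):

```python
import json

def build_generate_payload(text_input, max_tokens=90):
    """Build a JSON body for Triton's /v2/models/<name>/generate endpoint.

    Field values mirror the curl example above; "text_input" is an
    assumed prompt field name for illustration.
    """
    return json.dumps({
        "text_input": text_input,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    })

body = build_generate_payload("hello")
# POST it with any HTTP client, e.g.:
# curl -X POST 10.60.133.200:8030/v2/models/ensemble/generate -d "$body"
```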
-
Hello,
I tried to use the NVIDIA Triton streaming configuration with the pruned stateless 7 streaming model, but it seems that one input, "avg_cache", is missing from the encoder; this seems to be added in the new zip…
-
### System Info
CPU: X86_64
GPU: 4*A100 80G
TensorRT-LLM: 0.6.1
### Who can help?
@kaiyux @byshiue
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
-…
-
**Is your feature request related to a problem? Please describe.**
I aim to deploy my ASR model on a server that will receive audio packet bytes with each request. The server will then transcribe the…
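One simple way to carry raw audio packet bytes inside a JSON request is to base64-encode them; a hedged sketch of hypothetical encode/decode helpers (for high-throughput ASR serving, a binary tensor protocol or gRPC would avoid the base64 overhead):

```python
import base64
import json

# Hypothetical request shape for shipping raw audio bytes over JSON;
# the "audio_b64" field name is an assumption for illustration.
def encode_audio_request(audio_bytes: bytes) -> str:
    """Client side: wrap raw audio packet bytes in a JSON body."""
    return json.dumps({"audio_b64": base64.b64encode(audio_bytes).decode("ascii")})

def decode_audio_request(body: str) -> bytes:
    """Server side: recover the original audio bytes for transcription."""
    return base64.b64decode(json.loads(body)["audio_b64"])

packet = b"\x00\x01\x02fake-pcm"
assert decode_audio_request(encode_audio_request(packet)) == packet
```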