-
When it comes to fully production-grade inference servers, Triton Inference Server (TIS) is highly optimized and open source, so integrating it into DSPy, alongside TensorRT-LLM (#1094), would be a great addition.
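For context, any such integration would ultimately wrap Triton's client API. Below is a minimal sketch of a raw inference call via `tritonclient`; the model name, tensor names, and shape are hypothetical placeholders, not part of any existing DSPy or Triton integration:
```python
# Minimal sketch of a raw Triton inference call that a wrapper could build on.
# Assumes: pip install tritonclient[http] numpy; a Triton server on localhost:8000.
# "my_llm", "text_input", and "text_output" are hypothetical names.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Pack the prompt as a BYTES tensor of shape [1].
prompt = np.array(["What is Triton Inference Server?"], dtype=object)
inp = httpclient.InferInput("text_input", [1], "BYTES")
inp.set_data_from_numpy(prompt)

out = httpclient.InferRequestedOutput("text_output")
result = client.infer(model_name="my_llm", inputs=[inp], outputs=[out])
print(result.as_numpy("text_output"))
```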
-
Hi,
Can we use this with a Triton Inference Server model?
-
### System Info
- Built tensorrtllm_backend from source using dockerfile/Dockerfile.trt_llm_backend
- tensorrt_llm 0.13.0.dev2024081300
- tritonserver 2.48.0
- Triton image: 24.07
- CUDA 12.5
### Wh…
-
Hello maintainers!
In [the release notes of 24.08](https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel-24-08.html#rel-24-08), there is a known issue:
> Triton met…
-
# YOLOv8 with TensorRT & Nvidia Triton Server | VISION HONG
[https://visionhong.github.io/tools/YOLOv8-with-TensorRT-Nvidia-Triton-Server/](https://visionhong.github.io/tools/YOLOv8-with-TensorRT-Nvidia-Triton-Server/)
-
Description of problem:
I ran some experiments comparing the timing of standalone inference with a TensorRT engine against Triton serving the same TensorRT model, using identical input on a …
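For reference, the Triton side of such a comparison is often measured either with Triton's bundled `perf_analyzer` tool or with a simple client-side loop like the rough sketch below. The model name `mymodel`, tensor names, and shape are hypothetical placeholders; client-side timings also include HTTP overhead that a standalone TensorRT run does not have:
```python
# Rough client-side latency loop against Triton over HTTP.
# Assumes: pip install tritonclient[http] numpy; server at localhost:8000.
# "mymodel", "input0", "output0", and the shape are placeholders.
import time
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

inp = httpclient.InferInput("input0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)
out = httpclient.InferRequestedOutput("output0")

# Warm up first so one-time initialization costs don't skew the numbers.
for _ in range(10):
    client.infer("mymodel", inputs=[inp], outputs=[out])

n = 100
start = time.perf_counter()
for _ in range(n):
    client.infer("mymodel", inputs=[inp], outputs=[out])
print(f"mean latency: {(time.perf_counter() - start) / n * 1e3:.2f} ms")
```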
-
Hi,
Where can I find documentation on how to build the Triton Inference Server TRT-LLM 24.06 image for SageMaker myself, so I can run it on SageMaker?
NVIDIA image I want to use: nvcr.io/nvidia/tritonserver:2…
-
Hi there, I'm trying to add Triton Server to our Yocto build for use with DeepStream. I've been able to add the Triton packages (triton-server, triton-core, triton-tensorrt-backend, triton-client) into…
-
**Description**
If I load two models (a transformer model and an inference model), GPU memory usage is about 3 GiB:
```
PID     USER   DEV TYPE GPU GPU MEM CPU HOST MEM Command
2207044 coreai 0   C…
```
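If GPU memory is the concern, one option is to run the server with explicit model control and load/unload models on demand, so only the model currently in use occupies GPU memory. A small sketch using the HTTP client follows; the model names are hypothetical, and how much memory is actually returned on unload depends on the backend's allocator:
```python
# Sketch: load/unload models on demand to bound GPU memory.
# Requires the server to be started with --model-control-mode=explicit.
# "transformer_model" and "inference_model" are hypothetical names.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

client.load_model("transformer_model")
# ... run transformer inferences ...
client.unload_model("transformer_model")  # releases its resources

client.load_model("inference_model")
# ... run the second model ...
client.unload_model("inference_model")
```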
-
First, I want to say thanks so much to the author for this work!
Can we export YOLO-World to ONNX or TensorRT now?
Thank you in advance!
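Not an authoritative answer, but if the model is a standard PyTorch `nn.Module`, the generic export path looks roughly like the sketch below. The input resolution, tensor names, and the stand-in module are placeholders, and YOLO-World-specific preprocessing or custom ops may need extra work beyond this:
```python
# Generic PyTorch-to-ONNX export sketch; not the repo's official export script.
import torch

model = torch.nn.Conv2d(3, 16, 3)  # stand-in; replace with the loaded YOLO-World model
model.eval()
dummy = torch.randn(1, 3, 640, 640)  # hypothetical input resolution

torch.onnx.export(
    model,
    dummy,
    "yolo_world.onnx",
    input_names=["images"],
    output_names=["outputs"],
    opset_version=17,
    dynamic_axes={"images": {0: "batch"}},
)
# A TensorRT engine can then be built from the ONNX file, e.g. with trtexec.
```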