-
I see that the Triton backend creates a [GptManager object](https://github.com/triton-inference-server/tensorrtllm_backend/blob/bf5e9007a3f16c7fc76cb156a3362d1caae198dd/inflight_batcher_llm/src/model_…
-
Hi everyone,
I am running Triton Server with vLLM and want to use dynamic batching, but I encountered an error. It seems to have something to do with my input.
Inference with curl:
curl -X POST loca…
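For reference, Triton's dynamic batcher is normally enabled through the model's `config.pbtxt`. A minimal sketch follows; the model name and parameter values are placeholders, and note that the vLLM backend performs its own continuous batching, so for vLLM models the Triton-side dynamic batcher is usually left disabled:

```
# Hypothetical config.pbtxt fragment for a conventional (non-vLLM) model.
name: "my_model"            # placeholder model name
max_batch_size: 8
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```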
-
### **Feature Area**
/area backend
/area sdk
The examples for nvidia-resnet cannot be built using existing scripts.
### **What feature would you like to see?**
Update existing nvidia-resnet o…
-
I cannot start the whisperfile even though ffmpeg is definitely in the PATH. I have tried running cmd as administrator, as well as copying ffmpeg.exe into the same directory, with the same results ever…
-
The spatial detailing code supports only 6 region types, but the user can select any of the terriaJS region types. If the selected region type is not supported, the inference server throws an error…
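One way to avoid the server-side failure is to reject unsupported region types on the client side before the request is sent. A minimal sketch, assuming a validation helper; the set of supported region types below is purely illustrative (the real list comes from the spatial detailing code):

```python
# Hypothetical guard: fail fast with a clear message instead of letting
# the inference server throw. The region-type names here are placeholders.
SUPPORTED_REGION_TYPES = {"SA1", "SA2", "SA3", "SA4", "STE", "LGA"}

def validate_region_type(region_type: str) -> None:
    """Raise ValueError for region types the spatial detailing code cannot handle."""
    if region_type not in SUPPORTED_REGION_TYPES:
        raise ValueError(
            f"Region type {region_type!r} is not supported; "
            f"choose one of {sorted(SUPPORTED_REGION_TYPES)}"
        )
```

A guard like this could also drive the UI, so that unsupported terriaJS region types are never offered for selection in the first place.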
-
Looking at the release of TensorRT 9.1.0, I am very happy to see the integration of openai-triton with TensorRT plugins.
However, [one limitation of this integration is that python must be availabl…
-
➜ aiac --version
aiac version 5.2.1
We are using local backends provided by Hugging Face TGI:
```toml
[backends.phi3]
type = "openai"
default_model = "Phi-3"
url = "https://phi3.ourcluster/…
-
## Description
Currently our example evaluation scripts require TGIS docker images to be available locally. This procedure is undocumented and somewhat undefined unless someone already knows how to build TGIS loc…
-
We have a streaming service that uses gRPC over Unix sockets.
gRPC performs far better over Unix sockets than over a TCP port. I saw that you can only change the port in the Triton server…
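The performance gap comes from Unix domain sockets skipping the TCP/IP stack entirely for local traffic. A minimal stdlib sketch of the two transports side by side (an echo round trip, not Triton or gRPC code; gRPC clients can target such sockets via `unix:///path` addresses where the server supports them):

```python
import os
import socket
import tempfile
import threading

def _echo_once(server_sock):
    """Accept one connection and echo back whatever it receives."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))

def round_trip(family, addr):
    """Send b'ping' over a fresh socket of the given family and return the reply."""
    srv = socket.socket(family, socket.SOCK_STREAM)
    srv.bind(addr)
    srv.listen(1)
    t = threading.Thread(target=_echo_once, args=(srv,))
    t.start()
    cli = socket.socket(family, socket.SOCK_STREAM)
    cli.connect(srv.getsockname())
    cli.sendall(b"ping")
    reply = cli.recv(1024)
    cli.close()
    t.join()
    srv.close()
    return reply

# TCP loopback: traffic goes through the full TCP/IP stack.
tcp_reply = round_trip(socket.AF_INET, ("127.0.0.1", 0))

# Unix domain socket: kernel-local, no TCP/IP overhead (POSIX only).
uds_path = os.path.join(tempfile.mkdtemp(), "echo.sock")
uds_reply = round_trip(socket.AF_UNIX, uds_path)
```

Timing many such round trips typically shows the Unix-socket path winning on latency, which is why exposing a socket path (rather than only a port) in the server would help here.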
-
I am conveniently running inference tasks with CodeGen models, thanks to the FauxPilot community. Thank you again.
Additionally, I wonder whether it is possible to run multiple models on a single GPU.
Bel…