-
Add support for inference services.
-
## Running SAM in the Modelzoo Universe
We have started some initial efforts to integrate SAM with the bioengine / imjoy / bioimageio-colab.
I want to summarize here the overall goals, the current …
-
**Description**
I am trying to use the newly introduced [Triton Inference Server In-Process Python API](https://github.com/triton-inference-server…
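For reference, a minimal sketch of how the in-process API is typically driven, based on the published `tritonserver` Python examples; the repository path, model name, and tensor names below are placeholders, and exact signatures may differ between releases:

```python
import numpy as np
import tritonserver  # in-process API package shipped with recent Triton releases

# Start an embedded Triton instance pointed at a local model repository
# ("/workspace/models" and "my_model" are placeholder names).
server = tritonserver.Server(model_repository="/workspace/models")
server.start()

model = server.model("my_model")

# Run a single inference; inputs are passed as a dict of name -> array.
responses = model.infer(inputs={"INPUT0": np.zeros((1, 3), dtype=np.float32)})

# The API returns an iterable of responses (one for non-decoupled models).
for response in responses:
    # np.from_dlpack works for CPU output tensors; GPU outputs need e.g. cupy.
    output = np.from_dlpack(response.outputs["OUTPUT0"])
    print(output)

server.stop()
```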
-
### Environment
**CPU architecture:** x86_64
**CPU/Host memory size:** 440 GiB
### GPU properties
**GPU name:** A100
**GPU memory size:** 160G…
-
**Description**
Currently, if both the Model Analyzer and Triton Inference Server containers are deployed, then while collecting data from their respective metrics endpoint ports, thi…
-
**Description**
I'm using a simple client inference class based on the client example. My TensorRT inference with batch size 10 takes 150 ms, but my Triton with the TensorRT backend took 1100 ms. This is my client:…
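(The client code is truncated above.) One thing worth ruling out is client-side overhead, e.g. re-creating the client per call or sending ten single-item requests instead of one batched request. A minimal `tritonclient.grpc` sketch for comparison, with the URL, model name, tensor names, and shapes as placeholders:

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Reuse a single client/connection for all requests
# ("localhost:8001", "my_trt_model", "INPUT0"/"OUTPUT0" are placeholders).
client = grpcclient.InferenceServerClient(url="localhost:8001")

batch = np.random.rand(10, 3, 224, 224).astype(np.float32)  # batch size 10

inputs = [grpcclient.InferInput("INPUT0", batch.shape, "FP32")]
inputs[0].set_data_from_numpy(batch)
outputs = [grpcclient.InferRequestedOutput("OUTPUT0")]

# One request carrying the whole batch, instead of 10 single-item requests.
result = client.infer(model_name="my_trt_model", inputs=inputs, outputs=outputs)
print(result.as_numpy("OUTPUT0").shape)
```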
-
As mentioned at the end of https://github.com/triton-inference-server/server/issues/6981
Triton: nvcr.io/nvidia/tritonserver:23.12-py3
I have 4 GPUs, and my model is an ensemble model; I don't set gp…
-
**Description**
I run the model on Triton Inference Server and also on ONNX Runtime (ORT) directly. Inference time on Triton Inference Server is 3 ms, but it is 1 ms on ORT. In addition, there isn't any communicati…
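When comparing the two numbers, it helps to measure both paths the same way: warm up first, time only the inference call, and average over many iterations, since the Triton figure also includes request (de)serialization and HTTP/gRPC handling even on localhost. A rough sketch of such a comparison, with the model path, URL, and tensor names as placeholders:

```python
import time
import numpy as np
import onnxruntime as ort
import tritonclient.http as httpclient

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input

# --- ONNX Runtime directly ---
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
sess.run(None, {"input": x})  # warm-up
t0 = time.perf_counter()
for _ in range(100):
    sess.run(None, {"input": x})
ort_ms = (time.perf_counter() - t0) / 100 * 1000

# --- Same model served by Triton over HTTP ---
client = httpclient.InferenceServerClient(url="localhost:8000")
inp = httpclient.InferInput("input", x.shape, "FP32")
inp.set_data_from_numpy(x, binary_data=True)
client.infer("my_model", [inp])  # warm-up
t0 = time.perf_counter()
for _ in range(100):
    client.infer("my_model", [inp])
triton_ms = (time.perf_counter() - t0) / 100 * 1000

print(f"ORT: {ort_ms:.2f} ms, Triton: {triton_ms:.2f} ms")
```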
-
We have a streaming service that uses gRPC with Unix sockets.
gRPC performs much better with Unix sockets than with a TCP port. I saw that you can only change the port in the Triton server…
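For context, plain `grpcio` in Python can already bind and dial a Unix domain socket via a `unix:` address; the sketch below is generic gRPC (not a Triton server option), with the socket path as a placeholder:

```python
from concurrent import futures
import grpc

SOCKET = "unix:///tmp/streaming.sock"  # placeholder socket path

# Server side: bind the gRPC server to a Unix domain socket instead of a TCP port.
server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
# (a real service would be registered here via its generated add_*_to_server helper)
server.add_insecure_port(SOCKET)
server.start()

# Client side: dial the same socket with a unix: address.
channel = grpc.insecure_channel(SOCKET)
grpc.channel_ready_future(channel).result(timeout=5)
print("channel ready over", SOCKET)

channel.close()
server.stop(grace=None)
```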