-
### Environment
**CPU architecture:** x86_64
**CPU/Host memory size:** 440 GiB
### GPU properties
**GPU name:** A100
**GPU memory size:** 160G…
-
### Description
The Docker image built fine using the older version mentioned in the README (22.12), but the build fails with the latest version (23.05).
See this log file: https://gi…
-
**Description**
I deployed Triton Inference Server on Kubernetes (GKE). To balance the load, I created a Load Balancer Service. As a client, I'm using the Python HTTP client. I was expecting all the …
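For reference, a minimal sketch of the client side described here, using the Triton Python HTTP client; the load-balancer address, model name, and input tensor layout are assumptions, not details from the original report:
```python
# Sketch only: the LB address, model name, and input shape are hypothetical.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="34.118.0.10:8000")  # hypothetical LB IP

# Hypothetical model with a single FP32 input named "INPUT0".
inp = httpclient.InferInput("INPUT0", [1, 3], "FP32")
inp.set_data_from_numpy(np.ones((1, 3), dtype=np.float32))

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```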
-
**Is your feature request related to a problem? Please describe.**
I'd like to be able to run vLLM emulating the OpenAI-compatible API, so that vLLM can serve as a drop-in replacement for ChatGPT.
**Describe…
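A minimal sketch of what such drop-in usage could look like with the OpenAI Python client (openai >= 1.0), assuming a vLLM server exposing an OpenAI-compatible endpoint at localhost:8000; the address and served model name are hypothetical:
```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server instead of
# api.openai.com; local servers typically ignore the API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="my-llm",  # hypothetical model name served by vLLM
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```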
-
Hey,
I tried to run ColBERT model inference via Triton server on a multi-GPU instance.
GPU 0 works fine. However, the other GPU devices (1, 2, 3, etc.) crash when execution reaches this line
```D_pac…
-
**Description**
I am trying to use the newly introduced [Triton Inference Server in-process Python API](https://github.com/triton-inference-server…
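For context, a minimal sketch of how the in-process API is typically used, based on its documented examples; the model repository path, model name, and input are hypothetical, and the exact signatures should be checked against the repository's README:
```python
import numpy as np
import tritonserver

# Start an in-process server over a hypothetical model repository.
server = tritonserver.Server(model_repository="/models")
server.start(wait_until_ready=True)

# Run inference against a hypothetical model with one FP32 input "INPUT0".
model = server.model("my_model")
for response in model.infer(inputs={"INPUT0": np.ones((1, 3), dtype=np.float32)}):
    print(response.outputs)
```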
-
(moving from https://github.com/cms-sw/cmssw/issues/37738#issuecomment-1114455507)
The workflow 10805.31 step 3 fails with
```
Starting python2 /data/cmsbld/jenkins/workspace/ib-run-relvals/cms-b…
-
There is an example at https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/qwenvl, but I have no idea how to use this model in Triton server. Can you provide an example of a visual language mod…
-
I think most of the dependencies that get installed with `pip install whisper-live` are only needed for the server, not the client. How can I use the client without installing all the server's package…
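For reference, the client-side usage itself is small; a sketch assuming the `TranscriptionClient` interface shown in the project's README, where the host, port, and audio path are hypothetical:
```python
from whisper_live.client import TranscriptionClient

# Connect to an already-running whisper-live server (hypothetical host/port).
client = TranscriptionClient("localhost", 9090)

# Transcribe a local audio file (hypothetical path).
client("tests/sample.wav")
```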
-
Hi @xhp-hust-2018-2011 ,
Thanks for the great work done on this repo. I'm trying to use your prebuilt Pytorch model with [NVIDIA's Triton Inference Server](https://docs.nvidia.com/deeplearning/sdk/…
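For anyone following along: Triton's PyTorch backend expects a TorchScript file in a versioned model repository. A minimal sketch of the export step, using a stand-in module rather than this repo's actual prebuilt model:
```python
import os
import torch

# Stand-in module; replace with the prebuilt PyTorch model from this repo.
model = torch.nn.Linear(4, 2).eval()

# Trace to TorchScript and save using Triton's expected layout:
# <model_repository>/<model_name>/<version>/model.pt
os.makedirs("model_repository/my_model/1", exist_ok=True)
traced = torch.jit.trace(model, torch.randn(1, 4))
traced.save("model_repository/my_model/1/model.pt")
```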