-
### System Info
- tensorrtllm_backend built using Dockerfile.trt_llm_backend
- main branch TensorRT-LLM (0.13.0.dev20240813000)
- 8xH100 SXM
- Driver Version: 535.129.03
- CUDA Version: 12.5
…
-
How can I loop over a set of detection results to run a classifier over the detected regions using a MediaPipe graph? It would be helpful to get a graph example using the OpenVINO inference calculator.
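A MediaPipe graph would express this with its begin/end loop calculators wrapped around the OpenVINO inference calculator; as a minimal sketch of just the per-detection logic (not a graph config), the same loop written in plain Python with the OpenVINO runtime might look like the following, where the classifier path, 224x224 input size, and detection layout are assumptions for illustration.

```python
# Plain-Python sketch of the per-detection loop (NOT a MediaPipe graph config).
# "classifier.xml", the 224x224 input size, and the detection dict layout are
# assumptions for illustration only.
import cv2
import numpy as np
import openvino as ov

core = ov.Core()
classifier = core.compile_model(core.read_model("classifier.xml"), "CPU")

def classify_detected_regions(frame: np.ndarray, detections: list[dict]) -> list[tuple[dict, int]]:
    """Crop every detected region from the frame and classify it."""
    h, w = frame.shape[:2]
    results = []
    for det in detections:  # det holds normalized xmin/ymin/xmax/ymax
        x0, y0 = int(det["xmin"] * w), int(det["ymin"] * h)
        x1, y1 = int(det["xmax"] * w), int(det["ymax"] * h)
        roi = frame[y0:y1, x0:x1]
        if roi.size == 0:
            continue
        # Resize to the classifier's assumed 224x224 NCHW float input.
        blob = cv2.resize(roi, (224, 224)).transpose(2, 0, 1)[np.newaxis].astype(np.float32)
        scores = classifier([blob])[classifier.output(0)]
        results.append((det, int(np.argmax(scores))))
    return results
```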
-
Triton Inference Server restarts every time I hit the `/infer` endpoint. I am using KServe to deploy the model on K8s.
**Input:**
`
curl --location 'https:///v2/models/dali/infer' \
--header 'Conten…
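For debugging, roughly the same request can be issued with Triton's Python HTTP client instead of curl; this is a sketch that only takes the model name `dali` from the curl above, while the server URL, input/output tensor names, and dtype are assumptions that must match the model's config.pbtxt.

```python
# Rough equivalent of the curl request via Triton's Python HTTP client.
# Model name "dali" comes from the issue; URL, input/output names, and dtype
# are assumptions and must match the model's config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# DALI pipelines typically take encoded image bytes as a UINT8 tensor (assumed here).
raw = np.fromfile("test.jpg", dtype=np.uint8)
inp = httpclient.InferInput("INPUT", [1, raw.shape[0]], "UINT8")
inp.set_data_from_numpy(raw.reshape(1, -1))

out = httpclient.InferRequestedOutput("OUTPUT")
resp = client.infer(model_name="dali", inputs=[inp], outputs=[out])
print(resp.as_numpy("OUTPUT").shape)
```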
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I am using Qwen2VL and have deployed an online server. Does it support online …
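The question is cut off; assuming it asks about sending multimodal requests to the already-deployed online server, a sketch against vLLM's OpenAI-compatible API might look like this, with the base URL, served model name, and image URL as placeholders.

```python
# Sketch of an online multimodal request to a vLLM OpenAI-compatible server.
# base_url, model name, and image URL are placeholders/assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)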
-
I followed the steps in the DeBERTa guide to create the modified ONNX file with the plugin. When I try using this model with Triton Inference Server, it says
> Internal: onnx runtime error 9: Could n…
-
To optimize response times and reduce API costs for Puter (especially if we [increase context limits](https://github.com/HeyPuter/puter/issues/773)), could we implement a server-side caching mechanism…
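Puter's server is JavaScript, but as a language-agnostic sketch of the idea, such a cache could key responses on a hash of the request and expire them after a TTL; every name and value below is hypothetical.

```python
# Hypothetical illustration of a server-side response cache keyed by a
# hash of the request, with a simple TTL; not Puter code (Puter is JS).
import hashlib
import json
import time

CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 300

def cache_key(model: str, messages: list[dict]) -> str:
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(model: str, messages: list[dict], call_api) -> str:
    """Return a cached response if still fresh, otherwise call the upstream API."""
    key = cache_key(model, messages)
    hit = CACHE.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]
    response = call_api(model, messages)  # upstream provider call
    CACHE[key] = (time.time(), response)
    return response
```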
-
I followed the steps in https://github.com/pytorch/torchrec/tree/main/torchrec/inference to test inference. But in step 4, "Build inference library and example server", the "Build server and C++ protobufs" step fa…
-
I want to deploy Triton + TensorRT-LLM, but due to some constraints I cannot use a Docker container. I have figured out that I need to build the following repos:
1. https://github.com/triton-inference-server…
-
We have added support for returning results from `KibanaResponseFactory`. This works well with our inference when using the `ok` function since we can unwrap the object we pass back.
But when us…
-
/kind bug
**What steps did you take and what happened:**
[A clear and concise description of what the bug is.]
### Blocking Inferences
This first bit is not really an issue but I wanted to c…