-
The idea here is to use the Triton Inference Server to perform inference via MIGraphX.
The first issue to tackle is enabling it without the official Docker image, using a ROCm-based one instead.
The next would be…
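For context, a minimal sketch of what the client side could look like once a ROCm/MIGraphX-backed Triton server is running; the model name, tensor names, shape, and datatype below are placeholders for illustration, not taken from the issue:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton server (default HTTP port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")
assert client.is_server_live(), "Triton server is not reachable"

# Hypothetical model served through the MIGraphX backend; the name,
# input/output tensor names, shape, and dtype are assumptions.
model_name = "my_migraphx_model"
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

infer_input = httpclient.InferInput("input__0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

response = client.infer(model_name, inputs=[infer_input])
print(response.as_numpy("output__0").shape)
```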
-
### System Info
I am referring to [https://github.com/huggingface/text-generation-inference?tab=readme-ov-file#local-install](https://github.com/huggingface/text-generation-inference?tab=readme-ov-fil…
-
## User story
As a customer,
I want to launch an app implementing Triton Inference Server
In order to deploy my models in production with optimisation and high availability.
## Acceptance …
-
tritonclient for Python uses _registered_method, which was added in grpcio 1.63.0, so [tritonclient's deps](https://github.com/triton-inference-server/client/blob/cb9ba08b3f88dff802485f0577b008cdbf41c529/src/…
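A quick way to confirm which grpcio a given environment actually resolved, since `_registered_method` only exists from grpcio 1.63.0 onward (a small diagnostic sketch, not part of tritonclient itself):

```python
import grpc
from packaging.version import Version

# _registered_method was introduced in grpcio 1.63.0; older versions will
# fail when tritonclient's generated stubs pass it through.
installed = Version(grpc.__version__)
required = Version("1.63.0")
if installed < required:
    print(f"grpcio {installed} is too old; tritonclient needs >= {required}")
else:
    print(f"grpcio {installed} satisfies the >= {required} requirement")
```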
-
**Description**
If I load two models (a transformer model and an inference model), GPU memory usage is about 3 GiB.
```
PID      USER    DEV  TYPE  GPU   GPU MEM   CPU   HOST MEM   Command
2207044  coreai  0    C…
```
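To reproduce that per-process reading programmatically, something like the following pynvml sketch could be used (device index 0 is an assumption matching the listing above):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0, as shown above

# List compute processes and their GPU memory, mirroring the nvtop columns.
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    mem_mib = proc.usedGpuMemory / (1024 ** 2) if proc.usedGpuMemory else 0
    print(f"PID {proc.pid}: {mem_mib:.0f} MiB")

pynvml.nvmlShutdown()
```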
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
## Bug description
The error occurs when the LLM Server suddenly stops while the chat-ui keeps sending it queries, eventually causing the chat-ui to crash as well. The specific e…
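One general way to keep a frontend from cascading into a crash when its backend dies is to guard the calls with a timeout and bounded retries. A minimal sketch of that pattern, written in Python for illustration (independent of chat-ui's own stack; the URL and payload are placeholders):

```python
import time
import requests

LLM_URL = "http://localhost:8080/generate"  # placeholder endpoint

def query_llm(payload: dict, retries: int = 3, backoff: float = 1.0):
    """Send a request to the LLM server, backing off instead of crashing
    when the server is down or unresponsive."""
    for attempt in range(retries):
        try:
            resp = requests.post(LLM_URL, json=payload, timeout=10)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as exc:
            print(f"LLM server unavailable (attempt {attempt + 1}): {exc}")
            time.sleep(backoff * (2 ** attempt))
    return None  # caller degrades gracefully instead of propagating a crash
```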
-
@wangg12 @shanice-l @Rainbowend @tzsombor95 need your help.
The inference script runs successfully without any errors when executed as a standalone Python script. But when running with ros2, i.e., …
-
### Description
The inference API supports the text embedding and rerank task types. If an inference endpoint is created for text embedding, and a request is made to perform inference and the request co…
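For reference, a rough sketch of the request shapes the two task types take, sent as plain HTTP against the `_inference` API; the endpoint ids, model id, and service settings below are assumptions and should be checked against the Elasticsearch docs:

```python
import requests

ES = "http://localhost:9200"  # placeholder cluster address

# Create an endpoint for the text_embedding task type (service and
# settings here are illustrative assumptions).
requests.put(
    f"{ES}/_inference/text_embedding/my-embedding-endpoint",
    json={
        "service": "elasticsearch",
        "service_settings": {
            "model_id": ".multilingual-e5-small",
            "num_allocations": 1,
            "num_threads": 1,
        },
    },
)

# A text_embedding inference request carries only "input" ...
requests.post(
    f"{ES}/_inference/text_embedding/my-embedding-endpoint",
    json={"input": ["some text to embed"]},
)

# ... whereas a rerank request additionally carries a "query" field,
# so the two task types expect different request bodies.
requests.post(
    f"{ES}/_inference/rerank/my-rerank-endpoint",
    json={"query": "some query", "input": ["doc one", "doc two"]},
)
```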
-
I was trying to run the DLRMv2 benchmark of MLPerf Inference on an ARM server using the instructions [here]( https://docs.mlcommons.org/inference/benchmarks/recommendation/dlrm-v2/#__tabbed_15_1).
…