-
# Latency in OpenVINO Model Server Inside Kubernetes Cluster
## To Reproduce
**Steps to reproduce the behavior:**
1. **Prepare Models Repository**: Followed standard procedures to set up the mode…
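For context, here is a minimal latency probe against the server, assuming it exposes the TensorFlow-Serving-compatible REST API that OVMS implements; the service address, REST port, model name, and input shape below are all placeholder assumptions to adjust for the actual deployment:

```python
# Minimal latency probe against OVMS's TF-Serving-compatible REST endpoint.
# Placeholders/assumptions: service address, REST port 8000, model name
# "resnet", and a single float32 NHWC input; adjust for the real model.
import time

import numpy as np
import requests

OVMS_URL = "http://ovms-service:8000/v1/models/resnet:predict"  # placeholder

payload = {"instances": np.random.rand(1, 224, 224, 3).astype("float32").tolist()}

latencies_ms = []
for _ in range(100):
    start = time.perf_counter()
    requests.post(OVMS_URL, json=payload, timeout=10).raise_for_status()
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

print(f"p50={np.percentile(latencies_ms, 50):.1f} ms  "
      f"p99={np.percentile(latencies_ms, 99):.1f} ms")
```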
-
This is a tracker for all the work items needed to complete the feature work for Model Serving Metrics - Round 2.
# Requirements
Add requirements
# Individual Efforts
* UX…
-
Can you share the best practices for serving DGL models in production? (Which of the frameworks is preferred/fully supported - TorchServe, TensorFlow Serving, KServe, or anything Kubeflow-based, N…
-
## Description
Unable to use the OpenAI-compatible endpoint; I am getting the error below.
### Error Message
PyProcess W-100-model-stdout: The following parameters are not supported by neuron with rolling batch: {'…
-
### 🐛 Describe the bug
I have 2 model archive files in my model store, gender_model.mar and age_model.mar. Each one of these works for inference individually with torchserve. Individually I start t…
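For reference, a minimal sketch of registering and querying both archives on a single torchserve instance via the management API. It assumes the default ports (8081 management, 8080 inference), that the registered model names match the archive names, and that token authorization (enabled by default in newer TorchServe releases) is disabled or handled separately:

```python
# Register both archives on a running torchserve instance, then query each.
# Assumes torchserve was started with --model-store pointing at the directory
# containing both .mar files; sample.jpg is a placeholder input.
import requests

MGMT = "http://localhost:8081"
INFER = "http://localhost:8080"

for mar in ("gender_model.mar", "age_model.mar"):
    resp = requests.post(f"{MGMT}/models", params={"url": mar, "initial_workers": 1})
    resp.raise_for_status()

with open("sample.jpg", "rb") as f:  # placeholder input image
    image = f.read()

for name in ("gender_model", "age_model"):
    resp = requests.post(f"{INFER}/predictions/{name}", data=image)
    print(name, resp.status_code, resp.text[:200])
```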
-
I'd like us to move to an experience like this when serving models with torchserve, following projects like FastAPI. The benefit of this is that people can integrate torchserve much more easily into their ex…
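To make the target experience concrete, here is a minimal FastAPI app shaped like torchserve's `/predictions/{model}` route. This is purely illustrative of the requested developer experience; `load_model` and the model object are hypothetical placeholders, not an existing torchserve API:

```python
# Illustrative sketch only: the FastAPI-style experience being asked for.
# `load_model` is a hypothetical stand-in, not a real torchserve function.
from fastapi import FastAPI

app = FastAPI()

def load_model(name: str):
    # Stand-in for whatever model loading such an API would expose.
    return lambda data: {"model": name, "input_len": len(data)}

model = load_model("gender_model")

@app.post("/predictions/gender_model")
async def predict(payload: dict) -> dict:
    # Route shape mirrors torchserve's /predictions/{model} endpoint.
    return {"prediction": model(payload.get("data", ""))}
```

Run with `uvicorn main:app`; the appeal is that a handful of decorated functions is the entire serving surface.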
-
Envoy supports sending the full request body to the external authorization server via the with_request_body filter configuration. Do you think that it is possible to expose such a feature on the Securit…
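For reference, the underlying Envoy option looks roughly like this on the HTTP ext_authz filter (field names from Envoy's v3 API; the cluster name and buffer sizes are arbitrary examples):

```yaml
http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    grpc_service:
      envoy_grpc:
        cluster_name: ext-authz      # placeholder cluster for the authz server
    with_request_body:
      max_request_bytes: 8192        # example buffer size
      allow_partial_message: true    # forward a truncated body instead of erroring
```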
-
I'm trying to use a saved model in TensorFlow Serving, but without success.
**I exported the model:**
```
from yolov4.tf import YOLOv4  # import not shown in the excerpt; assumes the `yolov4` PyPI package

yolo = YOLOv4()
yolo.config.parse_names("yolov4-data/coco.names")
yolo.config.parse_cfg("yolo…
```
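The excerpt cuts off before the actual export, but for context, the usual flow is to write a versioned SavedModel directory and then query it through TF Serving's REST API. A minimal sketch, assuming the underlying Keras model is reachable as `yolo.model` (the attribute name may differ by yolov4 package version) and TF Serving listens on its default REST port 8501:

```python
import numpy as np
import requests
import tensorflow as tf

# Export under a numeric version subdirectory, as TF Serving expects.
tf.saved_model.save(yolo.model, "export/yolov4/1")

# After starting: tensorflow_model_server --model_name=yolov4 \
#   --model_base_path=/abs/path/to/export/yolov4 --rest_api_port=8501
frame = np.random.rand(1, 416, 416, 3).astype("float32").tolist()
resp = requests.post(
    "http://localhost:8501/v1/models/yolov4:predict",
    json={"instances": frame},
)
print(resp.status_code, resp.text[:300])
```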
-
**Description**
I am currently using the Triton vLLM backend in my Kubernetes cluster. There are 2 GPUs that Triton is able to see; however, it seems to load the model weights only onto GPU 0.
I h…
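In case it is useful context for triage: with the vLLM backend, multi-GPU usage is typically driven by vLLM's tensor parallelism set in the model's model.json, rather than by Triton's instance placement. A sketch with a placeholder model name and example values:

```json
{
  "model": "facebook/opt-125m",
  "tensor_parallel_size": 2,
  "gpu_memory_utilization": 0.9
}
```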
-
https://docs.google.com/presentation/d/1jxj9zjeRRu1BJSf8tzaWoQVr5h7HZbOUsSh39Rcwv80/edit#slide=id.g10be2c57ddf_7_3
Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-672