-
KFServing has released version 0.1.0; we are looking to integrate it with arena.
-
Hi!
I think that, in the current implementation, the engine cannot serve models that have fewer than five outputs. The [classify](https://github.com/NVIDIA/gpu-rest-engine/blob/d8d2255884f965b2feca855cb9e18…
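A minimal sketch of why a fixed top-5 selection would fail for smaller output vectors (the constant and function below are illustrative assumptions, not the engine's actual code):

```python
import heapq

TOP_CLASSES = 5  # assumed fixed top-k used by the engine


def top_classes(probs, k=TOP_CLASSES):
    """Return indices of the k highest-probability outputs.

    With fewer than k outputs there are not enough classes to fill the
    top-k list, which is presumably where serving breaks down.
    """
    if len(probs) < k:
        raise ValueError(f"model has {len(probs)} outputs, fewer than k={k}")
    return heapq.nlargest(k, range(len(probs)), key=probs.__getitem__)


# Works with five or more outputs, raises with fewer.
print(top_classes([0.1, 0.2, 0.05, 0.4, 0.15, 0.1]))
```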
-
Hello! I use this simulator for LLM serving, but when I run the following command:
```shell
python3 -u main.py --model_name 'gpt3-6.7b' --npu_num 1 --npu_group 1 --npu_mem 24 --dataset 'dataset/share-gp…
```
-
It would be nice to have a new parameter in the `InferenceService` CRD that allows users to specify the model size (in bytes), avoiding the `MODEL_MULTIPLIER` factor used to estimate the size.
**Is …
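An illustrative sketch (not KServe code) contrasting the multiplier-based estimate with an explicitly declared size; `MODEL_MULTIPLIER`'s value and the `modelSizeBytes` field name are assumptions:

```python
MODEL_MULTIPLIER = 1.5  # assumed heuristic factor applied to the on-disk size


def estimated_size_bytes(on_disk_bytes: int) -> int:
    """Current behaviour: scale the stored model size by a fixed factor."""
    return int(on_disk_bytes * MODEL_MULTIPLIER)


def declared_size_bytes(spec: dict, on_disk_bytes: int) -> int:
    """Proposed behaviour: prefer a user-declared size, else fall back
    to the multiplier-based estimate."""
    declared = spec.get("modelSizeBytes")  # hypothetical CRD field name
    return declared if declared is not None else estimated_size_bytes(on_disk_bytes)


print(declared_size_bytes({"modelSizeBytes": 7_000_000_000}, 4_000_000_000))
```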
-
/kind feature
**Describe the solution you'd like**
Is there any config to modify the imagePullPolicy of queue-proxy? This question has stumped me for a long time, and I've read the docs of kserve & knat…
-
-----------------------
## Feature Request
### Describe the problem the feature is intended to solve
TensorFlow is promoting Apple's M1 Macs; it would be great to have TFServing running on M1 Macs as…
-
ONNX export (e.g. with https://onnx.ai/sklearn-onnx/) would be very beneficial for deploying trained models to any environment and programming language. Do you have such export options considering ON…
-
### Summary
### Steps to Reproduce
1. Deploy the latest `incubation` of odh-operator sources using manifests from [here](https://github.com/opendatahub-io/opendatahub-operator/blob/d4ba37e4b041977…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
I used the OpenAI-compatible server deployed with vLLM:
```bash
python -m vllm.entr…
```
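Assuming the server exposes the standard OpenAI-style `/v1/completions` route, a request can be sketched as follows (the URL and model name are assumptions for illustration):

```python
import json
import urllib.request


def build_request(url: str, model: str, prompt: str, max_tokens: int = 16):
    """Build a POST request for an OpenAI-compatible completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("http://localhost:8000/v1/completions", "my-model", "Hello")
# urllib.request.urlopen(req) would send it once the server is running.
print(json.loads(req.data.decode())["model"])
```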
-
## Description
djl-serving version: djl-inference:0.26.0-tensorrtllm0.7.1
models:
- meta-llama/Llama-2-7b-chat (see https://huggingface.co/meta-llama/Llama-2-7b-chat; used in this report)
- meta-lla…