-
### 🚀 The feature, motivation and pitch
Hi, I'm currently working on **deploying vLLM across multiple nodes in a k8s cluster**. I saw that the official documentation provides a link for using [LWS…
-
### Describe the problem
From what I know, it's not possible to hook into the serving of build artifacts in the `_app` folder. My main use case is hooking into it to preve…
-
### Is there an existing issue that is already proposing this?
- [X] I have searched the existing issues
### Is your feature request related to a problem? Please describe it
There is no way to disa…
-
**What would you like to be added**:
Something similar to KServe's parallel model inference: https://kserve.github.io/website/latest/modelserving/v1beta1/custom/custom_model/#parallel-model-inference
**Why is this needed**:
*…
-
**Describe the bug**:
After we updated cert-manager to `v1.15.0`, we started to see the following error in the logs of the cert-manager webhook pod:
`"Failed to generate serving certificate, retry…
-
Issue: SOAP plugin returns a mock example instead of the defined response when using a shared folder mapping
I am encountering an issue with the Imposter SOAP plugin when mapping a shared folder for …
-
**Is your feature request related to a problem? Please describe.**
I tried to run a custom 40B model whose weights can only be loaded using the combined VRAM of two 80 GB GPUs.
LMCache is able to load small models within sin…
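As a rough sanity check on why a 40B model needs two 80 GB GPUs, here is a back-of-the-envelope sketch. It assumes 16-bit (fp16/bf16) weights at 2 bytes per parameter and a hypothetical 90% usable-memory fraction per GPU to leave headroom for the KV cache, activations, and the CUDA context; real requirements depend on the model and runtime settings.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(
    num_params: float,
    gpu_mem_gb: float,
    bytes_per_param: int = 2,
    usable_fraction: float = 0.9,  # assumption: share of each GPU available for weights
) -> int:
    """Smallest GPU count whose usable memory holds the weights."""
    usable_per_gpu = gpu_mem_gb * usable_fraction
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / usable_per_gpu)


# 40B params * 2 bytes = 80 GB of weights, which does not fit in 72 GB usable on one GPU.
print(weight_memory_gb(40e9))           # 80.0
print(min_gpus_for_weights(40e9, 80))   # 2
```

With 8-bit weights (`bytes_per_param=1`) the same model would need about 40 GB and fit on one GPU under these assumptions.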
-
One important (and non-trivial) aspect of running model servers today is ensuring they can scale horizontally in response to load. Today, traditional CPU/memory-based autoscaling is not suff…
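For context, the Kubernetes HorizontalPodAutoscaler computes `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)` and skips scaling when the ratio is within a tolerance (0.1 by default). The sketch below applies that formula to a model-server metric; the choice of metric (for example, average queued requests per replica) is a hypothetical illustration, and min/max replica bounds and stabilization windows are deliberately left out.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """HPA-style replica calculation for a single metric."""
    ratio = current_metric / target_metric
    # Ratio close enough to 1.0: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical metric: average queued requests per replica, target 10.
print(desired_replicas(2, 30, 10))    # 6  (scale out)
print(desired_replicas(4, 10.5, 10))  # 4  (within tolerance)
print(desired_replicas(4, 5, 10))     # 2  (scale in)
```

A queue-depth or in-flight-request metric tracks inference load more directly than CPU utilization, which is the gap the paragraph above points to.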
-
### Your current environment
vllm-openai/v06.3.1.post-1
### Model Input Dumps
```
a_request: None, prompt_adapter_request: None.
2024-10-27 23:04:39 INFO 10-27 09:04:39 engine.py:290] Added request ch…
```
-
### Bug Description
While working on the `net-istio-webhook` extension rock for Knative, we encountered a problem where we can't run rocks in `securityContext.runAsNonRoot: true` Kubernetes deploym…
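To illustrate why an image can be rejected under `runAsNonRoot: true`, here is a simplified model of the kubelet's check. It is not the kubelet's code: the function name is hypothetical, the error strings are paraphrased, and treating an empty image `USER` as root is an assumption of this sketch. The key points are that `securityContext.runAsUser` overrides the image's `USER`, UID 0 is refused, and a non-numeric user name cannot be verified as non-root.

```python
from typing import Optional


def check_run_as_non_root(
    run_as_non_root: bool,
    run_as_user: Optional[int],
    image_user: str,
) -> Optional[str]:
    """Return a (paraphrased) rejection reason, or None if the container may start."""
    if not run_as_non_root:
        return None
    if run_as_user is not None:
        uid = run_as_user  # securityContext.runAsUser wins over the image USER
    else:
        user = image_user.split(":", 1)[0]  # drop an optional ":group" part
        if user == "":
            uid = 0  # assumption: no USER in the image means root
        elif user.isdigit():
            uid = int(user)
        else:
            return f"image has non-numeric user ({user}), cannot verify user is non-root"
    if uid == 0:
        return "container has runAsNonRoot and image will run as root"
    return None


print(check_run_as_non_root(True, None, ""))            # rejected: runs as root
print(check_run_as_non_root(True, None, "_daemon_"))    # rejected: non-numeric user
print(check_run_as_non_root(True, 1000, "_daemon_"))    # None: runAsUser is set
```

Under this model, either a numeric non-zero `USER` in the image or an explicit `runAsUser` in the pod spec lets the container start.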