-
Supporting only Ray for distributed inference will significantly reduce adoption of this tool if it truly is more performant than TGI. TGI can be run as a black-box image on Kubernetes with su…
-
One important (and non-trivial) aspect of running model servers today is ensuring they can scale horizontally in response to load. Today, traditional CPU/memory-based autoscaling is not suff…
-
Me again.
Stews made in a cauldron only produce one bowl; the rest of the stew, though still visible in the cauldron, is not accessible.
-
### Your current environment
vllm-openai/v06.3.1.post-1
### Model Input Dumps
a_request: None, prompt_adapter_request: None.
2024-10-27 23:04:39 INFO 10-27 09:04:39 engine.py:290] Added request ch…
-
The title basically says it: I trained a model using HorovodAllToAllEmbeddings and saved it by doing:
```
de.keras.models.de_save_model(
model,
export_dir,
overwrit…
-
### Bug Description
While working on the `net-istio-webhook` extension rock for Knative, we encountered a problem where we can't run rocks in `securityContext.runAsNonRoot: true` Kubernetes deploym…
-
I deployed the https://huggingface.co/xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k model through the Serverless UI, setting the max model context window to 129024 and quantization to awq. I deploy it using t…
-
**Describe the bug**
The kafka-source-dispatcher StatefulSet object is unable to spin up new pods. It gets deleted immediately after it is first provisioned. Here is the result of kubectl descr…
-
### Description
If you add logging middleware without excluding the /static route, you get the following error:
```
Traceback (most recent call last):
File "/workdir/.venv/lib/pyt…
-
### Summary
- If you enter a serving size in Chinese, e.g. 38公克, it is not recognized, and no per-serving values are calculated.
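
To illustrate the expected behavior, here is a minimal sketch of serving-size parsing that recognizes Chinese unit names. It is not the project's actual parser: the `UNIT_ALIASES` table and the `parse_serving` function are hypothetical names made up for this example, and the list of units is an assumption, not an exhaustive one.

```python
import re

# Hypothetical alias table (an assumption for illustration only):
# maps a unit spelling to (canonical unit, conversion factor).
UNIT_ALIASES = {
    "g": ("g", 1.0),
    "kg": ("g", 1000.0),
    "ml": ("ml", 1.0),
    "l": ("ml", 1000.0),
    "公克": ("g", 1.0),      # gram
    "克": ("g", 1.0),        # gram
    "公斤": ("g", 1000.0),   # kilogram
    "毫升": ("ml", 1.0),     # milliliter
    "公升": ("ml", 1000.0),  # liter
}

# A number (with "." or "," as decimal separator), optional whitespace,
# then a run of characters that are not digits, whitespace, or punctuation.
_SERVING_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*([^\d\s().,]+)")


def parse_serving(text):
    """Return (quantity, canonical_unit), or None if the unit is unknown."""
    match = _SERVING_RE.search(text)
    if match is None:
        return None
    quantity = float(match.group(1).replace(",", "."))
    alias = UNIT_ALIASES.get(match.group(2).lower())
    if alias is None:
        return None
    canonical_unit, factor = alias
    return quantity * factor, canonical_unit
```

With a table like this, `parse_serving("38公克")` yields a quantity in grams, the same as `parse_serving("38 g")`, so per-serving values could be computed from it.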
### Steps to reproduce
- Check out https://world.openfoodfacts.org/pro…