-
Hi community,
I am wondering whether any specific optimizations have been made in KServe to support LLM applications. Is there a feature list?
-
### Describe the bug
We encountered a bug when serving Xinference (since 0.8.5, through Docker) behind an Apache vhost with a reverse proxy configuration. The issue is the dynamic resolution of paths (th…
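For reference, a minimal vhost sketch for this kind of setup (the `/xinference/` prefix, port 9997, and the Debian-style paths below are assumptions, not the reporter's actual config):

```bash
# Hypothetical Apache vhost: proxy a sub-path to a local Xinference instance.
# Serving under a prefix is exactly where dynamic path resolution tends to
# break, since the web UI may request absolute paths.
sudo tee /etc/apache2/sites-available/xinference.conf >/dev/null <<'EOF'
<VirtualHost *:80>
    ServerName example.com
    ProxyPreserveHost On
    ProxyPass        /xinference/ http://127.0.0.1:9997/
    ProxyPassReverse /xinference/ http://127.0.0.1:9997/
</VirtualHost>
EOF
sudo a2enmod proxy proxy_http && sudo systemctl reload apache2
```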
-
Currently, every GitHub project, especially the ones under CNCF, uses independent processes for issue triage, bot replies, and so on. At a broad level, the following patterns arise where proj…
-
Hello, I'm using 24.03-trtllm-python-py3 with an image size of 8.38 GB, which is not small but OK.
I'm going to migrate to a newer version like 24.04 or 24.05, but its size drastically increased to 18.46 …
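If it helps to narrow down where the growth comes from, `docker history` shows per-layer sizes; the registry path below follows NGC's naming, and the exact tag is an assumption:

```bash
# Inspect layer sizes of the newer image to see which build steps grew
# (registry path follows NGC's convention; adjust the tag as needed).
docker pull nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
docker history --no-trunc \
    --format 'table {{.Size}}\t{{.CreatedBy}}' \
    nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 | head -n 20
```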
-
Hey all, I have a quick question: is onnxruntime-genai ([https://onnxruntime.ai/docs/genai/api/python.html](https://onnxruntime.ai/docs/genai/api/python.html)) supported in Triton Inference Server's O…
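As far as I know there is no dedicated onnxruntime-genai backend, but a quick smoke test inside a Triton image shows whether the package at least installs and imports (the image tag is just an example; actual inference would then go through a Python-backend model):

```bash
# Check that onnxruntime-genai installs and imports inside a Triton container.
# (The CPU package is used here; a GPU setup would need onnxruntime-genai-cuda.)
docker run --rm nvcr.io/nvidia/tritonserver:24.05-py3 \
    bash -c "pip install onnxruntime-genai && python3 -c 'import onnxruntime_genai; print(\"ok\")'"
```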
-
KServe is a community-driven open source project aiming to deliver a cloud-native, scalable, extensible serverless ML inference platform. It provides an open standard control and data plane for servi…
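Concretely, the control plane centers on the `InferenceService` resource; here is a minimal sketch along the lines of the sklearn quickstart from the KServe docs, assuming KServe is already installed in the cluster:

```bash
# Minimal InferenceService, following KServe's sklearn quickstart example.
kubectl apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
EOF
```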
-
Has anyone had any success serving LLMs through the 0.5.0 Docker image?
I followed these steps:
`cache_dir=${XDG_CACHE_HOME:-$HOME/.cache}`
`docker run -it --gpus all --shm-size …
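Since the command above is cut off, here is one plausible shape of such an invocation, modeled on the text-generation-inference README; the image tag, model id, and mount target are assumptions, not the poster's actual command:

```bash
# Illustrative only: serve a small model with the cache dir mounted to /data,
# following the text-generation-inference README conventions.
cache_dir=${XDG_CACHE_HOME:-$HOME/.cache}
docker run -it --gpus all --shm-size 1g -p 8080:80 \
    -v "$cache_dir/huggingface:/data" \
    ghcr.io/huggingface/text-generation-inference:0.5.0 \
    --model-id bigscience/bloom-560m
```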
-
Perhaps it's user error, but I can't pass a custom OpenAI `base_url` to redirect requests to a Databricks serving endpoint. This would be ideal for using {chattr} to interact with Databricks [found…
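For comparison, hitting a Databricks model serving endpoint directly looks roughly like this (workspace host and endpoint name are placeholders; this bypasses {chattr} entirely):

```bash
# Hypothetical direct call to a Databricks serving endpoint's invocations URL;
# $DATABRICKS_TOKEN is a personal access token for the workspace.
curl -s -X POST \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}' \
  "https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations"
```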
-
### Your current environment
Docker, with the vllm/vllm-openai:v0.4.3 image (latest)
### 🐛 Describe the bug
`python3 -m vllm.entrypoints.openai.api_server --model ./Qwen1.5-72B-Chat/ --max-model-len 2400…`
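For reproduction context, a typical request against the server once it is up (port 8000 is vLLM's default, and the served model name defaults to the `--model` path; the payload itself is illustrative):

```bash
# Example completion request against vLLM's OpenAI-compatible API.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "./Qwen1.5-72B-Chat/", "prompt": "Hello", "max_tokens": 16}'
```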
-
**Setup**
Machine: AWS SageMaker ml.p4d.24xlarge
Model: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
Used Docker container image with the latest build of trt-llm (`0.8.0.dev2024011…