-
One important (and non-trivial) aspect of running model servers today is ensuring that they can scale horizontally in response to load. Today, traditional CPU/memory-based autoscaling is not suff…
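For context on why a custom metric matters here: the standard Kubernetes HPA scaling rule is `desired = ceil(currentReplicas × currentMetric / targetMetric)`, and a model-server autoscaler would apply the same rule to a serving-specific metric such as per-replica queue depth. A minimal sketch (the function name and the queue-depth numbers are illustrative, not from any issue above):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Kubernetes HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# Hypothetical GPU-bound model server: 4 replicas, observed queue depth
# of 30 requests per replica against a target of 10 -> scale to 12.
print(desired_replicas(4, 30.0, 10.0))  # -> 12
```

With a CPU-based metric, a GPU-bound server can sit near-idle on CPU while its request queue grows, which is why the rule only helps once it is fed a load-correlated metric.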
-
### 🚀 The feature, motivation and pitch
Recently, larger models have appeared that cannot be deployed on a single machine, such as Grok. Can we support efficient multi-node serving?
### Alternatives
_No …
-
ERROR:
`λ localhost /work/Serving/build-server-npu {v0.9.0} make TARGET=ARMV8 -j16
[ 3%] Built target extern_gflags
[ 9%] Built target extern_snappy
[ 9%] Built target extern_zlib
[ 13%] Perfo…
-
### Proposal to improve performance
I am using vLLM version 0.6.3.post1 with four 4090 GPUs to run inference on the qwen2-72B-chat-int4 model. A single request is served very fast, but the perf…
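The gap being reported is between single-request latency and many-request throughput. With continuous batching, N concurrent requests should each finish in roughly one request's latency rather than queuing serially; a toy simulation of that ideal (the 0.05 s latency and request count are made-up numbers, and `asyncio.sleep` stands in for a model forward pass):

```python
import asyncio
import time

async def handle(request_id: int, latency: float = 0.05) -> int:
    # Stand-in for one model forward pass (hypothetical fixed latency).
    await asyncio.sleep(latency)
    return request_id

async def sequential(n: int) -> float:
    # Requests served one at a time: total time ~ n * latency.
    t0 = time.perf_counter()
    for i in range(n):
        await handle(i)
    return time.perf_counter() - t0

async def concurrent(n: int) -> float:
    # Requests served concurrently: total time ~ 1 * latency.
    t0 = time.perf_counter()
    await asyncio.gather(*(handle(i) for i in range(n)))
    return time.perf_counter() - t0

seq = asyncio.run(sequential(8))
par = asyncio.run(concurrent(8))
print(f"sequential ~{seq:.2f}s, concurrent ~{par:.2f}s")
```

If measured throughput degrades well below this ideal as request rate rises, the bottleneck is typically batch scheduling, KV-cache pressure, or dequantization cost rather than raw per-request latency.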
ljwps updated
2 weeks ago
-
## 🚀 The feature
Implement support for Detectron2 models within the TorchServe object detection examples. This includes:
1. Developing a custom handler that works seamlessly with both CPU and GP…
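TorchServe custom handlers follow an initialize/preprocess/inference/postprocess contract. A framework-free sketch of that shape, with a dummy predictor standing in for a real Detectron2 model (the class names, labels, and device handling here are illustrative assumptions, not TorchServe or Detectron2 code):

```python
class DummyDetector:
    """Stand-in for a Detectron2 predictor; a real handler would build
    one from a Detectron2 config instead."""
    def __call__(self, image):
        return [{"label": "person", "score": 0.9, "box": [0, 0, 10, 10]}]

class DetectionHandler:
    """Sketch of TorchServe's handler contract without the ts.* imports."""
    def initialize(self, context=None):
        # Real handler: choose "cuda" if torch.cuda.is_available() else "cpu".
        self.device = "cpu"
        self.model = DummyDetector()

    def preprocess(self, data):
        # Real handler: decode request bytes into image tensors on self.device.
        return [d.get("body") for d in data]

    def inference(self, images):
        return [self.model(img) for img in images]

    def postprocess(self, outputs):
        # Real handler: convert detections to JSON-serializable results.
        return outputs

    def handle(self, data, context=None):
        return self.postprocess(self.inference(self.preprocess(data)))

handler = DetectionHandler()
handler.initialize()
print(handler.handle([{"body": b"fake-image-bytes"}]))
```

The CPU/GPU requirement in the request mostly reduces to the device choice in `initialize` plus moving decoded inputs to that device in `preprocess`.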
-
Here's an overview of the features we intend to work on in the near future.
## Core Keras
### Saving & export
- Implement saving support for sharded models (sharded weights files).
- Improve…
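On the sharded-saving item: the common scheme packs weights into files under a size cap and writes an index mapping each weight name to its shard, as in the `*.index.json` convention used for sharded checkpoints. A minimal sketch, assuming weights are already serialized to bytes (the file-naming pattern and size cap are illustrative):

```python
import json
import os
import tempfile

def save_sharded(weights: dict, out_dir: str,
                 max_shard_bytes: int = 1024) -> dict:
    """Greedily pack weights into shards no larger than max_shard_bytes
    (an oversized single weight gets its own shard) and write an index
    mapping each weight name to the shard file holding it."""
    shards, current, size = [], {}, 0
    for name, blob in weights.items():
        if current and size + len(blob) > max_shard_bytes:
            shards.append(current)
            current, size = {}, 0
        current[name] = blob
        size += len(blob)
    if current:
        shards.append(current)

    weight_map = {}
    for i, shard in enumerate(shards):
        fname = f"model-{i + 1:05d}-of-{len(shards):05d}.bin"
        with open(os.path.join(out_dir, fname), "wb") as f:
            for name, blob in shard.items():
                weight_map[name] = fname
                f.write(blob)
    index = {"weight_map": weight_map}
    with open(os.path.join(out_dir, "model.bin.index.json"), "w") as f:
        json.dump(index, f)
    return index

with tempfile.TemporaryDirectory() as d:
    idx = save_sharded({"a": b"x" * 700, "b": b"y" * 700, "c": b"z" * 100}, d)
    print(idx["weight_map"])
```

Loading is the inverse: read the index, then open only the shards containing the weights actually requested.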
-
### Use-cases
I would like the Databricks Terraform Provider to support the creation of a feature_spec object/function within the Unity Catalog. This is essential for serving lookup tables in online …
-
According to #2388, it should be possible to push and pull models to a Docker/OCI registry (without authentication).
Even though it's an unsupported feature, I find it very useful and would like to…
mitja updated
2 weeks ago
-
### Describe the issue
When I try to validate a bundle that deploys a model serving endpoint, the CLI fails with a runtime error.
### Steps to reproduce the behavior
Please list the steps required to repr…
-
**Describe the bug**
While following the tutorial '[Creating a custom serving runtime in KServe ModelMesh](https://developer.ibm.com/tutorials/awb-creating-custom-runtimes-in-modelmesh/)' from th…