-
## Bug Description
I'm trying to serve a Torch-TensorRT optimized model on NVIDIA Triton Inference Server based on the provided tutorial:
https://pytorch.org/TensorRT/tutorials/serving_torch_tensorrt_with_t…
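For context, a Triton model repository entry for a TorchScript/Torch-TensorRT model is described by a `config.pbtxt` like the sketch below; the model name, tensor names, shapes, and datatypes here are illustrative assumptions, not values taken from the tutorial:

```
# config.pbtxt — illustrative sketch only; names and shapes are assumptions
name: "resnet50_trt"
platform: "pytorch_libtorch"
max_batch_size: 8
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

The serialized TorchScript module then lives next to this file under a numbered version directory (e.g. `1/model.pt`).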
-
### Describe the feature
A model served with Mosec is sometimes an intermediate representation in a larger model pipeline, so compression support could be important.
### Why do you need this feature?
…
-
I am encountering an issue when evaluating Bitsandbytes 4-bit and 8-bit quantized models on the Berkeley Function Call Leaderboard (BFCL). I have successfully quantized my models using Bitsandbytes an…
-
### Bug Description
While working on the `net-istio-webhook` extension rock for Knative, we encountered a problem: we can't run rocks in a `securityContext.runAsNonRoot: true` Kubernetes deploym…
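For reference, the Kubernetes setting in question looks like the sketch below (the pod and image names are illustrative). With `runAsNonRoot: true`, the kubelet refuses to start any container whose effective user would be UID 0, which is why an image that defaults to root fails under this policy:

```yaml
# Illustrative manifest — names and UID are assumptions
apiVersion: v1
kind: Pod
metadata:
  name: net-istio-webhook-example
spec:
  containers:
    - name: webhook
      image: example/net-istio-webhook:latest
      securityContext:
        runAsNonRoot: true   # kubelet rejects containers that resolve to UID 0
        runAsUser: 1000      # an explicit non-root UID satisfies the check
```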
-
"The file serving/model_request_processor.py imports torch, but torch is missing from serving/requirements.txt"
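A minimal fix would be to declare the dependency explicitly in the requirements file; the version bound below is an assumption, not a pin taken from the project:

```
# serving/requirements.txt — add the missing dependency
torch>=2.0   # version specifier is an assumption; match the project's tested version
```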
-
### Main idea
Since the model service runs based on the `model-definition.yml` file, we need to provide an editable UI for that YAML file. With a YAML editor, users can edit freely and notice…
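As an illustration only, a `model-definition.yml` that such an editor would surface might look like the following; every field name here is hypothetical, since the actual schema is project-specific:

```yaml
# Hypothetical model-definition.yml — field names are illustrative, not the real schema
name: sentiment-classifier
version: "1.0.0"
runtime: python3.11
entrypoint: app.serve:predict
resources:
  cpu: "2"
  memory: 4Gi
```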
-
**Describe the bug**
The [quickstart install](https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md#run-the-installation-script) instructions no longer work correctly. After depl…
-
Opening this issue to track the progress of models supported in candle-vllm.
-
We support lws as the default workload; however, in most cases multi-host deployment is not needed, even with Llama 3.1 405B. So maybe this is a better choice.
-
**Is your feature request related to a problem? Please describe.**
Extend the training parameters to allow flags, or a different CLI option, to be provided so that distributed training can be pe…
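One possible shape for such an option is sketched below; the command, flag names, and values are all hypothetical, not an existing interface:

```
# Hypothetical CLI sketch — flag names are illustrative only
trainer train --distributed --num-nodes 2 --gpus-per-node 8
```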