This repository hosts the Kubernetes Python Operators for KServe (see CharmHub).
Upstream documentation can be found at https://kserve.github.io/website/0.8/
Kubernetes cluster
NOTE: If you are using MicroK8s, it is assumed you have run:
microk8s enable dns storage rbac metallb:"10.64.140.43-10.64.140.49,192.168.0.105-192.168.0.111"
istio-pilot and istio-ingressgateway. See "Deploy dependencies" for deploy instructions.
MODEL_NAME="kserve"
DEFAULT_GATEWAY="kserve-gateway"
juju add-model ${MODEL_NAME}
kserve-operators require istio-operators to be deployed in the cluster. To deploy and configure them, run:
ISTIO_CHANNEL=1.16/stable
juju deploy istio-pilot --config default-gateway=${DEFAULT_GATEWAY} --channel ${ISTIO_CHANNEL} --trust
juju deploy istio-gateway istio-ingressgateway --config kind="ingress" --channel ${ISTIO_CHANNEL} --trust
juju relate istio-pilot istio-ingressgateway
For serverless operations, kserve-operators depend on knative-serving. To deploy and configure it, run the commands below.
NOTE: these instructions assume you have deployed MicroK8s with MetalLB enabled. If your cloud configuration differs, please refer to the knative-operators documentation.
KNATIVE_CHANNEL=1.8/stable
juju deploy knative-operator --channel ${KNATIVE_CHANNEL} --trust
juju deploy knative-serving --config namespace="knative-serving" --config istio.gateway.namespace=${MODEL_NAME} --config istio.gateway.name=${DEFAULT_GATEWAY} --channel ${KNATIVE_CHANNEL} --trust
RawDeployment mode
kserve-operators support RawDeployment mode to manage InferenceService, which removes the Knative dependency and lifts some of its limitations, such as the inability to mount multiple volumes. Please note that this mode does not provide serverless capabilities; for those, deploy in Serverless mode.
Deploy kserve-controller:
juju deploy kserve-controller --channel <channel> --trust
Relate kserve-controller and istio-pilot:
juju relate istio-pilot:gateway-info kserve-controller:ingress-gateway
<channel> is one of the available channels of Charmed KServe:
- latest/edge
- 0.10/stable
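As a small guard, the channel could be checked against the list above before deploying. This is a minimal sketch: the function name is_supported_channel is hypothetical, and the channel list is copied from this README, so it may be out of date.

```shell
# Hypothetical helper: succeeds only for the channels listed in this README.
is_supported_channel() {
    case "$1" in
        latest/edge|0.10/stable) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (commented out so that sourcing this file deploys nothing):
# CHANNEL="0.10/stable"
# is_supported_channel "${CHANNEL}" && juju deploy kserve-controller --channel "${CHANNEL}" --trust
```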
Serverless mode
kserve-operators support Serverless mode to manage event-driven InferenceServices, which enables on-demand autoscaling, including scaling down to zero.
Deploy kserve-controller:
juju deploy kserve-controller --channel <channel> --config deployment-mode="serverless" --trust
Relate kserve-controller and istio-pilot:
juju relate istio-pilot:gateway-info kserve-controller:ingress-gateway
Relate kserve-controller and knative-serving:
juju relate kserve-controller:local-gateway knative-serving:local-gateway
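The two deployment modes differ only in one config option and one extra relation, so they could be wrapped in a single function. The sketch below is one possible way to do that; the function name deploy_kserve and the mode label "raw" are hypothetical, while the juju commands are the ones given in this README.

```shell
# Hypothetical helper: deploy kserve-controller in the requested mode.
# Arguments: <channel> [mode: "raw" (default) or "serverless"]
deploy_kserve() {
    channel="$1"
    mode="${2:-raw}"
    if [ "${mode}" = "serverless" ]; then
        juju deploy kserve-controller --channel "${channel}" --config deployment-mode="serverless" --trust || return 1
    else
        juju deploy kserve-controller --channel "${channel}" --trust || return 1
    fi
    # Both modes need the ingress gateway relation with istio-pilot.
    juju relate istio-pilot:gateway-info kserve-controller:ingress-gateway || return 1
    # Only Serverless mode needs the relation with knative-serving.
    if [ "${mode}" = "serverless" ]; then
        juju relate kserve-controller:local-gateway knative-serving:local-gateway || return 1
    fi
}

# Example usage: deploy_kserve 0.10/stable serverless
```

The function assumes istio-pilot (and, for Serverless mode, knative-serving) has already been deployed in the current model.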
InferenceService
To deploy a simple example of an InferenceService, you can use the one provided in examples/.
NOTE: this example is based on First InferenceService
Create the InferenceService in a testing namespace:
USER_NS="kserve-testing"
kubectl create ns ${USER_NS}
kubectl apply -f sklearn-iris.yaml -n${USER_NS}
Check the InferenceService status:
kubectl get inferenceservices sklearn-iris -n${USER_NS}
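Instead of polling the status by hand, a script could block until the InferenceService is ready. The following is a minimal sketch using kubectl wait; the function name wait_for_isvc and the 300s default timeout are assumptions, not values from this repository.

```shell
# Hypothetical helper: block until the InferenceService reports Ready, or time out.
# Arguments: <name> <namespace> [timeout, default 300s]
wait_for_isvc() {
    kubectl wait --for=condition=Ready "inferenceservice/$1" -n "$2" --timeout="${3:-300s}"
}

# Example usage: wait_for_isvc sklearn-iris "${USER_NS}"
```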
ClusterIP
NOTE: this method can only be used for performing inference within the cluster.
SERVICE_IP=$(kubectl get svc sklearn-iris-predictor-default -n${USER_NS} -ojsonpath='{.spec.clusterIP}')
INFERENCE_URL="${SERVICE_IP}/v1/models/sklearn-iris:predict"
InferenceService URL
kubectl get inferenceservice sklearn-iris -n${USER_NS}
# From the output, take the URL
INFERENCE_URL="${URL}/v1/models/sklearn-iris:predict"
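Rather than copying the URL from the command output, it could be read from the resource with a JSONPath query. This sketch assumes the InferenceService exposes its address in the .status.url field; the function name isvc_predict_url is hypothetical.

```shell
# Hypothetical helper: print the predict endpoint of an InferenceService.
# Arguments: <name> <namespace>
isvc_predict_url() {
    url="$(kubectl get inferenceservice "$1" -n "$2" -o jsonpath='{.status.url}')" || return 1
    # An empty URL usually means the service is not ready yet.
    [ -n "${url}" ] || return 1
    printf '%s/v1/models/%s:predict\n' "${url}" "$1"
}

# Example usage: INFERENCE_URL="$(isvc_predict_url sklearn-iris "${USER_NS}")"
```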
Create a file with the input request:
cat <<EOF > "./iris-input.json"
{
"instances": [
[6.8, 2.8, 4.8, 1.4],
[6.0, 3.4, 4.5, 1.6]
]
}
EOF
Now call the InferenceService:
curl -v $INFERENCE_URL -d @iris-input.json
Expected output:
{"predictions": [1, 1]}
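For scripted checks, the response could be compared with the expected output automatically. The sketch below ignores whitespace differences and hard-codes the expected predictions for this example; the function name matches_expected is hypothetical.

```shell
# Hypothetical helper: succeed if the response body equals the expected
# predictions for the sklearn-iris example, ignoring whitespace.
# Argument: <response body>
matches_expected() {
    expected='{"predictions":[1,1]}'
    actual="$(printf '%s' "$1" | tr -d ' \n\t')"
    [ "${actual}" = "${expected}" ]
}

# Example usage:
# matches_expected "$(curl -s "${INFERENCE_URL}" -d @iris-input.json)" && echo "inference OK"
```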
Canonical Charmed Kubeflow is a state of the art, fully supported MLOps platform that helps data scientists collaborate on AI innovation on any cloud from concept to production, offered by Canonical - the publishers of Ubuntu.
Charmed Kubeflow is free to use: the solution can be deployed in any environment without constraints, paywall or restricted features. Data labs and MLOps teams only need to train their data scientists and engineers once to work consistently and efficiently on any cloud – or on-premise.
Charmed Kubeflow offers a centralised, browser-based MLOps platform that runs on any conformant Kubernetes – offering enhanced productivity, improved governance and reducing the risks associated with shadow IT.
Learn more about deploying and using Charmed Kubeflow at https://charmed-kubeflow.io.
Please see the official docs site for complete documentation of the Charmed Kubeflow distribution.
KServe controller comes with a set of preconfigured images that are used in KServe workloads. The default images are listed in default-custom-images.json.
These images can be overridden in the charm configuration under custom_images in the charms/kserve-controller/config.yaml file. If custom_images is left empty, the default images listed above are used. To use your own images, fill in one or more entries; the config accepts either YAML or JSON. For example:
juju config kserve-controller custom_images='{"configmap__agent": "custom:1.0", "serving_runtimes__lgbserver": "custom:2.1"}'
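When several images are overridden, writing the JSON by hand becomes error-prone. The following sketch builds the value from key=image pairs; the function name custom_images_json is hypothetical, and it assumes keys and image names contain no quotes or backslashes.

```shell
# Hypothetical helper: print a JSON object built from key=image arguments.
# Keys and images are assumed to contain no quotes or backslashes.
custom_images_json() {
    json=""
    for pair in "$@"; do
        key="${pair%%=*}"     # text before the first '='
        image="${pair#*=}"    # text after the first '='
        json="${json}${json:+, }\"${key}\": \"${image}\""
    done
    printf '{%s}\n' "${json}"
}

# Example usage:
# juju config kserve-controller custom_images="$(custom_images_json configmap__agent=custom:1.0)"
```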
These images are used in the .j2 template files under charms/kserve-controller/src/templates/.
If you find a bug in our operator or want to request a specific feature, please file a bug here: https://github.com/canonical/dex-auth-operator/issues
Charmed Kubeflow is free software, distributed under the Apache Software License, version 2.0.
Canonical welcomes contributions to Charmed Kubeflow. Please check out our contributor agreement if you're interested in contributing to the distribution.
Security issues in Charmed Kubeflow can be reported through LaunchPad. Please do not file GitHub issues about security issues.