Closed NohaIhab closed 3 months ago
Thank you for reporting your feedback to us!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5251.
This message was autogenerated
My proposed plan is to handle this in 2 ways:
1. the `knative-local-gateway`
2. the K8s ExternalName SVC for the requests to the ISVC (to avoid needing Dex cookies)

I propose to set the above Knative settings as config options in the Knative charms. I would even propose that this is enabled by default, to have a similar UX to Notebooks.
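As a sketch of the second item, an ExternalName Service could point in-cluster clients at the local gateway so requests to the ISVC bypass the external (Dex-authenticated) ingress. The names below (`sklearn-iris-internal`, the `kserve-test` namespace, and the gateway Service location) are assumptions for illustration, not the final implementation:

```yaml
# Hypothetical sketch; names and namespaces are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: sklearn-iris-internal
  namespace: kserve-test
spec:
  type: ExternalName
  # Route in-cluster traffic to the knative-local-gateway Service,
  # avoiding the external ingress and hence the Dex auth cookies.
  externalName: knative-local-gateway.istio-system.svc.cluster.local
```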
So most of the implementation effort is to configure Knative for GPU node scheduling, and then to provide a tutorial that shows how to create and reach an ISVC that uses a GPU.
Based on the spec, the implementation will be as follows:

> Note: this is the section from the spec on the chosen approach. For more details, refer to spec KF084.

Skip tag-to-digest resolution for the `nvcr.io` registry. The charm's `config.yaml` will be:
```yaml
options:
  registries-skip-tag-resolution:
    default: "nvcr.io"
    description: Comma-separated list of repositories for which tag to digest resolving should be skipped.
    type: string
```
And a config option for the deployment progress deadline:

```yaml
options:
  progress-deadline:
    default: "600s"
    description: The duration to wait for the deployment to be ready before considering it failed.
    type: string
```
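If these land as charm config options, operators would be able to tune them with `juju config`; the application name `knative-serving` used here is an assumption for illustration:

```shell
# Hypothetical usage sketch; the application name is an assumption.
juju config knative-serving progress-deadline="900s"
juju config knative-serving registries-skip-tag-resolution="nvcr.io,docker.io"
```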
```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: {{ app_name }}
  namespace: {{ serving_namespace }}
spec:
  version: {{ serving_version }}
  config:
    deployment:
      progress-deadline: {{ progress_deadline }}
      registries-skipping-tag-resolving: {{ registries_skip_tag_resolving }}
    features:
      kubernetes.podspec-affinity: "enabled"
      kubernetes.podspec-nodeselector: "enabled"
      kubernetes.podspec-tolerations: "enabled"
```
To verify `progress-deadline`: the Deployment created for the `<ISVC name>` (labeled `serving.kserve.io/inferenceservice: <ISVC name>`) should have in its `.spec`: `progressDeadlineSeconds: <config value>`.
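One way to run this check from the CLI, assuming an ISVC named `sklearn-iris` in the `kserve-test` namespace (both names are assumptions):

```shell
# Sketch: print progressDeadlineSeconds of the Deployment backing the ISVC.
kubectl get deployment -n kserve-test \
  -l serving.kserve.io/inferenceservice=sklearn-iris \
  -o jsonpath='{.items[0].spec.progressDeadlineSeconds}'
```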
To verify `kubernetes.podspec-[nodeselector, affinity, tolerations]`: create an `<ISVC name>` with `nodeSelector`, `affinity`, and `tolerations` set, for example:
```shell
kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: disktype
              operator: In
              values:
              - ssd
    nodeSelector:
      myLabel1: "true"
    tolerations:
    - key: "myTaint1"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```
Then check that the Deployment labeled `serving.kserve.io/inferenceservice: <ISVC name>` has `affinity`, `nodeSelector`, and `tolerations` set in its pod `.spec`.
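A possible spot-check for those fields, assuming the example ISVC above (`sklearn-iris` in `kserve-test`):

```shell
# Sketch: dump the scheduling-related fields of the pod template.
kubectl get deployment -n kserve-test \
  -l serving.kserve.io/inferenceservice=sklearn-iris \
  -o jsonpath='{.items[0].spec.template.spec.affinity}{"\n"}{.items[0].spec.template.spec.nodeSelector}{"\n"}{.items[0].spec.template.spec.tolerations}'
```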
Context
Integrate Nvidia Triton for Serving, via KServe with GPUs
What needs to get done
Set the required configurations to deploy an ISVC with Triton on GPU. From #169 and #170, the configurations are:

In the `config-deployment` ConfigMap:
- `progress-deadline` set to `"600s"`
- `registries-skipping-tag-resolving` set to `"nvcr.io"`

In the `config-features` ConfigMap:
- `kubernetes.podspec-affinity: "enabled"`
- `kubernetes.podspec-nodeselector: "enabled"`
- `kubernetes.podspec-tolerations: "enabled"`
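To confirm the settings actually reached Knative, the rendered ConfigMaps can be read back directly; the `knative-serving` namespace below is an assumption about where Knative Serving is deployed:

```shell
# Sketch: read back the values the charm should have set (namespace is an assumption).
kubectl get configmap config-deployment -n knative-serving \
  -o jsonpath='{.data.progress-deadline}{"\n"}{.data.registries-skipping-tag-resolving}'
kubectl get configmap config-features -n knative-serving \
  -o jsonpath='{.data.kubernetes\.podspec-affinity}'
```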
Definition of Done
PR to enable configuring Serving in Charmed Kubeflow for Nvidia Triton is merged