canonical / knative-operators

Charmed Knative Operators
Apache License 2.0

Implement Triton integration with KNative Serving #171

Closed NohaIhab closed 3 months ago

NohaIhab commented 5 months ago

Context

Integrate Nvidia Triton for Serving, via KServe with GPUs

What needs to get done

Set the required configurations to deploy an ISVC with Triton on a GPU. From #169 and #170, the configurations are:

In the config-deployment ConfigMap:

In the config-features ConfigMap:

Definition of Done

PR to enable configuring Serving in Charmed Kubeflow for Nvidia Triton is merged

syncronize-issues-to-jira[bot] commented 5 months ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5251.

This message was autogenerated

kimwnasptd commented 4 months ago

My proposed plan is to handle this in two parts:

  1. Keep using the current knative-local-gateway and the K8s ExternalName SVC for requests to the ISVC (to avoid needing Dex cookies)
  2. Extend Knative to allow configuring affinities and tolerations, so that the ISVC Pod can be scheduled on the GPU node (see https://kserve.github.io/website/master/modelserving/nodescheduling/inferenceservicenodescheduling/)
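To make the second item concrete, here is a minimal sketch of the Knative Serving feature flags that, as far as the upstream Knative documentation describes them, gate affinity, node selector, and toleration fields in the Pod spec. The helper name `build_config_features` and the `knative-serving` namespace are assumptions for illustration; the charm may name its config options differently and may deploy Serving into another namespace.

```python
import json

# Feature flag keys as documented upstream for the config-features ConfigMap.
# How the Knative charms expose them as config options is not shown here.
SCHEDULING_FLAGS = (
    "kubernetes.podspec-affinity",
    "kubernetes.podspec-nodeselector",
    "kubernetes.podspec-tolerations",
)


def build_config_features(enabled=True, namespace="knative-serving"):
    """Return a ConfigMap manifest (as a dict) toggling the scheduling flags.

    `namespace` is an assumed default; adjust it to where Serving is deployed.
    """
    value = "enabled" if enabled else "disabled"
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "config-features", "namespace": namespace},
        "data": {flag: value for flag in SCHEDULING_FLAGS},
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl also accepts.
    print(json.dumps(build_config_features(), indent=2))
```

With these flags enabled, Knative should stop rejecting Services whose Pod spec carries `affinity`, `nodeSelector`, or `tolerations`, which is what lets KServe pass GPU scheduling hints through to the ISVC Pod.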

I propose exposing the above Knative settings as config options in the Knative charms. I would even propose enabling them by default, to give a UX similar to Notebooks.

So most of the implementation in this effort is to configure Knative for GPU node scheduling and then provide a tutorial that shows how to create and reach an ISVC that uses a GPU.
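As a rough illustration of what such a tutorial might create, the sketch below builds an InferenceService manifest that uses the Triton predictor, requests a GPU, and tolerates a GPU node taint. The function name, the ISVC name, the storage URI, and the `nvidia.com/gpu` taint key are all placeholders chosen for this example, not values taken from the spec or the issue.

```python
import json


def build_triton_isvc(name, storage_uri, gpus=1, namespace="default"):
    """Return a KServe v1beta1 InferenceService manifest as a dict.

    The toleration below assumes GPU nodes are tainted with the key
    "nvidia.com/gpu" and effect NoSchedule; real clusters may differ.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                # Only accepted when Knative allows tolerations in the Pod spec.
                "tolerations": [
                    {
                        "key": "nvidia.com/gpu",
                        "operator": "Exists",
                        "effect": "NoSchedule",
                    }
                ],
                "triton": {
                    "storageUri": storage_uri,
                    # Requesting the GPU as an extended resource limit.
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                },
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical name and model location, for illustration only.
    isvc = build_triton_isvc("triton-example", "gs://example-bucket/models/example")
    print(json.dumps(isvc, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, provided KServe is installed and the cluster has a schedulable GPU node.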

NohaIhab commented 3 months ago

Based on the spec, the implementation will be as follows. Note: this is the section from the spec on the chosen approach. For more details, refer to spec KF084.

NohaIhab commented 3 months ago

Testing

progress-deadline