canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

Evaluate Charmed Kubeflow with K8s 1.29 #756

Closed by kimwnasptd 4 months ago

kimwnasptd commented 11 months ago

What needs to get done

After looking at the K8s deprecation documents, I'll focus mostly on API deprecations: https://kubernetes.io/docs/reference/using-api/deprecation-guide/ https://kubernetes.io/docs/reference/using-api/deprecation-policy/

Upstream Kubeflow supports K8s versions up to 1.26, which is what they tested with: https://github.com/kubeflow/manifests/issues/2450#issue-1683135324

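The APIs that get deprecated between 1.26 and 1.29 are:

  - Flow control APIs (FlowSchema and PriorityLevelConfiguration, flowcontrol.apiserver.k8s.io/v1beta2)
  - CSIStorageCapacity (storage.k8s.io/v1beta1)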

With those in mind, we'll look into the source code of the following projects to see how/if they use any of those APIs:

  - Istio
  - Knative
  - Seldon
  - KServe
  - Kubeflow

Why it needs to get done

Running on the latest K8s can be a good feature of Charmed Kubeflow, and it also lets us feed our findings back to the community.

syncronize-issues-to-jira[bot] commented 11 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5046.

This message was autogenerated

kimwnasptd commented 11 months ago

First of all, let me give a small overview of those 2 APIs and how they are used, so that we can make an educated guess on why/how/if they would be used by the above projects.

Flow Control APIs

These APIs are meant to be used by a k8s admin to control the requests that K8s can have in flight and process. On the one hand we have PriorityLevelConfigurations, which break requests down into different "classes" and assign shares to each class. The maximum number of in-flight requests is controlled with --max-requests-inflight (together with --max-mutating-requests-inflight) on the kube-apiserver binary. Each "class" then gets a number of allowed requests based on its shares.

On the other hand we have FlowSchemas, which control how requests are mapped to the different PriorityLevelConfigurations.

As you can see those are meant for k8s admins.
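
To make the "shares" idea concrete, here is a minimal sketch of how a server-wide limit could be split across priority levels. It assumes the proportional, round-up split that the upstream flow control docs describe. The level names, the share values, and the limit of 600 are made up for illustration and are not taken from any real cluster.

```python
import math


def nominal_concurrency_limits(server_limit, shares):
    """Split a server-wide concurrency limit across priority levels.

    Each level gets a slice proportional to its shares, rounded up,
    so every level with a non-zero share gets at least one seat.
    """
    total_shares = sum(shares.values())
    return {
        name: math.ceil(server_limit * share / total_shares)
        for name, share in shares.items()
    }


# Hypothetical priority levels and share values, for illustration only.
shares = {"system": 30, "workload-high": 40, "workload-low": 100, "catch-all": 30}

# Hypothetical server-wide limit on in-flight requests.
limits = nominal_concurrency_limits(server_limit=600, shares=shares)

for name, limit in limits.items():
    print(f"{name}: {limit} concurrent requests")
```

With these made-up numbers the "workload-low" class gets half of the budget, since it holds 100 of the 200 total shares.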

CSIStorageCapacity

Quoting the docs:

CSIStorageCapacity objects: these get produced by a CSI driver in the namespace where the driver is installed. Each object contains capacity information for one storage class and defines which nodes have access to that storage.

https://kubernetes.io/docs/concepts/storage/storage-capacity/#api

So this is meant to be used by CSI drivers.

At this point we don't expect Kubeflow, or any of the projects it contains, to use those APIs.

kimwnasptd commented 11 months ago

I can confirm that our UATs are running successfully with:

  1. MicroK8s 1.29/candidate
  2. Charmed Kubeflow 1.8/stable

https://github.com/canonical/charmed-kubeflow-uats

kimwnasptd commented 11 months ago

Istio and APIs

Searching the whole org for the flowcontrol and CSIStorageCapacity keywords, they are only defined in the istio/api repo, which contains protobufs for all K8s objects:

https://github.com/istio/api/tree/master/common-protos/k8s.io/api/flowcontrol https://github.com/istio/api/blob/master/common-protos/k8s.io/api/storage/v1beta1/generated.proto

But none of those are used anywhere else in the org, which confirms our expectation that Istio does not use them. We also tested Istio with K8s 1.29 with our UATs. https://github.com/search?q=org%3Aistio+flowcontrol&type=code&p=1
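
The GitHub code searches in this thread could also be approximated on a local checkout. The following is a rough sketch only: `find_keyword_hits` is a hypothetical helper, and it does a plain case-insensitive substring match, which is not necessarily how GitHub's search behaves.

```python
from pathlib import Path

# Keywords used for the org-wide searches in this issue.
KEYWORDS = ("flowcontrol", "CSIStorageCapacity")


def find_keyword_hits(root, keywords=KEYWORDS):
    """Return (relative path, line number, keyword) for every match under root."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Ignore undecodable bytes so binary files do not abort the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            for keyword in keywords:
                if keyword.lower() in lowered:
                    hits.append((path.relative_to(root).as_posix(), lineno, keyword))
    return hits


# Example usage on a local clone (the path is hypothetical):
#   for rel_path, lineno, keyword in find_keyword_hits("./istio"):
#       print(f"{rel_path}:{lineno}: {keyword}")
```

Hits that only appear in generated protobufs or lock files, as seen in this thread, would still need a manual look to decide whether the API is actually used.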

Upgrades

I also tried to run an upgrade with istioctl from 1.17, which is the default in CKF 1.8, to 1.18 and this worked as expected as well.

I tried this as a smoke test, since it is something we will certainly do in a follow-up release of Kubeflow.

kimwnasptd commented 11 months ago

Knative and APIs

Searching the whole org for flowcontrol and CSIStorageCapacity, we only see a few references to flow control in some archived repos. This confirms our understanding that Knative does not require or use any of those APIs.

https://github.com/search?q=org%3Aknative%20flowcontrol&type=code

kimwnasptd commented 11 months ago

Seldon and APIs

Searching the whole org for flowcontrol and CSIStorageCapacity, we only see a few references to flow control in a Gopkg.lock file, but nowhere in the code. This confirms our understanding that Seldon does not require or use any of those APIs.

https://github.com/search?q=org%3ASeldonIO%20flowcontrol&type=code

kimwnasptd commented 11 months ago

KServe and APIs

Looking into the whole org for flowcontrol and CSIStorageCapacity we don't find any references at all to those APIs or keywords.

This confirms our understanding that KServe does not require or use any of those APIs.

https://github.com/search?q=org%3Akserve%20flowcontrol&type=code

kimwnasptd commented 11 months ago

Kubeflow and APIs

Looking into the whole org for flowcontrol and CSIStorageCapacity, we find only some references to flowcontrol in the following repos:

Those repos, though, are archived or unmaintained, so this confirms our understanding that Kubeflow is not affected by those APIs.

https://github.com/search?q=org%3Akubeflow+flowcontrol&type=code&p=1

kimwnasptd commented 11 months ago

So at this point, and after discussing with Canonical's K8s team, our understanding is that since the projects are not affected by the deprecated APIs, we can expect Charmed Kubeflow to also work with K8s 1.29.

NohaIhab commented 11 months ago

What about Argo and Dex? Should we check those as well, or are they covered by one of the repos you mentioned in the checklist above?

kimwnasptd commented 11 months ago

@NohaIhab very good point! Adding this to the list as well and will confirm

kimwnasptd commented 4 months ago

Argo Workflows

After evaluating Argo's source code and docs, I found that they don't require a specific K8s version: https://argo-workflows.readthedocs.io/en/stable/

They also don't use either flowcontrol or CSIStorageCapacity and UATs of CKF 1.8 were passing on K8s 1.29. So marking Argo as checked as well

kimwnasptd commented 4 months ago

Dex

Dex is also not using flowcontrol or CSIStorageCapacity anywhere in the code of 2.39. The UATs are also passing, Dex was deployed successfully on K8s, and login worked as expected.

So marking this as checked

kimwnasptd commented 4 months ago

At this point, all the above checks have passed, so we mark CKF as supported on K8s 1.29.