canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

Kubeflow Dashboard cannot render its component functions on microk8s #419

Closed: Theocrat closed this issue 2 years ago.

Theocrat commented 2 years ago

I have installed Kubernetes on Ubuntu 18.04 using the MicroK8s package. My intention was to run Kubeflow on my bare-metal, on-premise system (a laptop) to familiarize myself with the tool before shelling out credits to use it in the cloud. However, I have run into a difficulty.

First of all, Kubeflow does not work on MicroK8s 1.22, so I am using the 1.21/stable channel. On this system, I have enabled the kubeflow add-on using `microk8s.enable kubeflow`, alongside dns, dashboard and storage. This has pulled several images onto my system, and MicroK8s now runs several pods.

One of these pods won't run: the kfp-viz image, published by the Juju kubeflow-charmers, cannot be pulled. Here is the error:

Failed to pull image "registry.jujucharms.com/kubeflow-charmers/kfp-viz/oci-image@sha256:c90a5818043da47448c4230953b265a66877bd143e4bdd991f762cf47e2a16d6": rpc error: code = FailedPrecondition desc = failed to pull and unpack image "registry.jujucharms.com/kubeflow-charmers/kfp-viz/oci-image@sha256:c90a5818043da47448c4230953b265a66877bd143e4bdd991f762cf47e2a16d6": failed commit on ref "layer-sha256:bf94fddbd6a293bbdfb71d2d627b2262f0f1296b198a1d372ca30253a460d8b0": "layer-sha256:bf94fddbd6a293bbdfb71d2d627b2262f0f1296b198a1d372ca30253a460d8b0" failed size validation: 116013328 != 147329385: failed precondition

Because of this error, the image pull keeps getting backed off; it has failed consistently. I do not know whether this is what causes the main issue, but the real problem is that the Kubeflow dashboard is dysfunctional. I am attaching an image here for reference:

[Screenshot: the dysfunctional Kubeflow dashboard]

Basically, none of the tabs work. If I try to create a new notebook, it tells me that the /jupyter/new page is not valid. The same, apparently, is true for the /pipeline/ page and for every other page.

For reference, this is the method I used to run the dashboard:

  1. I ran the command `kubectl -n kubeflow get services | grep "kubeflow-dashboard "`.
  2. I noted the IP address there.
  3. I connected to port 8082 on this IP address over plain HTTP (not HTTPS).

If I am doing something wrong, it would be great if someone explained it to me. I think I have messed up somewhere, since nobody else seems to be facing this issue while using MicroK8s.

EDIT:

I tried disabling and re-enabling kubeflow. I read the installation instructions carefully and noticed that the link for opening kubeflow-dashboard in my browser is printed at the end of the installation process. Unfortunately, the process never finishes, because the kfp-viz pod will not run: its ImagePull error keeps getting backed off.

I have tried installing kfp-viz from other sources, and there doesn't seem to be any source from which this can be done on Ubuntu 18.04; on the JAAS store, the image is only available for Ubuntu 20.04.

People were clearly installing Kubeflow on Bionic at some point, given the number of blogs where instructions to do so are provided with a confidently assertive tone. It is clearly harder now, and for some reason the latest version of MicroK8s does not support Kubeflow. This seems to be telling a story. I wonder what happened.

Further Edit:

Okay, so the kfp-viz pod image was successfully downloaded and installed a moment ago, leaving me thoroughly astounded. But this success has not changed the state of the dashboard, which is still as dysfunctional as before.

Theocrat commented 2 years ago

Okay, kinda solved I guess

Look, I am not saying that this problem has been solved. If you are working on MicroK8s and came here for guidance on making Kubeflow run on an older version of MicroK8s, you will be disappointed. If you are simply looking for some way to make Kubeflow work on Kubernetes on your personal testing server (or laptop, for that matter), then you might just be in the right place.

Hardware requirements

Kubeflow on an on-premise Kubernetes cluster needs some 8-10 GiB of RAM just to run, and that is before you even start training a model. So basically, if you want to run Kubeflow on your premise, you need one helluva premise.

Platform

Do not use Canonical's MicroK8s. Instead, install kubectl, kind and kustomize separately, as suggested by the official Kubernetes documentation. First, download kubectl with this command:

     curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
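The official kubectl install instructions also suggest checking the download against the `kubectl.sha256` file published next to the binary. A minimal sketch, assuming GNU coreutils `sha256sum` is available; the helper name `verify_sha256` is hypothetical:

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
# Assumes GNU coreutils sha256sum.
verify_sha256() {
  local file="$1" expected="$2"
  # sha256sum --check reads lines of the form "<digest>  <filename>";
  # --status suppresses output and reports the result via the exit code.
  echo "${expected}  ${file}" | sha256sum --check --status
}

# Usage:
#   curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl.sha256"
#   verify_sha256 kubectl "$(cat kubectl.sha256)" && echo "kubectl checksum OK"
```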

Then install Kustomize:

    curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"  | bash

And finally, install Kind:

    curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.11.1/kind-linux-amd64

Make kubectl and kind executable, then place all three tools in a directory that is on your $PATH and also on the root user's PATH (for example /usr/local/bin). Then become root and create a kind cluster with:

    kind create cluster
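The "make executable and place on $PATH" step can be sketched as a small helper. This is a sketch only; `place_on_path` is a hypothetical name, and it assumes the three downloads sit in the current directory:

```shell
# Hypothetical helper: copy each tool into DEST with mode 0755 (rwxr-xr-x),
# so the files become executable and land on PATH in one step.
place_on_path() {
  local dest="$1"
  shift
  local tool
  for tool in "$@"; do
    # install(1) copies the file and sets its permissions
    install -m 0755 "$tool" "$dest/" || return 1
  done
}

# Usage, from a root shell in the download directory:
#   place_on_path /usr/local/bin kubectl kind kustomize
#   kind create cluster
```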

Installing Kubeflow:

Use git to clone the repository for an on-premise, custom Kubeflow. All existing guidebooks tell you how to set it up on Google Cloud Platform or Amazon Web Services, but fortunately humanity is not dead and there is one repository that contains the manifests for setting Kubeflow up on premise:

    git clone https://github.com/kubeflow/manifests.git

And then run this command as root inside the cloned directory, as suggested by the maintainers of the manifest:

    while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
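The loop above retries forever. A bounded variant, sketched under the assumption that you would rather give up after a fixed number of attempts; `retry` is a hypothetical helper name:

```shell
# Hypothetical helper: run a command up to MAX_ATTEMPTS times,
# sleeping DELAY seconds between failed attempts.
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    echo "Attempt $attempt failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage: pipefail makes a kustomize failure count as a failed attempt too.
#   retry 60 10 bash -o pipefail -c 'kustomize build example | kubectl apply -f -'
```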

And that is it. This will take something like an hour or two, but you will have your Kubeflow cluster set up. You can use

    kubectl get pods -A

as root to check the status of the cluster from time to time. Remember, the set of pods is huge and eats a massive chunk of RAM.
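Scanning a long `kubectl get pods -A` listing by eye gets tedious. A sketch of a filter that prints only the pods that are not fully up, assuming the default column order NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE; `not_ready_pods` is a hypothetical name:

```shell
# Hypothetical filter: read `kubectl get pods -A --no-headers` on stdin and
# print "namespace/name ready status" for pods that are not fully up.
not_ready_pods() {
  awk '{
    split($3, ready, "/")            # READY column, e.g. "1/2"
    if ($4 == "Completed") next      # finished Job pods are fine
    if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2 " " $3 " " $4
  }'
}

# Usage (as root):
#   kubectl get pods -A --no-headers | not_ready_pods
```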

Connect to your Kubeflow installation by running:

    kubectl port-forward -n istio-system svc/istio-ingressgateway 8000:80

as the root user, and then open localhost:8000 in your browser. DO NOT connect directly to the Kubeflow Central Dashboard. Log in through the ISTIO gateway port-forwarded by the command above, using the username user@example.com and the password 12341234.

Why I would advise not using MicroK8s for Kubeflow

MicroK8s neither supports Kubeflow itself, nor is it supported by Kubeflow and Juju any more.

Not only does the latest version of MicroK8s (v1.22) not support Kubeflow, but the older versions (v1.21, for instance) have been dropped by Google's buddies. MicroK8s v1.21 deploys Kubeflow's pods with Juju, which pulls their images from Canonical's registry (registry.jujucharms.com). This may succeed once in a blue moon, but normally there are all kinds of errors.

Sometimes, the image pull fails because the image manifest lists a layer size that does not match the size of the layer actually downloaded (the "failed size validation" error above). Sometimes there is a version mismatch. And sometimes, the registry returns HTTP 403 and tells you to go do something else.

Edit

Thanks to @DomFleischmann for posting a nice link to a guide for installing Charmed Kubeflow using Juju in his comment below.

DomFleischmann commented 2 years ago

Hello @Theocrat, I'm sorry we couldn't address your issue earlier. Sometimes issues aren't addressed as quickly as they should be, especially considering the team was coming back from the holidays and there was a weekend in between the 5 days you mention.

Kubeflow not working on 1.22 is an upstream problem that affects all Kubeflow distributions. We do still support MicroK8s, but we are slowly encouraging our users to start using the steps mentioned here.

We have seen several issues with the MicroK8s installation on Ubuntu 18.04. As you correctly noticed, we generally recommend using 20.04 for any current installation.

Theocrat commented 2 years ago

@DomFleischmann I can understand. I think I have gone a little overboard on the flame. Two cups of black coffee on an empty stomach in the early morning will do that to you. I have had the opportunity to rethink the situation, and I think the vile tempest of accusations in my earlier post was uncalled for.

Of course, this is family time for you guys and I wish you a happy new year. Thank you for posting a link to your guide for users of MicroK8s to install Charmed Kubeflow using Juju. I have removed the offending portion from my earlier post.

I'll remember to take a bite of a Snickers before commenting next time :)

Theocrat commented 2 years ago

@DomFleischmann Hey Dom, I have been trying out my use case on the Charmed Kubeflow solution from Juju, but it appears that Charmed Kubeflow is out of date. The tutorials, including the one you shared with me, appear to be similarly behind the times.

For one thing, Kubeflow as it stands today (in February 2022) can only be properly accessed via the ISTIO Ingress Gateway. Attempting to access the Kubeflow Dashboard directly will yield a useless pretense of the UI, which basically spams more copies of itself whenever you click on any of the tabs in the column to the left.

However, the ISTIO load balancing service in Juju's Charmed Kubeflow is oddly broken. For one thing, the istio-ingressgateway-operator pod remains indefinitely in a waiting state, waiting for the ISTIO pilot. This issue has been discussed in a Stack Overflow exchange, where one of the users offers a candidate solution. I have tried it, and while it fixed the waiting problem, other issues remain.

For another, when I run

    kubectl -n kubeflow get svc

I observe this line for the load balancer:

    NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                                                                                                                   AGE
    istio-ingressgateway                 LoadBalancer   10.96.141.148   <pending>     15020:32198/TCP,80:30075/TCP,443:30375/TCP,15029:31903/TCP,15030:31682/TCP,15031:32328/TCP,15032:31773/TCP,15443:32640/TCP,15011:32065/TCP,8060:31671/TCP,853:31981/TCP   20h

Notice that the EXTERNAL-IP column shows <pending>. This is a problem, and it does eventually cause trouble.
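A <pending> EXTERNAL-IP on a LoadBalancer service usually means that no load balancer implementation in the cluster has assigned it an address. A sketch for spotting such services, assuming the default `kubectl -n <namespace> get svc` column order NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE; `pending_load_balancers` is a hypothetical name:

```shell
# Hypothetical filter: read `kubectl -n kubeflow get svc --no-headers` on stdin
# and print the names of LoadBalancer services still waiting for an external IP.
pending_load_balancers() {
  awk '$2 == "LoadBalancer" && $4 == "<pending>" { print $1 }'
}

# Usage:
#   kubectl -n kubeflow get svc --no-headers | pending_load_balancers
```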

When I then use

    kubectl -n kubeflow port-forward svc/istio-ingressgateway 8000:80

to access the load balancer, port 8000 serves an empty blank page. The ISTIO system is clearly dysfunctional, and trying to get to the dashboard directly yields a dysfunctional dashboard.

DomFleischmann commented 2 years ago

Hello @Theocrat, the command that moves istio-ingressgateway out of the waiting state is given in Step 6 of the guide I linked.

Regarding the EXTERNAL-IP that stays <pending>: this is probably because your Kubernetes cluster has no load balancer implementation. You can either deploy something like MetalLB or use a Kubernetes distribution that provides one, such as MicroK8s with its metallb add-on. Another alternative is to patch the istio-ingressgateway service to use NodePort instead of LoadBalancer.
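The NodePort route can be sketched as follows. The `kubectl patch` line is a standard patch of the service type; `node_port_for` is a hypothetical helper that parses the PORT(S) column (entries such as `80:30075/TCP` in the output quoted earlier). A LoadBalancer service normally has node ports allocated as well, which is why that output already shows `80:30075`:

```shell
# Switch the gateway to NodePort (run against a real cluster):
#   kubectl -n kubeflow patch svc istio-ingressgateway -p '{"spec":{"type":"NodePort"}}'

# Hypothetical helper: given a service port and a PORT(S) string such as
# "15020:32198/TCP,80:30075/TCP", print the node port mapped to that port.
node_port_for() {
  local port="$1" ports="$2"
  # one "port:nodePort/proto" entry per line, then split on ':' and '/'
  printf '%s\n' "$ports" | tr ',' '\n' | awk -F'[:/]' -v p="$port" '$1 == p { print $2 }'
}

# Usage:
#   ports=$(kubectl -n kubeflow get svc istio-ingressgateway --no-headers | awk '{ print $5 }')
#   node_port_for 80 "$ports"    # then browse to http://<node-ip>:<that port>
```

kubectl can also return the value directly with `-o jsonpath='{.spec.ports[?(@.port==80)].nodePort}'`; the text-parsing helper just mirrors the table output shown in this thread.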

All of Kubeflow's user interfaces are tightly coupled with Istio and must be accessed through it. This is not a case of Charmed Kubeflow being out of date; it is a design decision of the upstream project.

Theocrat commented 2 years ago

Right, thanks again.