Closed didier-durand closed 3 years ago
Hi @didier-durand
I see that some pods were evicted. This usually happens when there are not enough resources (disk or memory). The inspection tarball might tell us more about what is happening.
Attached is the inspection report taken after approx 6 minutes of install:
inspection-report-20201030_142237.tar.gz
The machine is a Google Cloud GCE instance: n2-standard-8: 8 cores - 32 GB.
Didier
@didier-durand: I see in the inspection report that the node has disk pressure. You might need to increase the disk space; I would recommend a minimum of 80 GB to be sure that it works.
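A quick way to compare a machine against the 80 GB figure suggested above is sketched below. It is only a rough check: the path `/`, the helper names, and the choice of 1 GB = 10**9 bytes are assumptions made for illustration, and the filesystem total is usually somewhat smaller than the provisioned disk size.

```python
import shutil

GB = 10**9          # assumption: decimal gigabytes
MIN_DISK_GB = 80    # the minimum suggested in this thread, not an official requirement


def meets_minimum(total_bytes, min_gb=MIN_DISK_GB):
    """Return True if the filesystem size reaches the suggested minimum."""
    return total_bytes >= min_gb * GB


def disk_report(path="/", min_gb=MIN_DISK_GB):
    """Summarize total and free space for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return {
        "total_gb": round(usage.total / GB, 1),
        "free_gb": round(usage.free / GB, 1),
        "meets_minimum": meets_minimum(usage.total, min_gb),
    }


if __name__ == "__main__":
    print(disk_report("/"))
```

If the report shows less than the suggested minimum, resizing the boot disk before running the install again is the step that resolved the evictions in this thread.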
@knkski :
That was the issue: I raised my boot disk space to 100 GB and it went through for the operator pods. Thanks for that! But now it stops on a new issue after all 30 operator pods have started (full trace below):
Waiting for service pods to become ready.
Kubeflow could not be enabled:
Error from server (NotFound): mutatingwebhookconfigurations.admissionregistration.k8s.io "katib-mutating-webhook-config" not found
Error from server (NotFound): validatingwebhookconfigurations.admissionregistration.k8s.io "katib-validating-webhook-config" not found
Can you please help further? Thanks.
Didier
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling metallb:10.64.140.43-10.64.140.49...
Waiting for DNS and storage plugins to finish setting up
Deploying Kubeflow...
Kubeflow deployed.
Waiting for operator pods to become ready.
Waited 0s for operator pods to come up, 30 remaining.
Waited 15s for operator pods to come up, 29 remaining.
Waited 30s for operator pods to come up, 29 remaining.
Waited 45s for operator pods to come up, 28 remaining.
Waited 60s for operator pods to come up, 28 remaining.
Waited 75s for operator pods to come up, 26 remaining.
Waited 90s for operator pods to come up, 24 remaining.
Waited 105s for operator pods to come up, 20 remaining.
Waited 120s for operator pods to come up, 20 remaining.
Waited 135s for operator pods to come up, 17 remaining.
Waited 150s for operator pods to come up, 16 remaining.
Waited 165s for operator pods to come up, 16 remaining.
Waited 180s for operator pods to come up, 15 remaining.
Waited 195s for operator pods to come up, 13 remaining.
Waited 210s for operator pods to come up, 11 remaining.
Waited 225s for operator pods to come up, 11 remaining.
Waited 240s for operator pods to come up, 9 remaining.
Waited 255s for operator pods to come up, 7 remaining.
Waited 270s for operator pods to come up, 7 remaining.
Waited 285s for operator pods to come up, 7 remaining.
Waited 300s for operator pods to come up, 7 remaining.
Waited 315s for operator pods to come up, 7 remaining.
Waited 330s for operator pods to come up, 6 remaining.
Waited 345s for operator pods to come up, 5 remaining.
Waited 360s for operator pods to come up, 4 remaining.
Waited 375s for operator pods to come up, 3 remaining.
Waited 390s for operator pods to come up, 1 remaining.
Waited 405s for operator pods to come up, 1 remaining.
Waited 420s for operator pods to come up, 1 remaining.
Operator pods ready.
Waiting for service pods to become ready.
Kubeflow could not be enabled:
Error from server (NotFound): mutatingwebhookconfigurations.admissionregistration.k8s.io "katib-mutating-webhook-config" not found
Error from server (NotFound): validatingwebhookconfigurations.admissionregistration.k8s.io "katib-validating-webhook-config" not found
Command '('microk8s-kubectl.wrapper', 'delete', 'mutatingwebhookconfigurations/katib-mutating-webhook-config', 'validatingwebhookconfigurations/katib-validating-webhook-config')' returned non-zero exit status 1
Failed to enable kubeflow
Has anyone found a solution for this error?
Error from server (NotFound): mutatingwebhookconfigurations.admissionregistration.k8s.io "katib-mutating-webhook-config" not found
I am facing the same issue.
@knkski : please let me know if I can supply additional information to help your analysis of the cause. Thanks! Didier
@didier-durand: sorry about the slow response. #1635 should fix this issue and ensure that it doesn't happen again. It will be included in 1.20/stable, or you can try it out before then by switching microk8s to the latest/edge (1.20) or latest/beta (1.19.5) channels.
I'm going to close this, since it should now be fixed, but feel free to reopen if you encounter the issue again.
@knkski : Thanks. No problem regarding the delay. I'll test in my own GitHub workflow and come back to tell you whether it is fixed. Didier
Hi there,
After tests, I can confirm that microk8s enable kubeflow now runs successfully with a fresh Ubuntu install on a Google Cloud GCE instance n2-standard-8 with a 250 GB hard disk.
It just takes some time for the 30+ operator pods (list below) to get ready: over 12 min in total, with about 7min30s for the operator pods to come up and get ready, then about 4min30s more to get the message "Congratulations, Kubeflow is now available."
The microk8s snap is installed from latest/edge: see below.
GCE image: ubuntu-2004-focal-v20201111 - image family: ubuntu-2004-lts - image project: ubuntu-os-cloud
@ktsakalozos , @knkski : thanks for your support.
Didier
ddurand@microk8s-kubeflow:~$ snap list
Name Version Rev Tracking Publisher Notes
core 16-2.47.1 10185 latest/stable canonical✓ core
core18 20200929 1932 latest/stable canonical✓ base
google-cloud-sdk 318.0.0 159 latest/stable/… google-cloud-sdk✓ classic
lxd 4.0.4 18150 4.0/stable/… canonical✓ -
microk8s v1.19.4 1826 latest/edge canonical✓ classic
snapd 2.47.1 9721 latest/stable canonical✓ snapd
ddurand@microk8s-kubeflow:~$ microk8s enable kubeflow
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling metallb:10.64.140.43-10.64.140.49...
Waiting for DNS and storage plugins to finish setting up
Bootstrapping...
Bootstrap complete.
Successfully bootstrapped, deploying...
Kubeflow deployed.
Waiting for operator pods to become ready.
Waited 0s for operator pods to come up, 31 remaining.
Waited 15s for operator pods to come up, 31 remaining.
Waited 30s for operator pods to come up, 31 remaining.
Waited 45s for operator pods to come up, 31 remaining.
Waited 60s for operator pods to come up, 30 remaining.
Waited 75s for operator pods to come up, 29 remaining.
Waited 90s for operator pods to come up, 28 remaining.
Waited 105s for operator pods to come up, 28 remaining.
Waited 120s for operator pods to come up, 27 remaining.
Waited 135s for operator pods to come up, 27 remaining.
Waited 150s for operator pods to come up, 27 remaining.
Waited 165s for operator pods to come up, 25 remaining.
Waited 180s for operator pods to come up, 21 remaining.
Waited 195s for operator pods to come up, 20 remaining.
Waited 210s for operator pods to come up, 20 remaining.
Waited 225s for operator pods to come up, 19 remaining.
Waited 240s for operator pods to come up, 18 remaining.
Waited 255s for operator pods to come up, 17 remaining.
Waited 270s for operator pods to come up, 14 remaining.
Waited 285s for operator pods to come up, 14 remaining.
Waited 300s for operator pods to come up, 14 remaining.
Waited 315s for operator pods to come up, 14 remaining.
Waited 330s for operator pods to come up, 14 remaining.
Waited 345s for operator pods to come up, 14 remaining.
Waited 360s for operator pods to come up, 14 remaining.
Waited 375s for operator pods to come up, 13 remaining.
Waited 390s for operator pods to come up, 11 remaining.
Waited 405s for operator pods to come up, 9 remaining.
Waited 420s for operator pods to come up, 3 remaining.
Waited 435s for operator pods to come up, 3 remaining.
Waited 450s for operator pods to come up, 2 remaining.
Operator pods ready.
Waiting for service pods to become ready.
Congratulations, Kubeflow is now available.
The dashboard is available at http://localhost
Username: admin
Password: 2CDOKXARFGPIGKP9GI1UZKN1GRI2KR
To see these values again, run:
microk8s juju config dex-auth static-username
microk8s juju config dex-auth static-password
To tear down Kubeflow and associated infrastructure, run:
microk8s disable kubeflow
ddurand@microk8s-kubeflow:~$ microk8s kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-r8vrx 1/1 Running 1 19m
kube-system coredns-86f78bb79c-7czsv 1/1 Running 0 17m
kube-system hostpath-provisioner-5c65fbdb4f-2sr75 1/1 Running 0 17m
kube-system calico-kube-controllers-847c8c99d-pk4tg 1/1 Running 0 19m
kube-system metrics-server-8bbfb4bdb-ks6h5 1/1 Running 0 17m
kube-system dashboard-metrics-scraper-6c4568dc68-gj74f 1/1 Running 0 17m
kube-system kubernetes-dashboard-7ffd448895-925vr 1/1 Running 0 17m
metallb-system controller-559b68bfd8-lgkdv 1/1 Running 0 17m
metallb-system speaker-p878g 1/1 Running 0 17m
ingress nginx-ingress-microk8s-controller-jxmxz 1/1 Running 0 17m
controller-uk8s controller-0 2/2 Running 2 16m
controller-uk8s modeloperator-65c978c8b4-lpzhc 1/1 Running 0 15m
kubeflow modeloperator-68f4bcd86f-s7nz8 1/1 Running 0 15m
kubeflow argo-controller-operator-0 1/1 Running 0 15m
kubeflow argo-ui-operator-0 1/1 Running 0 14m
kubeflow dex-auth-operator-0 1/1 Running 0 14m
kubeflow jupyter-controller-operator-0 1/1 Running 0 14m
kubeflow jupyter-web-operator-0 1/1 Running 0 13m
kubeflow argo-ui-7dbc7569d5-rph55 1/1 Running 0 14m
kubeflow istio-ingressgateway-operator-0 1/1 Running 0 13m
kubeflow istio-pilot-operator-0 1/1 Running 0 13m
kubeflow jupyter-controller-66fd84d549-pvb8k 1/1 Running 0 13m
kubeflow istio-pilot-54db58d6ff-cs4m5 1/1 Running 0 13m
kubeflow katib-controller-operator-0 1/1 Running 0 12m
kubeflow kubeflow-profiles-operator-0 1/1 Running 0 12m
kubeflow pipelines-db-operator-0 1/1 Running 0 12m
kubeflow pipelines-visualization-operator-0 1/1 Running 0 12m
kubeflow tf-job-operator-operator-0 1/1 Running 0 12m
kubeflow pytorch-operator-operator-0 1/1 Running 0 12m
kubeflow katib-manager-operator-0 1/1 Running 0 12m
kubeflow katib-controller-758c6b7b55-nwjbg 1/1 Running 0 12m
kubeflow katib-ui-operator-0 1/1 Running 0 11m
kubeflow kubeflow-dashboard-operator-0 1/1 Running 0 11m
kubeflow metadata-api-operator-0 1/1 Running 0 11m
kubeflow pipelines-scheduledworkflow-operator-0 1/1 Running 0 11m
kubeflow pipelines-ui-operator-0 1/1 Running 0 11m
kubeflow pipelines-viewer-operator-0 1/1 Running 0 11m
kubeflow seldon-core-operator-0 1/1 Running 0 11m
kubeflow metadata-db-operator-0 1/1 Running 0 11m
kubeflow metadata-envoy-operator-0 1/1 Running 0 11m
kubeflow metadata-ui-operator-0 1/1 Running 0 10m
kubeflow minio-operator-0 1/1 Running 0 10m
kubeflow oidc-gatekeeper-operator-0 1/1 Running 0 10m
kubeflow katib-db-operator-0 1/1 Running 0 10m
kubeflow metacontroller-operator-0 1/1 Running 0 10m
kubeflow metadata-grpc-operator-0 1/1 Running 0 10m
kubeflow pipelines-api-operator-0 1/1 Running 0 10m
kubeflow pipelines-persistence-operator-0 1/1 Running 0 10m
kubeflow pipelines-visualization-9dfbbf684-mwm65 1/1 Running 0 12m
kubeflow kubeflow-profiles-559799b56b-8hln6 2/2 Running 1 12m
kubeflow tf-job-operator-6789f578b5-wdzhn 1/1 Running 0 12m
kubeflow pipelines-db-0 1/1 Running 0 12m
kubeflow pytorch-operator-d5d55685b-76rgl 1/1 Running 0 12m
kubeflow katib-ui-7fd6f78898-ngs68 1/1 Running 0 11m
kubeflow pipelines-scheduledworkflow-7c7bb5c5fb-wbrhb 1/1 Running 0 11m
kubeflow pipelines-viewer-9688dfbb9-5twnq 1/1 Running 0 11m
kubeflow seldon-core-7799f4dcc4-x8q65 1/1 Running 0 10m
kubeflow kubeflow-dashboard-58f586fbb4-c6ckn 1/1 Running 0 10m
kubeflow jupyter-web-85675688cd-4z62p 2/2 Running 0 12m
kubeflow metadata-db-0 1/1 Running 0 10m
kubeflow katib-db-0 1/1 Running 0 8m55s
kubeflow metadata-api-788886b5cd-ml8mq 1/1 Running 0 9m21s
kubeflow minio-0 1/1 Running 0 10m
kubeflow metadata-grpc-85776d69d4-4qcw5 1/1 Running 0 9m5s
kubeflow metadata-ui-5658db6c4f-hlvpj 1/1 Running 0 8m42s
kubeflow metacontroller-7676b7895f-7whvz 1/1 Running 0 9m
kubeflow argo-controller-587658cd67-nqwk7 1/1 Running 0 9m4s
kubeflow pipelines-persistence-7fc85bb56-st5zz 1/1 Running 0 8m51s
kubeflow metadata-envoy-758b684754-fkm4r 1/1 Running 0 8m36s
kubeflow pipelines-ui-6dd6c5cf59-8rmlr 2/2 Running 0 8m23s
kubeflow pipelines-api-7cc457dbcc-rzmpc 1/1 Running 0 8m1s
kubeflow istio-ingressgateway-59c958ddf6-drz6z 1/1 Running 0 7m44s
kubeflow katib-manager-65dfb98fcb-8thwf 1/1 Running 0 7m35s
kubeflow dex-auth-8687b86488-wzm8t 2/2 Running 2 6m8s
kubeflow oidc-gatekeeper-7566d4f667-j67nv 2/2 Running 0 6m27s
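The "Waited Ns for operator pods to come up, M remaining." lines in the install log above can be summarized with a few lines of Python. This is a minimal sketch that assumes the log format shown in this thread; the function names are made up for illustration and are not part of microk8s.

```python
import re

# Matches the progress lines printed by `microk8s enable kubeflow` in this thread.
WAIT_LINE = re.compile(
    r"Waited (\d+)s for operator pods to come up, (\d+) remaining\."
)


def summarize_operator_wait(log_text):
    """Return basic figures from the polling lines, or None if there are none."""
    samples = [(int(sec), int(left)) for sec, left in WAIT_LINE.findall(log_text)]
    if not samples:
        return None
    last_seconds, last_remaining = samples[-1]
    return {
        "polls": len(samples),
        "last_wait_seconds": last_seconds,
        "last_remaining": last_remaining,
        "ready": "Operator pods ready." in log_text,
    }


def format_duration(seconds):
    """Format a number of seconds in the 7min30s style used in the thread."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}min{secs:02d}s"


if __name__ == "__main__":
    sample = (
        "Waited 435s for operator pods to come up, 3 remaining.\n"
        "Waited 450s for operator pods to come up, 2 remaining.\n"
        "Operator pods ready.\n"
    )
    summary = summarize_operator_wait(sample)
    print(summary, format_duration(summary["last_wait_seconds"]))
```

Because the installer polls every 15 seconds, the last reported wait is a lower bound: the pods became ready somewhere between that value and the next poll.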
@didier-durand: I installed and enabled kubeflow on a cloud virtual machine using:
sudo snap install microk8s --classic --channel=latest/edge
microk8s.enable dns dashboard storage
microk8s.enable kubeflow
I get a success message saying the Kubeflow dashboard is available at http://localhost
I set up a SOCKS proxy on port 9999 and am able to open the Kubeflow page using the clusterIP listed in the services, but I am unable to access the pipelines and notebook server pages.
Any idea how we can get the notebook server and pipelines pages working?
@danudeep90 : I do not use SOCKS but regular port forwarding via kubectl, which works. Have a look at https://github.com/didier-durand/microk8s-akri to see how I use it (toward the end of the .sh script).
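For readers who have not used kubectl port forwarding, the sketch below only builds the command line; it is not taken from the linked repository. The service name `example-service` and the port numbers are hypothetical placeholders, so check `microk8s kubectl get services -n kubeflow` for the real values on your cluster.

```python
import shlex


def port_forward_command(service, local_port, remote_port, namespace="kubeflow"):
    """Build a `kubectl port-forward` command for a service (names are caller-supplied)."""
    return [
        "microk8s", "kubectl", "port-forward",
        "--namespace", namespace,
        f"service/{service}",
        f"{local_port}:{remote_port}",
    ]


if __name__ == "__main__":
    # "example-service", 8080 and 80 are placeholders, not values from this thread.
    cmd = port_forward_command("example-service", 8080, 80)
    print(shlex.join(cmd))
    # To actually start forwarding (blocks until interrupted):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Once the forward is running, the service is reachable on the chosen local port of the machine where the command was started.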
Hello,
I am trying to enable kubeflow on microk8s (1.19 classic) on GCE (large instance n2-standard-8: 8 cores - 32 GB): it hangs forever on this message:
Waited 615s for operator pods to come up, 18 remaining.
Waited 630s for operator pods to come up, 18 remaining.
Waited 645s for operator pods to come up, 18 remaining.
Waited 660s for operator pods to come up, 18 remaining.
After 24 min of running, I get the output below. (I have read #1071; RBAC is disabled, so it does not seem to be the cause.)
How can I get it to work? Let me know if additional info is required.
Thanks
Didier
microk8s kubectl get pods -n kubeflow
NAME READY STATUS RESTARTS AGE
argo-controller-6fc8f85d44-ljxv4 0/1 Evicted 0 25m
oidc-gatekeeper-778cc55547-qb9sc 0/1 Evicted 0 25m
pipelines-api-7d67b7f44f-d6sjf 0/1 Evicted 0 25m
oidc-gatekeeper-778cc55547-gb74j 0/1 Evicted 0 25m
pipelines-api-7d67b7f44f-g67jz 0/1 Evicted 0 25m
oidc-gatekeeper-778cc55547-h8fxv 0/1 Evicted 0 25m
pipelines-api-7d67b7f44f-cvf6p 0/1 Evicted 0 25m
pipelines-api-7d67b7f44f-cwwss 0/1 Evicted 0 25m
pipelines-api-7d67b7f44f-7967z 0/1 Evicted 0 25m
pipelines-api-7d67b7f44f-rdb2n 0/1 Evicted 0 25m
ambassador-85b668dcc4-68m92 0/1 Evicted 0 28m
metadata-envoy-5cd4f47775-f66s9 0/1 Evicted 0 26m
argo-controller-6fc8f85d44-kg8jn 0/1 Evicted 0 25m
dex-auth-86d9765856-2mt27 0/1 Init:0/1 0 25m
minio-operator-0 0/1 Unknown 0 26m
pytorch-operator-operator-0 0/1 Unknown 0 24m
metadata-grpc-operator-0 0/1 Unknown 0 27m
argo-controller-6fc8f85d44-85c6g 0/1 Init:0/1 0 17m
katib-db-manager-operator-0 0/1 ContainerCreating 0 27m
dex-auth-6c7bd6d48d-54278 0/1 Evicted 0 24m
kubeflow-profiles-operator-0 0/1 Pending 0 8m10s
dex-auth-6c7bd6d48d-rszjz 0/1 Pending 0 8m7s
dex-auth-operator-0 0/1 Unknown 1 24m
katib-db-operator-0 0/1 Unknown 0 25m
metadata-db-operator-0 0/1 Unknown 0 24m
jupyter-web-operator-0 0/1 Unknown 0 24m
metacontroller-operator-0 0/1 Unknown 0 27m
oidc-gatekeeper-operator-0 0/1 Unknown 0 26m
ambassador-operator-0 0/1 Unknown 1 29m
jupyter-controller-operator-0 0/1 Unknown 0 27m
ambassador-85b668dcc4-q5dlm 0/1 Init:Unknown 0 24m
metadata-envoy-operator-0 0/1 Unknown 0 27m
pipelines-viewer-5df646f87d-sntgq 0/1 Init:Unknown 0 17m
metacontroller-8f65dd64-jgwdc 0/1 Init:Unknown 0 27m
pipelines-scheduledworkflow-595cff68b7-sf25x 0/1 Init:Unknown 0 26m
pipelines-db-0 0/1 Init:Unknown 0 24m
kubeflow-dashboard-operator-0 0/1 Unknown 0 27m
oidc-gatekeeper-778cc55547-7j6ds 0/1 Init:Unknown 0 25m
metadata-api-operator-0 0/1 Unknown 1 24m
jupyter-controller-d4c6989fd-fqrls 0/1 Init:Unknown 1 27m
tf-job-operator-operator-0 0/1 Unknown 0 24m
pipelines-scheduledworkflow-operator-0 0/1 Unknown 0 27m
argo-ui-operator-0 1/1 Running 1 28m
katib-ui-operator-0 0/1 Unknown 0 24m
argo-controller-operator-0 1/1 Running 1 28m
pipelines-ui-646f785cf6-gkbsg 0/1 Init:Unknown 0 25m
seldon-core-operator-0 0/1 Unknown 0 24m
pipelines-viewer-operator-0 0/1 Unknown 0 24m
pipelines-ui-operator-0 0/1 Pending 0 31s
pipelines-visualization-operator-0 0/1 Unknown 1 25m
minio-0 0/1 Init:Unknown 0 25m
metadata-ui-operator-0 1/1 Running 1 24m
argo-ui-868cc7c496-6q5f8 0/1 Init:0/1 2 27m
katib-controller-operator-0 1/1 Running 1 24m
pipelines-api-7d67b7f44f-kwbqk 0/1 Init:0/1 0 25m
pipelines-api-operator-0 0/1 Unknown 0 27m
pipelines-db-operator-0 0/1 Unknown 0 27m
metadata-envoy-5cd4f47775-pz5gj 0/1 Init:0/1 0 24m
pipelines-persistence-operator-0 0/1 CreateContainerError 1 24m
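A pod listing like the one above is easier to read once the statuses are tallied. The sketch below assumes the usual `kubectl get pods` layout with one pod per line and a header row containing a STATUS column; the function name is made up for illustration.

```python
from collections import Counter


def tally_pod_status(kubectl_output):
    """Count pods per STATUS value in `kubectl get pods` text output."""
    counts = Counter()
    status_index = None
    for line in kubectl_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "STATUS" in fields and "NAME" in fields:
            # Header row: remember where STATUS sits (works with or
            # without a leading NAMESPACE column).
            status_index = fields.index("STATUS")
            continue
        if status_index is not None and len(fields) > status_index:
            counts[fields[status_index]] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "NAME READY STATUS RESTARTS AGE\n"
        "argo-controller-6fc8f85d44-ljxv4 0/1 Evicted 0 25m\n"
        "minio-operator-0 0/1 Unknown 0 26m\n"
        "argo-ui-operator-0 1/1 Running 1 28m\n"
    )
    for status, count in tally_pod_status(sample).most_common():
        print(f"{status}: {count}")
```

A large number of Evicted pods in the tally is the symptom that, in this thread, pointed to disk pressure on the node.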