Open Cvija2609 opened 2 years ago
Hi,
I am not sure it will help, but you can try running the example from my working repo: https://github.com/webmakaka/Machine-Learning-on-Kubernetes
A working setup should look like https://github.com/PacktPublishing/Machine-Learning-on-Kubernetes/issues/6#issuecomment-1221355813
Thank you, @webmakaka. I tried it, but I still have the same problem :(
How long did it take for Airflow to initialize for you? I'll leave it running and see whether that is the problem.
Less than 17 minutes.
I sent you an email at mar***@gm.com with my step-by-step instructions for running the environment for this book.
I am having problems with the same step. Everything until that step is working fine.
I am using --driver=docker and minikube version v1.28.0 on WSL2 (Ubuntu).
$ kubectl create -f manifests/kfdef/ml-platform.yaml -n ml-workshop
kfdef.kfdef.apps.kubeflow.org/opendatahub-ml-workshop created
This works fine.
But then none of the pods are being created (see below). I went through these steps multiple times (started all over again), but to no avail:
$ kubectl get pods -n ml-workshop
No resources found in ml-workshop namespace.
$ kubectl get all -n ml-workshop
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/jupyterhub      ClusterIP   10.97.249.33    <none>        8080/TCP,8081/TCP   40m
service/jupyterhub-db   ClusterIP   10.107.85.252   <none>        5432/TCP            40m
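When a KfDef is created but no pods follow, the operator that reconciles it is usually the next place to look. The sketch below gathers those clues in one pass. The function name is hypothetical, and it assumes the operator runs as a Deployment named opendatahub-operator in the operators namespace, so adjust both if your install differs.

```shell
# Sketch: collect first clues when a KfDef exists but no pods appear.
# Assumption: the operator is a Deployment called "opendatahub-operator"
# in the "operators" namespace; change these to match your cluster.
diagnose_kfdef() {
  ns="${1:-ml-workshop}"
  # The KfDef itself, including any status the operator has written.
  kubectl get kfdef -n "$ns" -o yaml
  # Recent events in the target namespace, oldest first.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
  # Is the operator pod running at all?
  kubectl get pods -n operators
  # Tail of the operator log, where apply errors are reported.
  kubectl logs deployment/opendatahub-operator -n operators --tail=50
}

# Usage (needs access to the cluster):
# diagnose_kfdef ml-workshop
```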
@vishal-git Try to repeat from scratch. You can use my repo and instructions inside, if needed.
I've tried everything multiple times, and now I consistently get the same error as @vishal-git.
I dug deeper and found that opendatahub-operator is throwing this:
...
configmap/jupyterhub-default-groups-config serverside-applied
configmap/spark-cluster-template serverside-applied
configmap/parameters serverside-applied
configmap/odh-jupyterhub-sizes serverside-applied
configmap/jupyter-singleuser-profiles serverside-applied
configmap/jupyterhub-cfg serverside-applied
persistentvolumeclaim/jupyterhub-db serverside-applied
serviceaccount/jupyterhub-hub serverside-applied
clusterrole.rbac.authorization.k8s.io/jupyterhub-cluster serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/jupyterhub-cluster serverside-applied
role.rbac.authorization.k8s.io/jupyterhub serverside-applied
service/jupyterhub-db serverside-applied
service/jupyterhub serverside-applied
ingress.networking.k8s.io/jupyterhub serverside-applied
route.route.openshift.io/jupyterhub serverside-applied
time="2023-01-02T13:00:29Z" level=warning msg="Encountered error applying application jupyterhub: (kubeflow.error): Code 500 with message: Apply.Run : [failed to create typed patch object: .metadata.label: field not declared in schema, failed to create typed patch object: .roleRef.namespace: field not declared in schema, failed to create typed patch object: errors:\n .spec.selector.deploymentconfig: field not declared in schema\n .spec.strategy.recreateParams: field not declared in schema\n .spec.triggers: field not declared in schema, failed to create typed patch object: errors:\n .spec.selector.deploymentconfig: field not declared in schema\n .spec.strategy: field not declared in schema]"
time="2023-01-02T13:00:29Z" level=warning msg="Will retry in 6 seconds."
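The warning above packs several rejected fields into one long message, which makes them easy to miss. As a rough aid, the sketch below pulls out each field path that is followed by "field not declared in schema". The function name is hypothetical, and it only does text matching on the log format shown in this thread.

```shell
# Hypothetical helper: read operator log text on stdin and print each unique
# field path reported as "field not declared in schema".
rejected_fields() {
  grep -oE '\.[A-Za-z0-9_.]+: field not declared in schema' \
    | sed 's/: field not declared in schema$//' \
    | LC_ALL=C sort -u
}

# Usage (needs access to the cluster; replace the placeholder pod name):
# kubectl logs <operator-pod> -n operators | rejected_fields
```

Comparing the resulting list against the manifests being applied may show which resources carry OpenShift-specific fields that the cluster's schema does not accept.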
I have tried multiple times and have the same issue as @vishal-git. minikube version: v1.25.2
The output of k logs -f opendatahub-operator-869cdfdf6f-drvf2 -n operators shows the error:
time="2023-01-04T18:47:29Z" level=info msg="Watch a change for Kubeflow resource: jupyterhub-db.ml-workshop."
time="2023-01-04T18:47:29Z" level=info msg="Watch a change for Kubeflow resource: jupyterhub-db.ml-workshop."
clusterrolebinding.rbac.authorization.k8s.io/jupyterhub-cluster serverside-applied
time="2023-01-04T18:47:29Z" level=info msg="Watch a change for Kubeflow resource: jupyterhub-hub.ml-workshop."
time="2023-01-04T18:47:29Z" level=info msg="Watch a change for Kubeflow resource: jupyterhub-hub.ml-workshop."
role.rbac.authorization.k8s.io/jupyterhub serverside-applied
time="2023-01-04T18:47:29Z" level=info msg="Watch a change for Kubeflow resource: jupyterhub-db.ml-workshop."
time="2023-01-04T18:47:29Z" level=info msg="Watch a change for Kubeflow resource: jupyterhub-db.ml-workshop."
service/jupyterhub-db serverside-applied
time="2023-01-04T18:47:29Z" level=info msg="Watch a change for Kubeflow resource: jupyterhub-db.ml-workshop."
time="2023-01-04T18:47:29Z" level=info msg="Watch a change for Kubeflow resource: jupyterhub-db.ml-workshop."
service/jupyterhub serverside-applied
ingress.networking.k8s.io/jupyterhub serverside-applied
route.route.openshift.io/jupyterhub serverside-applied
time="2023-01-04T18:47:30Z" level=warning msg="Encountered error applying application jupyterhub: (kubeflow.error): Code 500 with message: Apply.Run : [failed to create typed patch object: .metadata.label: field not declared in schema, failed to create typed patch object: .roleRef.namespace: field not declared in schema, failed to create typed patch object: errors:\n .spec.selector.app: field not declared in schema\n .spec.strategy.recreateParams: field not declared in schema\n .spec.triggers: field not declared in schema, failed to create typed patch object: errors:\n .spec.selector.app: field not declared in schema\n .spec.strategy: field not declared in schema]"
time="2023-01-04T18:47:30Z" level=warning msg="Will retry in 4 seconds."
configmap/jupyterhub-default-groups-config serverside-applied
configmap/spark-cluster-template serverside-applied
configmap/parameters serverside-applied
configmap/odh-jupyterhub-sizes serverside-applied
configmap/jupyter-singleuser-profiles serverside-applied
configmap/jupyterhub-cfg serverside-applied
Any ideas?
I think you should use the recommended Kubernetes version in minikube.
I already did. I started from scratch and still have the same issue, with minikube version v1.24.0 and Kubernetes version 1.22.4, the same as mentioned in the book.
Can you try running the examples from my repo, following the instructions?
https://github.com/webmakaka/Machine-Learning-on-Kubernetes/tree/master/docs/01-environment
And then
https://github.com/webmakaka/Machine-Learning-on-Kubernetes/blob/master/docs/05-data-engineering.md
(Use Google Translate if you need to translate from Russian.)
If something does not work, I'll check it on my environment next week.
@webmakaka could you please try running this whole setup again? If you have the resources available, of course.
I've tried multiple times from scratch. I even ran an EC2 instance on AWS (t3.2xlarge) and tried with it, but with no success.
The minikube version and Kubernetes version are the same as in the book.
I've checked the logs in the operators namespace again, and opendatahub-operator throws the same errors as before.
To sum up, I have tried multiple times and keep getting the same result. Something is not working as intended and I don't know what.
Airflow is not the problem anymore; I can't even get to the point where I could check it.
$ minikube profile list
|----------|-----------|---------|--------------|------|---------|---------|-------|
| Profile  | VM Driver | Runtime |      IP      | Port | Version | Status  | Nodes |
|----------|-----------|---------|--------------|------|---------|---------|-------|
| minikube | podman    | docker  | 192.168.49.2 | 8443 | v1.22.4 | Running |     1 |
|----------|-----------|---------|--------------|------|---------|---------|-------|
$ minikube config view
- cpus: 8
- disk-size: 60GB
- memory: 30GB
$ minikube version
minikube version: v1.24.0
commit: 76b94fb3c4e8ac5062daf70d60cf03ddcc0a741b
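For anyone reproducing this setup, the versions and resources reported above can be pinned in a single start command. This is only a sketch: the function name is hypothetical, and the driver is an assumption passed as a parameter, since this thread mentions both docker and podman.

```shell
# Sketch: start minikube with the Kubernetes version and resources quoted
# in this thread. The driver defaults to docker, which is an assumption;
# pass the driver you actually use as the first argument.
start_pinned_cluster() {
  driver="${1:-docker}"
  minikube start \
    --driver="$driver" \
    --kubernetes-version=v1.22.4 \
    --cpus=8 \
    --memory=30GB \
    --disk-size=60GB
}

# Usage:
# start_pinned_cluster podman
```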
I checked and got the same error as you. If I find a solution, I'll write up how to fix it.
I updated configs in my repo.
Current situation is:
I am having problems with the same step. Everything until that step is working fine.
I am using --driver=docker and minikube version v1.28.0 on WSL2 (Ubuntu).
$ kubectl create -f manifests/kfdef/ml-platform.yaml -n ml-workshop
kfdef.kfdef.apps.kubeflow.org/opendatahub-ml-workshop created
This works fine.
But then none of the pods are being created (see below). I went through these steps multiple times (started all over again), but to no avail:
$ kubectl get pods -n ml-workshop
No resources found in ml-workshop namespace.
$ kubectl get all -n ml-workshop
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/jupyterhub      ClusterIP   10.97.249.33    <none>        8080/TCP,8081/TCP   40m
service/jupyterhub-db   ClusterIP   10.107.85.252   <none>        5432/TCP            40m
I now have the same issue reported by vishal-git above. I am almost at the point of ditching this book and moving on to some mature, proven content from O'Reilly: edition 2 of the wildly praised book by Dr. Lakshmanan.
An almost total waste of crucial time on this untested book. Sorry folks, better luck next time!
Everything worked a year ago.
I updated my configs, and now all pods run.
There were problems with the new pod versions from the author's registry.
When I went back to the original versions, the platform started running without errors (at least this holds up to page 105).
Platform:
minikube version: v1.24.0
tl;dr: Airflow won't start; logs of everything are listed below.
I'm trying to recreate everything and I'm stuck on this part. I've been waiting for some time for everything written in ml-platform.yaml to be configured, and app-aflow-airflow-web has been in the CrashLoopBackOff state for 1 hour now. I've tried killing it and recreating it, and nothing has worked.
Here is the list of pods created during execution of this command:
I've changed to minikube ip as mentioned.
Logs from the failing container app-aflow-airflow-web-7c566d79d-4v2wv:airflow-web:
Describing the pods also does not reveal much to me:
replicaset:
kubectl logs:
previous logs:
The service for postgresql exists and waitfordatabase executed successfully.
When I deleted this with:
and reapplied it with the same command as mentioned above, the airflow2-proxy secret was missing. I added it from manifests/airflow2/base/service-accounts.yaml and the same error appeared.