Open PaulSteffen-betclic opened 3 years ago
Thank you for reaching out @PaulSteffen-betclic
There are a few possibilities here. Can you please check whether other pods are also pending? You can check this by running kubectl get pods --all-namespaces. If a few pods are pending, you might not have enough resources (probably you're deploying to a region with more than three availability zones, e.g. us-east-1).
If this is the case, you can ...
terraform destroy -var-file=my_vars.tfvars
OpenMLOps-EKS-cluster, check out the branch pipatth/fix-autoscaler, and use it instead of master. This branch deploys the cluster to only two availability zones, so you shouldn't have a problem with computing resources. Let me know if this works out.
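The check described above can be sketched as a small filter over the pod listing. This is only an illustration: the function name unready_pods is hypothetical, and it assumes the default column order of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper (not part of OpenMLOps or kubectl).
# Reads `kubectl get pods --all-namespaces` output on stdin and prints the
# namespace, name, READY and STATUS of every pod that is Pending, or that
# has fewer ready containers than expected and is not a Completed job.
unready_pods() {
  awk 'NR > 1 {                      # skip the header row
    split($3, ready, "/")            # READY looks like "0/1"
    if ($4 == "Pending" || ($4 != "Completed" && ready[1] != ready[2]))
      print $1, $2, $3, $4
  }'
}
```

Usage would look like kubectl get pods --all-namespaces | unready_pods. If only the Pending phase matters, kubectl get pods --all-namespaces --field-selector=status.phase=Pending should give a similar answer without any post-processing.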
Thank you for your answer @pipatth.
No other pods are pending.
NAMESPACE    NAME                                                READY  STATUS     RESTARTS  AGE
ambassador   ambassador-55dd56c9d8-n97ff                         1/1    Running    0         8m4s
dask         dask-jupyter-85f4d96c8d-jddz8                       1/1    Running    0         8m29s
dask         dask-scheduler-6b59ddb864-v9drh                     1/1    Running    0         8m28s
dask         dask-worker-cb785d955-fw2p7                         1/1    Running    0         8m28s
dask         dask-worker-cb785d955-j99n7                         1/1    Running    0         8m29s
dask         dask-worker-cb785d955-mvxgd                         1/1    Running    0         8m28s
feast        feast-feast-core-5ff9695c44-z5z8l                   1/1    Running    0         8m15s
feast        feast-feast-jobservice-75d6fd6c58-h4hz6             1/1    Running    0         8m15s
feast        feast-feast-online-serving-5847c98d77-4hnz4         1/1    Running    0         8m16s
feast        feast-postgresql-0                                  1/1    Running    0         8m15s
feast        feast-redis-master-0                                1/1    Running    0         8m15s
feast        feast-redis-slave-0                                 1/1    Running    0         8m15s
feast        feast-redis-slave-1                                 1/1    Running    0         7m4s
feast        feast-spark-spark-operator-85fc4fb995-brxcs         1/1    Running    0         8m19s
jhub         continuous-image-puller-brf7w                       1/1    Running    0         2m8s
jhub         continuous-image-puller-pwmn4                       1/1    Running    0         2m8s
jhub         continuous-image-puller-ztz92                       1/1    Running    0         2m8s
jhub         hub-57fc96d677-95777                                1/1    Running    0         2m8s
jhub         proxy-69896f594b-fwg7f                              1/1    Running    0         2m8s
jhub         user-scheduler-5464f84c96-7ts2b                     1/1    Running    0         2m8s
jhub         user-scheduler-5464f84c96-m6hsv                     1/1    Running    0         2m8s
kube-system  autoscaler-aws-cluster-autoscaler-6fdb5cf446-mzq2f  1/1    Running    0         11m
kube-system  aws-node-6668f                                      1/1    Running    0         10m
kube-system  aws-node-72vgl                                      1/1    Running    0         10m
kube-system  aws-node-mq2cw                                      1/1    Running    0         10m
kube-system  coredns-6ddcfb5bcf-6ft45                            1/1    Running    0         14m
kube-system  coredns-6ddcfb5bcf-mtpfg                            1/1    Running    0         14m
kube-system  kube-proxy-5xwhk                                    1/1    Running    0         10m
kube-system  kube-proxy-dbcsl                                    1/1    Running    0         10m
kube-system  kube-proxy-xvzcx                                    1/1    Running    0         10m
kube-system  metrics-server-c65bf9997-wwsl2                      1/1    Running    0         8m49s
kube-system  seldon-spartakus-volunteer-8488bc5849-ht8cl         1/1    Running    0         8m21s
mlflow       mlflow-595f7556c9-cw9x8                             1/1    Running    0         8m12s
mlflow       postgres-postgresql-0                               1/1    Running    0         8m57s
ory          ory-kratos-5f777789c7-mwfs8                         0/1    Running    0         6m47s
ory          ory-kratos-courier-0                                1/1    Running    0         6m47s
ory          ory-kratos-ui-5857cc6d9b-kwdmt                      1/1    Running    0         100s
ory          ory-oathkeeper-6bf994cf97-b6n2b                     1/1    Running    0         8m30s
ory          postgres-postgresql-0                               1/1    Running    0         8m19s
prefect      prefect-server-agent-784d877787-wgbzz               1/1    Running    1         8m16s
prefect      prefect-server-apollo-57fc96dfcd-22hnx              1/1    Running    0         8m16s
prefect      prefect-server-create-tenant-job-6cwnl              0/1    Completed  2         8m15s
prefect      prefect-server-graphql-96fb476c4-b7k4d              1/1    Running    0         8m16s
prefect      prefect-server-hasura-5d45596fd-pdwl7               1/1    Running    3         8m16s
prefect      prefect-server-postgresql-0                         1/1    Running    0         8m15s
prefect      prefect-server-towel-686fd94f7f-2hppd               1/1    Running    0         8m16s
prefect      prefect-server-ui-575b447fbc-dvz5w                  1/1    Running    0         8m16s
seldon       seldon-controller-manager-6b6c65f4c4-b2dr4          1/1    Running    0         8m21s
I changed the region to eu-west-2, as in your tutorial, destroyed the cluster, and checked out the branch pipatth/fix-autoscaler before redoing the tutorial.
But the issue persists ...
@PaulSteffen-betclic Can you please send me the log from that pod?
i.e. kubectl logs -n ambassador ambassador-55dd56c9d8-n97ff
While following the instructions at https://github.com/datarevenue-berlin/OpenMLOps/blob/master/tutorials/set-up-open-source-production-mlops-architecture-aws.md
I got to the step that runs:
kubectl get svc -n ambassador
But I got the following output:
NAME               TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
ambassador         LoadBalancer   172.20.12.231   <pending>     443:30209/TCP       55m
ambassador-admin   ClusterIP      172.20.48.162   <none>        8877/TCP,8005/TCP   55m
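As a rough sketch of how this symptom could be spotted in a script, the filter below lists LoadBalancer services whose external IP is still pending. The function name pending_load_balancers is hypothetical, and it assumes the default column order of kubectl get svc (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE).

```shell
# Hypothetical helper (not part of OpenMLOps or kubectl).
# Reads `kubectl get svc` output on stdin and prints the name of every
# LoadBalancer service whose EXTERNAL-IP column still says "pending".
pending_load_balancers() {
  awk 'NR > 1 && $2 == "LoadBalancer" && $4 ~ /pending/ { print $1 }'
}
```

It could be used as kubectl get svc -n ambassador | pending_load_balancers. For any service it reports, kubectl describe svc ambassador -n ambassador prints an Events section that may show why the load balancer has not been provisioned.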