canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

cli.py not working for aws/ckkf #197

Closed paravatha closed 4 years ago

paravatha commented 4 years ago

I am trying to set up ckkf as per https://github.com/juju-solutions/bundle-kubeflow#setup-charmed-kubernetes

sudo snap install juju --classic
sudo snap install juju-wait --classic
sudo snap install juju-helpers --edge --classic
sudo git clone https://github.com/juju-solutions/bundle-kubeflow.git
cd bundle-kubeflow
sudo snap install microk8s --classic --channel=stable
sudo microk8s.status --wait-ready
sudo microk8s.kubectl cluster-info
cd bundle-kubeflow
sudo juju add-credential aws
sudo python3 scripts/cli.py ck setup --controller ckkf
sudo juju add-k8s ckkf -c ckkf --cloud aws --region us-east-1 --storage juju-operator-storage
sudo python3 scripts/cli.py deploy-to ckkf --cloud aws --public-address  {public-ip} 

Error

Enter a password to set for the Kubeflow dashboard: 
Repeat for confirmation: 
+ juju add-model kubeflow aws --config update-status-hook-interval=30s
Added 'kubeflow' model on aws/us-east-1 with credential 'my-aws-creds' for user 'admin'
+ juju deploy -m kubeflow kubeflow --channel stable --overlay=/tmp/tmporof3iu_
Located bundle "cs:bundle/kubeflow-185"
Resolving charm: cs:~kubeflow-charmers/ambassador-78
Resolving charm: cs:~kubeflow-charmers/argo-controller-162
Resolving charm: cs:~kubeflow-charmers/argo-ui-78
Resolving charm: cs:~kubeflow-charmers/cert-manager-controller-9
Resolving charm: cs:~kubeflow-charmers/cert-manager-webhook-9
Resolving charm: cs:~kubeflow-charmers/dex-auth-20
Resolving charm: cs:~kubeflow-charmers/jupyter-controller-176
Resolving charm: cs:~kubeflow-charmers/jupyter-web-81
Resolving charm: cs:~kubeflow-charmers/katib-controller-76
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/katib-manager-75
Resolving charm: cs:~kubeflow-charmers/katib-ui-71
Resolving charm: cs:~kubeflow-charmers/kubeflow-dashboard-36
Resolving charm: cs:~kubeflow-charmers/kubeflow-profiles-41
Resolving charm: cs:~kubeflow-charmers/metacontroller-68
Resolving charm: cs:~kubeflow-charmers/metadata-api-32
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/metadata-envoy-14
Resolving charm: cs:~kubeflow-charmers/metadata-grpc-13
Resolving charm: cs:~kubeflow-charmers/metadata-ui-34
Resolving charm: cs:~kubeflow-charmers/minio-78
Resolving charm: cs:~kubeflow-charmers/modeldb-backend-76
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/modeldb-store-69
Resolving charm: cs:~kubeflow-charmers/modeldb-ui-68
Resolving charm: cs:~kubeflow-charmers/oidc-gatekeeper-19
Resolving charm: cs:~kubeflow-charmers/pipelines-api-82
Resolving charm: cs:~charmed-osm/mariadb-k8s
Resolving charm: cs:~kubeflow-charmers/pipelines-persistence-167
Resolving charm: cs:~kubeflow-charmers/pipelines-scheduledworkflow-163
Resolving charm: cs:~kubeflow-charmers/pipelines-ui-78
Resolving charm: cs:~kubeflow-charmers/pipelines-viewer-102
Resolving charm: cs:~kubeflow-charmers/pipelines-visualization-13
Resolving charm: cs:~kubeflow-charmers/pytorch-operator-163
Resolving charm: cs:~kubeflow-charmers/seldon-core-15
Resolving charm: cs:~kubeflow-charmers/tf-job-operator-159
Executing changes:
- upload charm cs:~kubeflow-charmers/ambassador-78 for series kubernetes
- deploy application ambassador with 1 unit on kubernetes using cs:~kubeflow-charmers/ambassador-78
ERROR cannot deploy bundle: cannot deploy application "ambassador": cannot add application "ambassador": series "kubernetes" in a non container model not valid
Command '('juju', 'deploy', '-m', 'kubeflow', 'kubeflow', '--channel', 'stable', '--overlay=/tmp/tmporof3iu_')' returned non-zero exit status 1.

knkski commented 4 years ago

It looks like you're deploying to the wrong cloud. aws represents AWS itself; there will be another cloud, probably called ckkf, that you'll need to deploy Kubeflow to. In other words, you'll need to change the add-model command to this:

juju add-model kubeflow ckkf --config update-status-hook-interval=30s
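
To make the distinction concrete, here is a minimal Python sketch of how the add-model arguments could be assembled. The helper name add_model_args is hypothetical (it is not taken from scripts/cli.py), and ckkf is assumed to be the name under which juju add-k8s registered the Kubernetes cloud:

```python
def add_model_args(model, cloud, interval="30s"):
    """Build the argv for `juju add-model` (hypothetical helper, not from cli.py)."""
    return [
        "juju", "add-model", model, cloud,
        "--config", f"update-status-hook-interval={interval}",
    ]


# Pass the Kubernetes cloud name (assumed here to be "ckkf"), not "aws".
print(" ".join(add_model_args("kubeflow", "ckkf")))
```

The only thing that changes compared with the failing run is the cloud argument; the model name and the config option stay the same.
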
paravatha commented 4 years ago

@knkski, I commented out this line:

juju('add-model', model, cloud, '--config', 'update-status-hook-interval=30s')

and tried this:

sudo python3 scripts/cli.py deploy-to ckkf --no-build --public-address {pub-ip}

No luck, so I tried it this way:

sudo python3 scripts/cli.py deploy-to ckkf --no-build --public-address {pub-ip}

Now it's complaining about tmp folders:

> Building kubeflow-dashboard
> Error: SubcommandError("charm", "No such file or directory (os error 2)")
> Command '('juju', 'bundle', 'deploy', '--build', '--', '-m', 'kubeflow', '--overlay=/tmp', '--overlay=/tmp/tmp4sn_6kij')' returned non-zero exit status 1.

It seems like the loop is not able to create folders under /tmp.

sakaia commented 4 years ago

I ran the following command, but even after waiting 20 minutes the installation is still running. What should I do? Should I update the ck.yaml config (edit memory and CPU)?

sudo python3 scripts/cli.py ck setup --controller ckkf

The repeating messages are as follows:

DEBUG:root:aws-integrator/0 workload status is blocked since 2020-05-07 13:43:43+00:00
DEBUG:root:kubernetes-master/0 workload status is waiting since 2020-05-07 13:26:42+00:00
DEBUG:root:kubernetes-master/1 workload status is waiting since 2020-05-07 13:26:50+00:00
DEBUG:root:kubernetes-worker/0 workload status is waiting since 2020-05-07 13:18:30+00:00
DEBUG:root:kubernetes-worker/1 workload status is waiting since 2020-05-07 13:18:07+00:00
DEBUG:root:kubernetes-worker/2 workload status is waiting since 2020-05-07 13:16:55+00:00
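
The status lines above can be grouped by workload status to see which unit is holding things up. This is a minimal Python sketch that assumes the log format is exactly the one shown in this comment; the function name units_by_status and the regular expression are illustrative, not part of any Juju tool:

```python
import re
from collections import defaultdict

# Matches lines such as:
# DEBUG:root:aws-integrator/0 workload status is blocked since 2020-05-07 13:43:43+00:00
STATUS_LINE = re.compile(r"^DEBUG:root:(\S+) workload status is (\w+) since (.+)$")


def units_by_status(lines):
    """Group unit names by their reported workload status, skipping other lines."""
    grouped = defaultdict(list)
    for line in lines:
        match = STATUS_LINE.match(line.strip())
        if match is None:
            continue
        unit, status, _since = match.groups()
        if unit not in grouped[status]:
            grouped[status].append(unit)
    return dict(grouped)


sample = [
    "DEBUG:root:aws-integrator/0 workload status is blocked since 2020-05-07 13:43:43+00:00",
    "DEBUG:root:kubernetes-master/0 workload status is waiting since 2020-05-07 13:26:42+00:00",
]
print(units_by_status(sample))
```

In the output quoted here, aws-integrator/0 is the only unit reported as blocked; the others are waiting.
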

By the way, I now understand the two controllers: uk8s (on the local machine) and ckkf (on AWS).

knkski commented 4 years ago

@paravatha: You'll need to run sudo snap install charm --classic to get the charm command. You'll also need that add-model line in there, as that creates a Kubeflow-specific model. You won't be able to deploy Kubeflow to the default AWS model.
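
A quick way to catch this kind of "No such file or directory" error up front is to check that the needed commands are on PATH before running the script. The sketch below is only an illustration: missing_tools is a hypothetical helper, and the command list is an assumption based on the snaps mentioned in this thread.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# Assumed command names, taken from the snaps discussed in this thread.
required = ["juju", "juju-wait", "charm"]
missing = missing_tools(required)
if missing:
    print("Missing commands:", ", ".join(missing))
else:
    print("All required commands found.")
```
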

knkski commented 4 years ago

@sakaia: Can you post the output from juju debug-log --replay --no-tail --include=kubernetes-master, as well as the output from juju status --relations?
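
If it is easier to gather both outputs in one go, something like the following Python sketch could work. It only wraps the two commands named above; the function names diagnostic_commands and collect are hypothetical, and nothing is executed until collect is called.

```python
import subprocess


def diagnostic_commands(application):
    """Return the two diagnostic commands requested above as argv lists."""
    return [
        ["juju", "debug-log", "--replay", "--no-tail", f"--include={application}"],
        ["juju", "status", "--relations"],
    ]


def collect(commands, run=subprocess.run):
    """Run each command and map its command line to the combined output."""
    outputs = {}
    for argv in commands:
        result = run(argv, capture_output=True, text=True)
        outputs[" ".join(argv)] = result.stdout + result.stderr
    return outputs


# Example usage (requires juju to be installed and a controller to be reachable):
# for command, output in collect(diagnostic_commands("kubernetes-master")).items():
#     print(f"$ {command}\n{output}")
```
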

paravatha commented 4 years ago

@knkski When we run this command, how many EC2 instances does it try to provision? Also, what kind of authorization does the IAM user need?

sudo python3 scripts/cli.py ck setup --controller ckkf

knkski commented 4 years ago

@paravatha: That by default will create a Kubernetes deployment with 2 masters and 3 workers, 3 etcd machines, and a handful of other machines such as a load balancer, for a total of 12 machines.
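
As a rough sanity check of that count, the arithmetic can be written out as below. The numbers are the ones quoted above; the dictionary keys are just labels, and the breakdown of the remaining machines is not specified here.

```python
# Machine counts named explicitly above (keys are illustrative labels).
named_machines = {"kubernetes-master": 2, "kubernetes-worker": 3, "etcd": 3}
total_machines = 12

named_total = sum(named_machines.values())
other_machines = total_machines - named_total

print(f"{named_total} named + {other_machines} other = {total_machines} machines")
```

So when estimating cost or EC2 quota, plan for 12 instances rather than only the masters and workers.
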

I don't have an exact list of IAM permissions needed, but the IAM role that I'm using has full access to EC2 and read access to IAM and STS.