canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Unable to enable kubeflow using channel 1.16/edge/kubeflow #753

Closed praveen049 closed 4 years ago

praveen049 commented 4 years ago

microk8s_kubeflow.txt

I am using the 1.16/edge/kubeflow channel, and when I run microk8s.enable kubeflow the command hangs at this step:

 microk8s.enable kubeflow
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling juju...
Deploying Kubeflow...

Any suggestions on how I can troubleshoot this?

ktsakalozos commented 4 years ago

Hi @praveen049

Could you attach the tarball created by microk8s.inspect here? Could you also share the logs you get with microk8s.juju debug-log -n 2000? @knkski, can you think of anything else that could help us figure out what might be wrong here?

praveen049 commented 4 years ago

@ktsakalozos

Attaching it here. inspection-report-20191024_054807.tar.gz

microk8s.juju debug-log -n 2000 gives ERROR Gateway Timeout.

I am behind a proxy and no_proxy is set correctly.
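
For anyone double-checking the same thing, here is a minimal sketch of how one might confirm that a given address really is listed in no_proxy. The function name is made up for illustration, and the matching rule (exact match or domain-suffix match on comma-separated entries) is a simplifying assumption; real tools interpret no_proxy differently, and CIDR entries are deliberately not interpreted here.

```python
import os


def no_proxy_covers(host, no_proxy):
    """Return True if host matches an entry in a comma-separated no_proxy string.

    Simplified rule (an assumption, tools differ): an entry matches if it
    equals the host, or if the host ends with "." + entry. Leading dots on
    the entry are ignored. CIDR entries are NOT expanded or interpreted.
    """
    host = host.lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lstrip(".").lower()
        if not entry:
            continue
        if host == entry or host.endswith("." + entry):
            return True
    return False


if __name__ == "__main__":
    # Read whichever spelling is set in the current environment.
    value = os.environ.get("no_proxy") or os.environ.get("NO_PROXY") or ""
    # Hypothetical address: substitute the IP you expect to bypass the proxy.
    print(no_proxy_covers("10.152.183.1", value))
```

Under this rule, an entry such as 10.152.183.0/24 does not match an individual address in that range, which is one way a no_proxy value can look correct and still not apply.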

praveen049 commented 4 years ago

@ktsakalozos

Any suggestions on how I can troubleshoot this issue?

Thanks

knkski commented 4 years ago

@praveen049: Are you able to run any microk8s.juju commands at all? Can you post microk8s.juju status if it runs successfully?

praveen049 commented 4 years ago

@knkski I have tried a couple of commands, microk8s.juju status and microk8s.juju users, and they both hang.

knkski commented 4 years ago

@praveen049: Can you try microk8s.juju status --debug and see if you get any output? Otherwise, can you post the logs from the juju controller pod?

praveen049 commented 4 years ago

@knkski

Output of microk8s.juju status --debug:

(base) sims@kubeflow:~$ microk8s.juju status --debug
04:47:50 INFO  juju.cmd supercommand.go:79 running juju [2.7-rc1  gc go1.10.4]
04:47:50 DEBUG juju.cmd supercommand.go:80   args: []string{"/var/snap/microk8s/946/bin/juju", "status", "--debug"}
04:47:50 INFO  juju.juju api.go:67 connecting to API addresses: [10.152.183.246:17070]

The output of kubectl describe pods -n controller-uk8s is attached: juju_pod2.txt

knkski commented 4 years ago

@praveen049: Can you also post the logs from that pod? It looks like it's running normally.

knkski commented 4 years ago

@praveen049: If nothing else, can you try microk8s.disable kubeflow, or microk8s.juju unregister -y uk8s if that doesn't work, and then try microk8s.enable kubeflow again?

praveen049 commented 4 years ago

@knkski I have reinstalled microk8s from the 1.16/edge/kubeflow channel. Previously it was installed from 1.14/stable and then switched to the 1.16/edge/kubeflow channel.

Attached are the logs from the mongodb and api-server pods: apiserver.txt mongodb.txt

This time microk8s.enable kubeflow returns with the error below:

(base) sims@trainer:~$ microk8s.enable kubeflow
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling juju...
Deploying Kubeflow...
Creating Juju controller "uk8s" on microk8s/localhost
Creating k8s resources for controller "controller-uk8s"
Downloading images
Starting controller pod
Bootstrap agent now started
Contacting Juju controller at 10.152.183.89 to verify accessibility...
ERROR unable to contact api server after 1 attempts: Gateway Timeout

Command '('microk8s-juju.wrapper', 'bootstrap', 'microk8s', 'uk8s')' returned non-zero exit status 1
Failed to enable kubeflow 

Trying disable and unregister does not help either:

(base) sims@kubeflow:~$ microk8s.disable kubeflow
ERROR controller uk8s not found

Command '('microk8s-juju.wrapper', 'destroy-controller', '-y', 'uk8s', '--destroy-all-models', '--destroy-storage')' returned non-zero exit status 1
Failed to disable kubeflow
(base) sims@kubeflow:~$ microk8s.juju unregister -y uk8s
ERROR controller uk8s not found

knkski commented 4 years ago

@praveen049: Could you try updating the snap (sudo snap refresh microk8s), and running KUBEFLOW_DEBUG=true microk8s.enable kubeflow? That will add in the --debug flag to Juju, which should help diagnose what's going on here.

praveen049 commented 4 years ago

@knkski Attached is the log with the debug option. juju_status_error_debug.txt

praveen049 commented 4 years ago

@knkski Any pointers on how to troubleshoot and fix this issue?

Thanks

knkski commented 4 years ago

@praveen049: Sorry about the wait. It looks like you've got a proxy issue. Can you either try it without the proxy involved, or post the output from this command?

microk8s.juju --debug bootstrap microk8s --config juju-no-proxy=10.0.0.1

praveen049 commented 4 years ago

@knkski thank you for the feedback.

Attached is the output juju-noproxy.txt

The commands I used: first KUBEFLOW_DEBUG=true microk8s.enable kubeflow, which fails as before, and then

microk8s.juju --debug bootstrap microk8s --config juju-no-proxy=10.0.0.1

Thanks

knkski commented 4 years ago

@praveen049: It looks like the manual bootstrap command worked for you, so I've added in the flag that should fix things for you in PR #785.

praveen049 commented 4 years ago

@knkski thank you for the fix.

So, do I need to deploy again from the channel and enable kubeflow with the commands below?

sudo snap install microk8s --classic --channel 1.16/edge/kubeflow
microk8s.enable kubeflow

praveen049 commented 4 years ago

@knkski Based on the discussion on the thread for PR 785, it seems the fix was not merged. Is there any other solution or workaround to get it working ?

Thanks

charlesa101 commented 4 years ago

installing from 1.16/edge/kubeflow channel worked for me 2 days ago

sudo snap install microk8s --classic --channel 1.16/edge/kubeflow
microk8s.enable kubeflow

but now I am getting this error:

KUBEFLOW_DEBUG=true sudo  microk8s.enable  kubeflow
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling rbac...
Enabling juju...
Deploying Kubeflow...
Located bundle "cs:bundle/kubeflow-134"
ERROR cannot deploy bundle: the provided bundle has the following errors:
empty charm path
invalid charm URL in application "ambassador-auth": cannot parse URL "": name "" not valid

Command '('microk8s-juju.wrapper', 'deploy', 'kubeflow', '--channel', 'stable', '--overlay', '/tmp/tmpnmhsn4l0')' returned non-zero exit status 1
Failed to enable kubeflow

knkski commented 4 years ago

@charlesa101: Apologies, can you run sudo snap switch microk8s --channel edge && sudo snap refresh? I think that particular channel is no longer getting updated and will disappear due to the feature getting merged into master.

praveen049 commented 4 years ago

@knkski I have now tried the new channel, and the proxy issue seems to be resolved. Thanks for that.

I am now running into a different error:

Resolving charm: cs:~kubeflow-charmers/seldon-cluster-manager-47
Resolving charm: cs:~kubeflow-charmers/tensorboard-46
Resolving charm: cs:~kubeflow-charmers/tf-job-dashboard-48
Resolving charm: cs:~kubeflow-charmers/tf-job-operator-46
ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-47": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-47": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-47/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving

Command '('microk8s-juju.wrapper', 'deploy', 'kubeflow', '--channel', 'stable', '--overlay', '/tmp/tmpu9t_xu6d')' returned non-zero exit status 1
Failed to enable kubeflow

Attached are the full logs: microk8s-19Nov.txt

Any pointers on how to resolve this ?

Thanks

praveen049 commented 4 years ago

@charlesa101 are you able to deploy Kubeflow from the edge channel ?

praveen049 commented 4 years ago

@knkski

Any suggestions on how to troubleshoot this issue?

charlesa101 commented 4 years ago

@praveen049 Yes, I was able to get this running from the edge channel,

but before that I had to clean up my snap directory.

sudo snap switch microk8s --channel edge && sudo snap refresh, like @knkski said,

then microk8s enable storage dns rbac juju kubeflow did it for me.

praveen049 commented 4 years ago

@charlesa101 thanks for the info

But these commands are not working for me. The Kubeflow deployment fails with the error below:

ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-47": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-47": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-47/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving

Command '('microk8s-juju.wrapper', 'deploy', 'kubeflow', '--channel', 'stable', '--overlay', '/tmp/tmpu9t_xu6d')' returned non-zero exit status 1
Failed to enable kubeflow

I am running behind a proxy, and this seems to be related to that.
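
One quick sanity check for the lookup failure above is whether the hostname resolves at all from the host. The sketch below is only that: the function name is hypothetical, and it exercises the host's resolver, not 10.152.183.10 (which looks like the in-cluster DNS service the error refers to), so a success here does not prove the pods can resolve the name.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False if the lookup fails.

    `resolver` is injectable so the function can be exercised without
    touching the network; by default it uses the host's own resolver.
    """
    try:
        resolver(hostname, 443)
        return True
    except socket.gaierror:
        return False
```

Calling can_resolve("api.jujucharms.com") on the machine running MicroK8s gives a first hint: if it returns False, the problem is already on the host; if it returns True, the failure is more likely in how the cluster DNS reaches its upstream resolver through the proxy.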

knkski commented 4 years ago

@praveen049: Yeah, that could definitely be a proxy issue. @ktsakalozos, do you know how we should handle that?

knkski commented 4 years ago

@praveen049: Can you post the output from KUBEFLOW_DEBUG=true microk8s.enable kubeflow? That should output some more useful information

praveen049 commented 4 years ago

@knkski

Attaching the debug output microk8s-debug-27Nov.txt

ktsakalozos commented 4 years ago

@wallyworld, how would we put the pods subnet in no-proxy? According to https://discourse.jujucharms.com/t/configuring-models/1151, no-proxy does not take CIDR notation, so it cannot cover a /16 network. What about juju-no-proxy? Could we use that one?
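
To put a number on why listing addresses one by one does not scale to a /16, here is a small sketch that expands a CIDR range into individual addresses. The helper name and the subnets used (10.152.183.0/24 and 10.1.0.0/16) are example values chosen for illustration, not something confirmed in this thread.

```python
import ipaddress


def expand_cidr(cidr):
    """Return every address in the given network as a list of strings."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(addr) for addr in network]


if __name__ == "__main__":
    # A /24 is still manageable as an explicit comma-separated list.
    addresses = expand_cidr("10.152.183.0/24")
    print(len(addresses))  # 256
    print(",".join(addresses[:3]))  # 10.152.183.0,10.152.183.1,10.152.183.2

    # A /16 would need 65536 separate entries.
    print(ipaddress.ip_network("10.1.0.0/16").num_addresses)  # 65536
```

An explicit list is only realistic for small ranges, which is why a setting that understands CIDR notation matters for the pod subnet.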

praveen049 commented 4 years ago

@ktsakalozos @wallyworld @knkski

Hi, Any suggestions on how to address this proxy issue ?

Thanks

wallyworld commented 4 years ago

I think juju-no-proxy may work, but in practice it can be hit and miss depending on the environment in which stuff is running.

msnidal commented 4 years ago

Just wanted to add in case anybody got here from Google that in my case, the problem was that I had a folder kubeflow in my home directory from a previous installation (now on 1.17/stable) and the juju command was therefore ambiguous between cs:kubeflow and my local folder. I found this by setting KUBEFLOW_DEBUG=true and saw this message:

/build/juju/parts/juju/go/src/github.com/juju/juju/cmd/juju/application/deploy.go:1340: The charm or bundle "kubeflow" is ambiguous.

So I just changed to a different directory before running the command, and that fixed it. Kubeflow then deployed perfectly.
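
A minimal sketch of a pre-flight check for this situation, assuming the ambiguity is triggered by a local file or folder with the same name as the bundle in the current working directory. The function name is hypothetical and this is not part of MicroK8s or Juju.

```python
import os


def shadows_bundle(name, directory="."):
    """Return True if a local file or folder called `name` exists in `directory`."""
    return os.path.exists(os.path.join(directory, name))


if __name__ == "__main__":
    # Warn before running microk8s.enable kubeflow from this directory.
    if shadows_bundle("kubeflow"):
        print('A local "kubeflow" entry exists here; run the command from another directory.')
```

Running it from the directory where you intend to call microk8s.enable kubeflow shows whether a leftover kubeflow folder could clash with the bundle name.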