Closed ycheng closed 4 years ago
Hmm, not sure what might be causing this issue. I don't think we currently have a way of passing the --debug flag down to the Juju commands from microk8s.enable kubeflow, which would help debug this (though @ktsakalozos may correct me there), so I'll need to work on getting that in there. In the meantime, can you try running juju --debug deploy kubeflow to see if that outputs anything useful?
I re-tested today, and microk8s.enable kubeflow now finishes running.
However, when I try to create a notebook, it fails to create.
BTW, kubeflow 0.6.2 has been released, but 1.15/edge/kubeflow still uses kubeflow v0.5. The steps in https://ubuntu.com/kubeflow/install install kubeflow 0.6 properly.
@ycheng: Can you list the steps you took to create the notebook, and how it failed? Additionally, can you attach output from these commands?
microk8s.kubectl logs --tail 1000 --all-containers -l juju-app=jupyter-controller
microk8s.kubectl logs --tail 1000 --all-containers -l juju-app=jupyter-web
@knkski:
Installation: snap core: r7396; microk8s: v1.15.3, r802, channel: 1.15/edge/kubeflow
Steps:
1. microk8s.reset
2. microk8s.enable kubeflow
3. microk8s.kubectl get po -n kubeflow => make sure all pods are Running.
4. kubectl get svc -n kubeflow | grep ambassador => get the IP of ambassador, then open http://ip/ in a browser to reach the main UI.
5. Choose Notebooks from the left side menu.
6. Click New Server.
7. Fill in the server name, nothing else, and click "Spawn" at the bottom of the page.
8. The newly created server shows "No Status Available".
Both commands ("microk8s.kubectl logs ....") output nothing, so I took 0.log from each of these directories:
/var/log/pods/kubeflow_jupyter-controller-operator-0_d4811256-5f33-4613-9068-4792c179c3ae/juju-operator/ (attached as jupyter-controller.log)
/var/log/pods/kubeflow_jupyter-web-7979d96ff9-2z58r_c39c2161-4c9f-4aac-b933-4d560bbfc978/jupyterhub/ (attached as jupyter-web.log)
jupyter-web.log jupyter-controller.log 2019-09-20 21-01-52_screenshot
@ycheng: can you try switching microk8s to the 1.16/edge/kubeflow channel and trying again?
@knkski, I just tried today. microk8s is r946. It needs a username and password to log in. Do you know what the default ones are?
@ycheng: you can find the username / password to log into the kubeflow dashboard with these two commands:
juju config ambassador-auth username
juju config ambassador-auth password
@ycheng: Is this working for you then? Or is it still hanging when you run microk8s.enable kubeflow? If it is still an issue for you, can you switch to the latest version of microk8s edge with sudo snap switch microk8s --channel edge && sudo snap refresh microk8s, and then post the output from KUBEFLOW_DEBUG=true microk8s.enable kubeflow?
@knkski I can log in now. When I try to create a notebook, it shows an error message:
Warning! notebooks.kubeflow.org is forbidden: User "system:serviceaccount:kubeflow:default" cannot list resource "notebooks" in API group "kubeflow.org" in the namespace "kubeflow"
@ycheng: Sorry about that. I've got a fix in the edge bundle, but in the meantime, you could try running microk8s.disable rbac, which should fix that issue.
@knkski hi, I reinstalled microk8s from edge and got microk8s r1056 + core r8038.
microk8s.enable kubeflow failed; the log is attached.
@ycheng we recently (yesterday) pushed a patch [1] to address this. Could you try reinstalling from edge?
microk8s r1071:
03:42:21 INFO juju.util.exec exec.go:209 run result: exit status 1
ERROR The microk8s user group is created during the microk8s snap installation. Users in that group are granted access to microk8s commands and this is needed for Juju to be able to interact with microk8s.
Add yourself to that group before trying again: sudo usermod -a -G microk8s root
03:42:21 DEBUG cmd supercommand.go:519 error stack:
/build/juju/parts/juju/go/src/github.com/juju/juju/caas/kubernetes/provider/cloud.go:337: The microk8s user group is created during the microk8s snap installation. Users in that group are granted access to microk8s commands and this is needed for Juju to be able to interact with microk8s.
Add yourself to that group before trying again: sudo usermod -a -G microk8s root
/build/juju/parts/juju/go/src/github.com/juju/juju/caas/kubernetes/provider/cloud.go:286:
/build/juju/parts/juju/go/src/github.com/juju/juju/cmd/juju/commands/bootstrap.go:996:
/build/juju/parts/juju/go/src/github.com/juju/juju/cmd/juju/commands/bootstrap.go:575:
Command '('microk8s-juju.wrapper', '--debug', 'bootstrap', 'microk8s', 'uk8s')' returned non-zero exit status 1
Failed to enable kubeflow
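The error above comes down to group membership: Juju refuses to bootstrap unless the invoking user is in the microk8s group. A minimal sketch for checking that before retrying is below. The in_group helper is a hypothetical name, not part of microk8s or Juju, and the script only prints the usermod command from the error message rather than running it.

```shell
#!/bin/sh
# in_group is a hypothetical helper, not part of microk8s or Juju.
# Succeeds if user $1 is listed as a member of group $2.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

user="$(id -un)"
if in_group "$user" microk8s; then
    echo "$user is in the microk8s group"
else
    echo "$user is not in the microk8s group; the error message suggests:"
    echo "  sudo usermod -a -G microk8s $user"
fi
```

Group changes made with usermod normally apply only to new login sessions, so a fresh login may be needed before the retry sees the new membership.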
@wallyworld already has a fix for this issue and it should be available soon.
It seems microk8s r1077 still fails with the same error. Did it pass your tests?
@ycheng the error you see is from the juju client. The microk8s.enable kubeflow addon uses for now the juju client from the snap edge channel (https://github.com/ubuntu/microk8s/blob/master/microk8s-resources/actions/enable.juju.sh#L13). @wallyworld may know more on when the fix will land there or if we should be using a different channel. Thanks.
@ycheng: Are you still running into this issue?
hi all, I am getting this error:
Revoked:false Label:admin Invalid:false InvalidReason:}]}
18:45:28 INFO juju.util.exec exec.go:209 run result: exit status 1
ERROR microk8s:
running: false
18:45:28 DEBUG cmd supercommand.go:519 error stack:
/var/lib/jenkins/workspace/BuildJuju-centos-amd64/_build/src/github.com/juju/juju/caas/kubernetes/provider/cloud.go:384: microk8s:
running: false
/var/lib/jenkins/workspace/BuildJuju-centos-amd64/_build/src/github.com/juju/juju/caas/kubernetes/provider/cloud.go:349:
/var/lib/jenkins/workspace/BuildJuju-centos-amd64/_build/src/github.com/juju/juju/caas/kubernetes/provider/cloud.go:286:
/var/lib/jenkins/workspace/BuildJuju-centos-amd64/_build/src/github.com/juju/juju/cmd/juju/commands/bootstrap.go:996:
/var/lib/jenkins/workspace/BuildJuju-centos-amd64/_build/src/github.com/juju/juju/cmd/juju/commands/bootstrap.go:575:
Command '('microk8s-juju.wrapper', '--debug', 'bootstrap', 'microk8s', 'uk8s')' returned non-zero exit status 1
Failed to enable kubeflow
anyone fixed this?
@ricpet: This may be a race condition. Can you try running microk8s.disable kubeflow; microk8s.enable kubeflow to see if you run into the same error?
@knkski thanks for your reply. I tried but nothing, still the same issue.
I actually managed to fix that issue (there was some conflict with an old installation), however right now, I get this:
Kubeflow could not be enabled:
Creating Juju controller "uk8s" on microk8s/localhost
Creating k8s resources for controller "controller-uk8s"
ERROR failed to bootstrap model: creating controller stack for controller: creating statefulset for controller: timed out waiting for controller pod: pending: -
Command '('microk8s-juju.wrapper', 'bootstrap', 'microk8s', 'uk8s')' returned non-zero exit status 1
Failed to enable kubeflow
@ricpet: What was the fix involved in the previous error you were running into?
I haven't seen this new error before. Can you try again with the KUBEFLOW_DEBUG=true environment variable set? Offhand, it looks like a networking issue, which you might have if you're running behind a proxy/firewall/etc.
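One way to capture the debug output so it can be attached to the issue is sketched below. The run_logged helper and the kubeflow-enable.log file name are assumptions made for illustration; only KUBEFLOW_DEBUG=true and microk8s.enable kubeflow come from this thread.

```shell
#!/bin/sh
# run_logged is a hypothetical helper: run a command and copy its stdout and
# stderr to a log file while still showing them on the terminal.
# Note: the function's exit status is tee's, not the command's.
run_logged() {
    log="$1"
    shift
    "$@" 2>&1 | tee "$log"
}

# Only attempt the real command where the microk8s snap is installed.
if command -v microk8s.enable >/dev/null 2>&1; then
    run_logged kubeflow-enable.log env KUBEFLOW_DEBUG=true microk8s.enable kubeflow
fi
```

The env utility is used to set the variable so that it reaches the child process regardless of how the shell treats assignments in front of function calls.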
hi @knkski the error is actually the same as before:
19:09:21 INFO juju.util.exec exec.go:209 run result: exit status 1
ERROR microk8s:
running: false
19:09:21 DEBUG cmd supercommand.go:519 error stack:
/var/lib/jenkins/workspace/BuildJuju-centos-amd64/_build/src/github.com/juju/juju/caas/kubernetes/provider/cloud.go:384: microk8s:
running: false
/var/lib/jenkins/workspace/BuildJuju-centos-amd64/_build/src/github.com/juju/juju/caas/kubernetes/provider/cloud.go:349:
/var/lib/jenkins/workspace/BuildJuju-centos-amd64/_build/src/github.com/juju/juju/caas/kubernetes/provider/cloud.go:286:
/var/lib/jenkins/workspace/BuildJuju-centos-amd64/_build/src/github.com/juju/juju/cmd/juju/commands/bootstrap.go:996:
/var/lib/jenkins/workspace/BuildJuju-centos-amd64/_build/src/github.com/juju/juju/cmd/juju/commands/bootstrap.go:575:
Command '('microk8s-juju.wrapper', '--debug', 'bootstrap', 'microk8s', 'uk8s')' returned non-zero exit status 1
just to add more context... I am using Ubuntu 18.04 (desktop) and I installed microk8s following this link https://microk8s.io/ and kubeflow using (https://www.kubeflow.org/docs/other-guides/virtual-dev/getting-started-multipass/). When I enable kubeflow I get:
Enabling dns...
[sudo] password for USER:
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling rbac...
Enabling juju...
Deploying Kubeflow...
Kubeflow could not be enabled:
ERROR microk8s:
running: false
Command '('microk8s-juju.wrapper', 'bootstrap', 'microk8s', 'uk8s')' returned non-zero exit status 1
Failed to enable kubeflow
@ricpet it is possible that the machine you are deploying kubeflow on is running out of memory and the OS killed the apiserver while kubeflow was coming up. What are the specs of the machine (virtual or not) that MicroK8s runs on? The microk8s.inspect tarball has information we would need to debug this case. Thanks.
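For a quick look at the memory question before digging into the tarball, something like the following could be used on a Linux host. It is a sketch under the assumption that /proc/meminfo is available; meminfo_kb is a hypothetical helper name, and how much memory Kubeflow actually needs is not stated in this thread.

```shell
#!/bin/sh
# meminfo_kb is a hypothetical helper: print the value in kB of one field
# (for example MemTotal) from a file in /proc/meminfo format.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "MemTotal:     $(meminfo_kb MemTotal) kB"
    echo "MemAvailable: $(meminfo_kb MemAvailable) kB"
fi
```

If the OS did kill a process for lack of memory, the kernel log (for example via dmesg) normally records it, which would be worth checking alongside these numbers.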
hi @ktsakalozos thanks for your help. The specs of my machine are:
The output of microk8s.inspect is:
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
WARNING: IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT
The change can be made persistent with: sudo apt-get install iptables-persistent
WARNING: Docker is installed.
File "/etc/docker/daemon.json" does not exist.
You should create it and add the following lines:
{
"insecure-registries" : ["localhost:32000"]
}
and then restart docker with: sudo systemctl restart docker
Building the report tarball
Report tarball is at /var/snap/microk8s/1173/inspection-report-20200226_104633.tar.gz
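The inspect output above warns that the iptables FORWARD policy is DROP. Whether that is related to the bootstrap failure is not established in this thread, but the policy can be read back after applying the suggested change. The sketch below parses the output of iptables -S FORWARD; forward_policy is a hypothetical helper name, and the sample input stands in for a live call, which needs root.

```shell
#!/bin/sh
# forward_policy is a hypothetical helper: read `iptables -S FORWARD` output
# on stdin and print the chain policy (ACCEPT or DROP).
forward_policy() {
    awk '$1 == "-P" && $2 == "FORWARD" { print $3; exit }'
}

# Sample input standing in for a live `sudo iptables -S FORWARD` call.
printf '%s\n' '-P FORWARD DROP' '-A FORWARD -j DOCKER-USER' | forward_policy
```

On the affected machine the equivalent would be sudo iptables -S FORWARD | forward_policy.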
@ricpet: Can you attach the tarball that microk8s.inspect generated? It looks like it put it at /var/snap/microk8s/1173/inspection-report-20200226_104633.tar.gz
Same issue here
Ubuntu 18
Revoked:false Label:admin Invalid:false InvalidReason:}]}
13:52:10 INFO juju.util.exec exec.go:209 run result: exit status 1
ERROR microk8s:
running: false
@ricpet, @mikejmills: If you switch to the edge version of microk8s, it includes a fix for this error:
# If you don't have it installed
sudo snap install microk8s --classic --edge
# If you have it installed
sudo snap switch microk8s --channel edge
sudo snap refresh microk8s
Note that with the edge version, you'll have to use the edge kubeflow bundle, so you'll need to enable kubeflow like this:
KUBEFLOW_CHANNEL=edge microk8s.enable kubeflow
That requirement should disappear when microk8s 1.18 hits stable, which is targeted for this Thursday (March 26th).
I installed microk8s via
sudo snap install --classic microk8s --channel=1.15/edge/kubeflow
I got rev 802 of microk8s. The command "microk8s.enable kubeflow" hangs with the following output for more than 10 minutes.