estechnical opened this issue 7 years ago
I saw another issue about colocation not working correctly - this also happens to our setup, we see more machines provisioned than we expected.
Yea :( We are working to fix that soon.
As for MAAS constraints, here is your bundle representation (what juju uses to deploy applications with)
description: A nine-machine Kubernetes cluster, appropriate for production. Includes
  a three-machine etcd cluster and three Kubernetes worker nodes.
machines:
  '0':
    constraints: tags=q4gg3b
    series: xenial
  '1':
    constraints: tags=46p44n
    series: xenial
  '2':
    constraints: tags=7hmwxn
    series: xenial
  '3':
    constraints: tags=kbf4kd
    series: xenial
  '4':
    constraints: tags=bkxxf7
    series: xenial
  '5':
    constraints: tags=7yxcdk
    series: xenial
  '6':
    constraints: tags=spq4qp
    series: xenial
  '7':
    constraints: tags=p6q3rm
    series: xenial
  '8':
    constraints: tags=ahfmmk
    series: xenial
relations:
- - kubernetes-master:certificates
  - easyrsa:client
- - etcd:certificates
  - easyrsa:client
- - kubernetes-worker:certificates
  - easyrsa:client
- - kubeapi-load-balancer:certificates
  - easyrsa:client
- - kubernetes-master:etcd
  - etcd:db
- - kubernetes-master:kube-api-endpoint
  - kubeapi-load-balancer:apiserver
- - kubernetes-master:loadbalancer
  - kubeapi-load-balancer:loadbalancer
- - kubernetes-worker:kube-api-endpoint
  - kubeapi-load-balancer:website
- - kubernetes-master:kube-control
  - kubernetes-worker:kube-control
series: xenial
services:
  easyrsa:
    charm: cs:~containers/easyrsa-15
    num_units: 1
    to:
    - '0'
  etcd:
    charm: cs:~containers/etcd-48
    num_units: 3
    to:
    - '1'
    - '2'
    - '3'
  kubeapi-load-balancer:
    charm: cs:~containers/kubeapi-load-balancer-25
    num_units: 1
    to:
    - '4'
  kubernetes-master:
    charm: cs:~containers/kubernetes-master-47
    num_units: 1
    options:
      channel: 1.7/stable
    to:
    - '5'
  kubernetes-worker:
    charm: cs:~containers/kubernetes-worker-52
    num_units: 3
    options:
      channel: 1.7/stable
    to:
    - '6'
    - '7'
    - '8'
If you take this bundle, save it to a file such as bundle.yaml, and try juju deploy --debug ./bundle.yaml, does the same error happen, with no machines matching the constraints?
conjure-up will tag your MAAS machines, which is what you see at the top of the bundle I pasted. If you click on those machines in your MAAS web UI, do they have those tags listed?
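If the CLI is handier than the web UI, roughly the same check can be done there. This is just a sketch, not something conjure-up runs for you; the "admin" profile, the API endpoint and the tag name below are placeholders:

maas login admin http://<maas-ip>:5240/MAAS/api/2.0/ <api-key>
maas admin tags read              # list every tag MAAS knows about
maas admin tag machines q4gg3b    # list the machines carrying one of the bundle's tags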
I apologize for the complications with MAAS, we are working hard to make that experience a lot better.
Thanks for your help :) No apologies needed - I like these systems and will contribute what I can... Even if only bug reports and testing...
I think your constraints worked much better than the defaults. I still run into one machine not being placed, which is a little odd. It seems like the juju controller is bootstrapped on the machine in question... might this be to do with the colocation issues?
2 down pending xenial cannot run instances: cannot run instance: No available machine matches constraints: [('agent_name', ['046b8d50-b48e-42db-8969-cbb602527fea']), ('tags', ['7hmwxn']), ('zone', ['default'])] (resolved to "tags=7hmwxn zone=default")
The machine with the tag 7hmwxn shows as already deployed in MAAS.
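For what it's worth, the deployment state and tags of each machine can also be dumped from the MAAS CLI, and juju can show which machine the controller itself was bootstrapped onto. A rough sketch (it assumes a logged-in "admin" profile and that jq is installed):

maas admin machines read | jq '.[] | {hostname, status_name, tag_names}'   # which machines are Deployed and what tags they carry
juju machines -m controller                                                # which machine the controller grabbed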
As we have enough machines to overcome this, I'm going to try tagging all our machines, e.g. "small" and "large", and just see if I can get it up and running like that, even if it uses more machines than strictly needed. ...
Now I've tried this: I have 7 machines tagged "small" and 3 tagged "huge". Swapping the tags constraints for just "tags=small" and "tags=huge" resulted in a successful deployment using "juju deploy --debug ./bundle.yaml" :)
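For reference, tags like these can also be created and attached from the MAAS CLI instead of the web UI. A sketch only; the profile name and system ID are placeholders:

maas admin tags create name=huge comment='16-core boxes'
maas admin tag update-nodes huge add=<system-id>    # repeat for each large machine

In the bundle, the per-machine lines then just become constraints: tags=small or constraints: tags=huge.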
It has used all the machines and appears to have produced a slightly different result from the conjure-up deployment. I notice that deploying with juju deploy has not placed flannel on anything.
I'm still waiting for what looks like a very final step to complete:
Unit Workload Agent Machine Public address Ports Message
easyrsa/0* active idle 0 10.10.10.11 Certificate Authority connected.
etcd/0 active idle 1 10.10.10.12 2379/tcp Healthy with 3 known peers
etcd/1* active idle 2 10.10.10.13 2379/tcp Healthy with 3 known peers
etcd/2 active idle 3 10.10.10.14 2379/tcp Healthy with 3 known peers
kubeapi-load-balancer/0* active idle 4 10.10.10.15 443/tcp Loadbalancer ready.
kubernetes-master/0* waiting idle 5 10.10.10.16 6443/tcp Waiting for kube-system pods to start
kubernetes-worker/0* waiting idle 6 10.10.10.17 Waiting for kube-proxy to start.
kubernetes-worker/1 waiting idle 7 10.10.10.18 Waiting for kube-proxy to start.
kubernetes-worker/2 waiting idle 8 10.10.10.19 Waiting for kube-proxy to start.
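While it sits like this, one way to watch the kube-system pods come up is to pull the kubeconfig off the master and query the cluster directly. This is a sketch and assumes kubectl is installed locally; per the CDK docs of that era, the master unit writes its config to a file called config in the ubuntu user's home:

mkdir -p ~/.kube
juju scp kubernetes-master/0:config ~/.kube/config
kubectl get pods -n kube-system    # the workers stay "waiting" until these are Running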
I'm available for testing for the rest of this week, I am just taking my first steps with a real kubernetes cluster and expect to refine things a little as I go.
Thanks again :)
Aha! https://api.jujucharms.com/charmstore/v5/canonical-kubernetes/archive/bundle.yaml
I just added the flannel descriptions and the relations shown in the above bundle and re-ran juju deploy. It's added the flannel parts I expected to see (just from my previous experiments)...
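For anyone following along, the additions were roughly these (a sketch reconstructed from the canonical-kubernetes bundle linked above; I've left the flannel charm revision off, so use whatever that bundle pins). flannel is a subordinate charm, so it needs no num_units or to placement:

under services:
  flannel:
    charm: cs:~containers/flannel
and under relations:
- - flannel:etcd
  - etcd:db
- - flannel:cni
  - kubernetes-master:cni
- - flannel:cni
  - kubernetes-worker:cni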
IT WORKS :D
I'm going to leave this bug open so that we can track the progress of making the placement editor a lot better.
Ok, thanks. Please let me know if you need further testing...
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This situation seems fairly nonsensical. I have MAAS and would like to deploy kubernetes as per the canonical kubernetes spell.
I have several small servers with 4 cores, 3 large servers with 16 cores.
Conjure-up seems to run fine and I can start the deployment; it will deploy easyrsa etc., but kubernetes-worker always gets stuck with 'waiting for machine'.
During conjure-up, I'm choosing to pin the juju machines to particular boxes, but this doesn't seem to be honoured during the deployment.
I am new to all this, so debugging has been slow, but from googling I found I could check the output of 'juju status'. The messages make little sense to me: our large servers exceed the constraints in CPU cores and RAM, yet the deployment seems to avoid placing kubernetes-worker on them, instead choosing the smaller servers and then running out of them.
It appears that the constraints are failing only for our large servers. Or am I doing something wrong when pinning the machines? There seems to be something odd about the menu for selecting which juju machine to place on a physical machine.
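(For anyone hitting the same thing: it can help to compare what Juju itself recorded once the deploy starts. A sketch using juju 2.x commands; the machine number is just an example and the exact output fields are from memory:

juju machines                            # machine list, including any stuck in pending
juju show-machine 2                      # per-machine detail, including its constraints
juju get-constraints kubernetes-worker   # application-level constraints, if any were set
)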
sudo snap refresh conjure-up --edge
Yes, this error is reproducible with the latest conjure-up.
Please provide the output of the following commands
Please attach tarball of ~/.cache/conjure-up: conjure-up.tar.gz
Sosreport
Please attach a sosreport:
What Spell was Selected?
canonical-kubernetes
What provider (aws, maas, localhost, etc)?
MAAS
MAAS Users
Which version of MAAS? MAAS version: 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)
Commands ran
Please outline what commands were run to install and execute conjure-up: Initially I think I started out with this guide https://tutorials.ubuntu.com/tutorial/install-kubernetes-with-conjure-up
so:
Additional Information
I saw another issue about colocation not working correctly - this also happens to our setup, we see more machines provisioned than we expected.