Open afgane opened 5 years ago
On GCP, the tiller pod does not start. The status of the pod started by CloudMan-boot reports:
0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
If I manually docker exec into the rancher container and run:
helm init
kubectl get pods -n kube-system
kubectl describe pod tiller-deploy-7b5bd84d84-vc9wj -n kube-system
it reports
...
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  25s (x21 over 5m2s)  default-scheduler  no nodes available to schedule pods
A larger node does not help. Running kubectl get nodes -owide returns "No resources found.", but the same happens on AWS (where the cluster is operational).
The specific taint on the node is node.kubernetes.io/network-unavailable. The taint is applied when we activate the GCP cloud provider in k8s. One suspect is the host name, as described here: https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/#gce
Another could be a firewall issue. A third could be a missing IAM policy. Note, though, that storage provisioning works.
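For anyone trying to confirm which taints a node carries, here is a minimal sketch. It assumes a kubeconfig that can actually see the node (which, as noted above, was not the case from inside the rancher container). The function name list_node_taints and the KUBECTL override variable are made up for illustration; only the kubectl get invocation itself is standard.

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to dry-run the helper.
KUBECTL="${KUBECTL:-kubectl}"

# list_node_taints (hypothetical helper): print every node name together
# with the keys of the taints currently set on it.
list_node_taints() {
  "$KUBECTL" get nodes \
    -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

Calling list_node_taints on the affected cluster should show node.kubernetes.io/network-unavailable in the TAINTS column if the taint is still present.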
If we disable the cloud provider altogether, it works, but then the storage provider doesn't. We can also manually untaint the node as a temporary workaround.
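A rough sketch of the manual untaint workaround, assuming the taint key reported in this thread. The helper name untaint_node, the KUBECTL override, and the node name in the usage line are hypothetical. Whatever applied the taint may put it back, so this is only a stopgap.

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to dry-run the helper.
KUBECTL="${KUBECTL:-kubectl}"

# Taint key taken from this issue.
TAINT_KEY="node.kubernetes.io/network-unavailable"

# untaint_node NODE (hypothetical helper): remove all taints with TAINT_KEY
# from NODE. The trailing "-" tells kubectl taint to remove, not add.
untaint_node() {
  "$KUBECTL" taint nodes "${1:?usage: untaint_node NODE}" "${TAINT_KEY}-"
}
```

Usage would look like untaint_node my-node, where my-node is a placeholder for the name shown by kubectl get nodes.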
The GCP issue should be fixed with this: https://github.com/CloudVE/cloudman-boot/commit/7861b318b2216528c924b78d90b96bd5e903932b