Test CloudMan 2 on all cloud providers

afgane commented 5 years ago

[x] AWS
[ ] Azure
[ ] GCP
[ ] Jetstream
[x] NeCTAR

afgane commented 5 years ago

On GCP, the tiller pod does not start. The status for the pod started by CloudMan-boot reports:

0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

If I manually docker exec into the rancher container and run:

helm init
kubectl get pods -n kube-system
kubectl describe pod tiller-deploy-7b5bd84d84-vc9wj -n kube-system

it reports:

...
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  25s (x21 over 5m2s)  default-scheduler  no nodes available to schedule pods

A larger node does not help. Running kubectl get nodes -owide returns "No resources found.", but the same happens on AWS (where the cluster is operational).

nuwang commented 5 years ago

The specific taint on the node is node.kubernetes.io/network-unavailable. The taint is applied when we activate the GCP cloud provider in k8s. One suspect is the host name (see https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/#gce ). Another could be a firewall issue, and a third could be a missing IAM policy. Note that storage provisioning works, though.
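For anyone reproducing this, one way to see which taints each node carries is a JSONPath query. This is only a sketch: it assumes kubectl is configured against the affected cluster, and both the function name list_node_taints and the KUBECTL override variable are made up for illustration.

```shell
# Hypothetical helper (not part of cloudman-boot): print each node name
# followed by the keys of any taints set on it, one node per line.
# KUBECTL is an assumed override so a different binary or wrapper can be used.
list_node_taints() {
  ${KUBECTL:-kubectl} get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
}
```

Calling list_node_taints on the GCP cluster should show node.kubernetes.io/network-unavailable next to the affected node if the diagnosis above holds.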

If we disable the cloud provider altogether, it works, but then the storage provider doesn't. We can also manually untaint the node as a temporary workaround.
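The manual untaint could look roughly like this. It is a sketch, not what cloudman-boot does: the function name and the KUBECTL override are hypothetical, the node name has to be supplied by the caller, and NoSchedule is assumed to be the effect on the taint (worth confirming with kubectl describe node first).

```shell
# Hypothetical helper: remove the network-unavailable taint from one node.
# The trailing "-" asks kubectl taint to remove the taint instead of adding it.
# Assumption: the taint carries the NoSchedule effect.
untaint_network_unavailable() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: untaint_network_unavailable <node-name>" >&2
    return 1
  fi
  ${KUBECTL:-kubectl} taint nodes "$node" \
    node.kubernetes.io/network-unavailable:NoSchedule-
}
```

Usage would be untaint_network_unavailable followed by the node name. Since the underlying cause is not addressed, the taint may be re-applied, so this only serves as a stopgap.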

nuwang commented 5 years ago

The GCP issue should be fixed with this: https://github.com/CloudVE/cloudman-boot/commit/7861b318b2216528c924b78d90b96bd5e903932b

CloudVE / galaxy-helm

Test CloudMan 2 on all cloud providers #42