ELB are OutOfService - tectonic-1.7.1-tectonic.2 #1761

Open brant4test opened 7 years ago

brant4test commented 7 years ago

Hi team, I installed tectonic-1.7.1-tectonic.2.tar.gz on AWS with Terraform without any errors, but I cannot access the console, and kubectl cannot reach the cluster:

$ kubectl cluster-info
Unable to connect to the server: EOF

After checking the AWS ELBs, I found that all 3 masters in the Tectonic console ELB are OutOfService.
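The OutOfService state can also be confirmed from the command line. This is a minimal sketch, assuming a classic ELB and a configured AWS CLI; the function name and the load balancer name in the usage comment are hypothetical and not taken from this cluster.

```shell
# Sketch only: prints the IDs of instances that a classic ELB currently
# reports as OutOfService. The load balancer name is passed as $1.
out_of_service_instances() {
  aws elb describe-instance-health \
    --load-balancer-name "$1" \
    --query "InstanceStates[?State=='OutOfService'].InstanceId" \
    --output text
}

# Usage (hypothetical load balancer name):
#   out_of_service_instances my-cluster-con
```

An empty result would mean every registered instance is passing its health check.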

After logging in to a master node, it seems that no kube* processes are running and one unit has failed:

$ ssh core@master
Last login: Thu Aug 24 03:53:05 UTC 2017 from master on pts/0
Container Linux by CoreOS stable (1465.6.0)
Update Strategy: No Reboots
Failed Units: 1
  init-assets.service
core@ip-10-0-42-30 ~ $ journalctl -u init-assets.service
-- Logs begin at Thu 2017-08-24 02:46:58 UTC, end at Thu 2017-08-24 03:54:45 UTC. --
Aug 24 02:47:06 localhost systemd[1]: Starting Download Tectonic Assets...
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: pubkey: prefix: ""
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: key: ""
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: gpg key fingerprint is: BFF3 13CD AA56 0B16 A898 7B8F 72AB F5F6 799D 33BC
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: ACI Converter (ACI conversion signing key)
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Trusting "" for prefix "" without fingerprint review.
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Added key for prefix "" at "/etc/rkt/trustedkeys/prefix.d/
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Downloading signature: 0 B/473 B
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Downloading signature: 473 B/473 B
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Downloading signature: 473 B/473 B
Aug 24 02:47:22 ip-10-0-42-30 bash[743]: run: Get
Aug 24 02:47:22 ip-10-0-42-30 systemd[1]: init-assets.service: Main process exited, code=exited, status=254/n/a
Aug 24 02:47:22 ip-10-0-42-30 systemd[1]: Failed to start Download Tectonic Assets.
Aug 24 02:47:22 ip-10-0-42-30 systemd[1]: init-assets.service: Unit entered failed state.
Aug 24 02:47:22 ip-10-0-42-30 systemd[1]: init-assets.service: Failed with result 'exit-code'.

ip-10-0-42-30 ~ # netstat -anp|grep 32002
ip-10-0-42-30 ~ # ps -ef|grep api
root 1347 1321 0 03:38 pts/0 00:00:00 grep --colour=auto api
ip-10-0-42-30 ~ # ps -ef|grep kube
root 1484 1473 0 03:53 pts/0 00:00:00 grep --colour=auto kube
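A plain grep for 32002 matches any line that contains the number, so a stricter check can help when the output is not empty. The following is a rough sketch, assuming Linux `netstat -ant` style columns (local address in column 4, state in column 6); the helper name `is_listening` is made up for illustration.

```shell
# Sketch only: reads `netstat -ant` style output on stdin and succeeds
# (exit status 0) if some socket is in LISTEN state on TCP port $1.
is_listening() {
  awk -v p="$1" '
    $6 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
    END { exit !found }
  '
}

# Usage on a master node:
#   netstat -ant | is_listening 32002 && echo "listening" || echo "not listening"
```

Matching on the end of the local address column avoids false positives from PIDs or remote ports that happen to contain the same digits.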

What did I miss? Any tips? Thanks! The cluster has 3 x etcd, 3 x master, and 4 x worker nodes.

volumeMounts:\n            - mountPath: /host/opt/cni/bin\n              name: host-cni-bin\n            - mountPath: /host/etc/cni/net.d\n              name: cni-net-dir\n      volumes:\n        - name: var-run-calico\n          hostPath:\n            path: /var/run/calico\n        - name: host-cni-bin\n          hostPath:\n            path: ${host_cni_bin}\n        - name: cni-net-dir\n          hostPath:\n            path: /etc/kubernetes/cni/net.d\n---\napiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: kube-calico\n  namespace: kube-system\n"
Plan: 180 to add, 0 to change, 0 to destroy.
$ terraform apply -var-file=build/${CLUSTER}/terraform.tfvars platforms/aws
$ cat build/company/terraform.tfvars 

tectonic_admin_email = "**********@*****"

tectonic_admin_password_hash = "$2a$10$L06I4aRCxIt.uFi7wu1Vp.UDVVzLggIExXM*****************"

// Instance size for the etcd node(s). Example: `t2.medium`. Read the etcd recommended hardware guide for best performance.
tectonic_aws_etcd_ec2_type = "t2.medium"

// The amount of provisioned IOPS for the root block device of etcd nodes.
tectonic_aws_etcd_root_volume_iops = "100"

// The size of the volume in gigabytes for the root block device of etcd nodes.
tectonic_aws_etcd_root_volume_size = "30"

// The type of volume for the root block device of etcd nodes.
tectonic_aws_etcd_root_volume_type = "gp2"

// If set to true, create public facing ingress resources (ELB, A-records).
// If set to false, a "private" cluster will be created with an internal ELB only.
tectonic_aws_external_vpc_public = true

// Instance size for the master node(s). Example: `t2.medium`.
tectonic_aws_master_ec2_type = "t2.large"

// The amount of provisioned IOPS for the root block device of master nodes.
tectonic_aws_master_root_volume_iops = "100"

// The size of the volume in gigabytes for the root block device of master nodes.
tectonic_aws_master_root_volume_size = "30"

// The type of volume for the root block device of master nodes.
tectonic_aws_master_root_volume_type = "gp2"

// The target AWS region for the cluster.
tectonic_aws_region = "us-east-1"

// Name of an SSH key located within the AWS region. Example: coreos-user.
tectonic_aws_ssh_key = "company"

// Block of IP addresses used by the VPC.
// This should not overlap with any other networks, such as a private datacenter connected via Direct Connect.
tectonic_aws_vpc_cidr_block = ""

// Instance size for the worker node(s). Example: `t2.medium`.
tectonic_aws_worker_ec2_type = "r4.large"

// The amount of provisioned IOPS for the root block device of worker nodes.
tectonic_aws_worker_root_volume_iops = "100"

// The size of the volume in gigabytes for the root block device of worker nodes.
tectonic_aws_worker_root_volume_size = "30"

// The type of volume for the root block device of worker nodes.
tectonic_aws_worker_root_volume_type = "gp2"

tectonic_base_domain = ""

// [ALPHA] If set to true, calico network policy support will be deployed.
// WARNING: Enabling an alpha feature means that future updates may become unsupported.
// This should only be enabled on clusters that are meant to be short-lived to begin validating the alpha feature.
tectonic_calico_network_policy = false

// This declares the IP range to assign Kubernetes pod IPs in CIDR notation.
tectonic_cluster_cidr = ""

tectonic_cluster_name = "company"

// The number of etcd nodes to be created.
// If set to zero, the count of etcd nodes will be determined automatically.
// Note: This is currently only supported on AWS.
tectonic_etcd_count = "3"

// If set to true, experimental Tectonic assets are being deployed.
tectonic_experimental = false

tectonic_license_path = "/home/ubuntu/tectonic1.7/build/company/tectonic-license.txt"

tectonic_master_count = "3"

tectonic_pull_secret_path = "/home/ubuntu/tectonic1.7/build/company/config.json"

// This declares the IP range to assign Kubernetes service cluster IPs in CIDR notation. The maximum size of this IP range is /12
tectonic_service_cidr = ""

// The Tectonic statistics collection URL to which to report.
tectonic_stats_url = ""

// If set to true, a vanilla Kubernetes cluster will be deployed, omitting any Tectonic assets.
tectonic_vanilla_k8s = false

tectonic_worker_count = "4"

brant4test commented 7 years ago

Then I tried to install tectonic-1.7.1-tectonic.2 on AWS with the Tectonic Installer, but it was stuck at the last step, "Starting Tectonic console", for more than 4 hours, so I decided to destroy the cluster.

Before destroying the stack, I also logged in to a master node and had a quick look. Port 32002 is still not listening, and the ELB issue is the same.

So my question is: is tectonic-1.7.1-tectonic.2 deployable at all?

I'll try to evaluate the earlier version 1.6.8 for the last time.

ip-10-0-22-13 ~ # ps -ef|grep api
root      1742  1726  0 05:47 ?        00:00:00 /usr/bin/flock /var/lock/api-server.lock /hyperkube apiserver --bind-address= --secure-port=443 --insecure-port=0 --advertise-address= --etcd-servers=,, --etcd-cafile=/etc/kubernetes/secrets/etcd-client-ca.crt --etcd-certfile=/etc/kubernetes/secrets/etcd-client.crt --etcd-keyfile=/etc/kubernetes/secrets/etcd-client.key --etcd-quorum-read=true --storage-backend=etcd3 --allow-privileged=true --service-cluster-ip-range= --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota --tls-ca-file=/etc/kubernetes/secrets/ca.crt --tls-cert-file=/etc/kubernetes/secrets/apiserver.crt --tls-private-key-file=/etc/kubernetes/secrets/apiserver.key --kubelet-client-certificate=/etc/kubernetes/secrets/apiserver.crt --kubelet-client-key=/etc/kubernetes/secrets/apiserver.key --service-account-key-file=/etc/kubernetes/secrets/ --client-ca-file=/etc/kubernetes/secrets/ca.crt --authorization-mode=RBAC --anonymous-auth=false --oidc-issuer-url= --oidc-client-id=tectonic-kubectl --oidc-username-claim=email --oidc-groups-claim=groups --oidc-ca-file=/etc/kubernetes/secrets/ca.crt --cloud-provider=aws --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100
root      3419  1742  2 05:50 ?        00:01:41 /hyperkube apiserver --bind-address= --secure-port=443 --insecure-port=0 --advertise-address= --etcd-servers=,, --etcd-cafile=/etc/kubernetes/secrets/etcd-client-ca.crt --etcd-certfile=/etc/kubernetes/secrets/etcd-client.crt --etcd-keyfile=/etc/kubernetes/secrets/etcd-client.key --etcd-quorum-read=true --storage-backend=etcd3 --allow-privileged=true --service-cluster-ip-range= --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota --tls-ca-file=/etc/kubernetes/secrets/ca.crt --tls-cert-file=/etc/kubernetes/secrets/apiserver.crt --tls-private-key-file=/etc/kubernetes/secrets/apiserver.key --kubelet-client-certificate=/etc/kubernetes/secrets/apiserver.crt --kubelet-client-key=/etc/kubernetes/secrets/apiserver.key --service-account-key-file=/etc/kubernetes/secrets/ --client-ca-file=/etc/kubernetes/secrets/ca.crt --authorization-mode=RBAC --anonymous-auth=false --oidc-issuer-url= --oidc-client-id=tectonic-kubectl --oidc-username-claim=email --oidc-groups-claim=groups --oidc-ca-file=/etc/kubernetes/secrets/ca.crt --cloud-provider=aws --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100
root      5203  5088  0 07:14 pts/1    00:00:00 grep --colour=auto api
ip-10-0-22-13 ~ # ps -ef|grep kube
root      1273     1  3 05:45 ?        00:02:56 /kubelet --kubeconfig=/etc/kubernetes/kubeconfig --require-kubeconfig --cni-conf-dir=/etc/kubernetes/cni/net.d --network-plugin=cni --lock-file=/var/run/lock/kubelet.lock --exit-on-lock-contention --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged --minimum-container-ttl-duration=6m0s --cluster-dns= --cluster-domain=cluster.local --client-ca-file=/etc/kubernetes/ca.crt --anonymous-auth=false --cloud-provider=aws
ip-10-0-22-13 ~ # 
ip-10-0-22-13 ~ # ps -ef|grep 32002
root      5730  5088  0 07:15 pts/1    00:00:00 grep --colour=auto 32002
ip-10-0-22-13 ~ # netstat -anp|grep 32002



snsumner commented 7 years ago

I believe this might be related to this issue:

erie149 commented 7 years ago

I have the same issue (ELB not passing health checks on 32002) when installing with the 1.6.10 installer and the 1.7.3 installer. In fact, when I SSH to a master node, port 32002 is not listening for any requests. The Kubernetes master services (api-controller, proxy and scheduler) were running via a hyperkube Docker container, but I was not able to make a connection to 32002. Also, my installation uses a new VPC.
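To see which port an ELB actually probes, the health check target can be read back with the AWS CLI. This is a minimal sketch, assuming a classic ELB; both function names and the load balancer name in the usage comment are hypothetical, and the target format shown ("TCP:32002", "HTTP:32002/healthz") is the general classic ELB syntax rather than a value confirmed for Tectonic.

```shell
# Sketch only: extracts the port from a classic ELB health check target
# such as "TCP:32002" or "HTTP:32002/healthz".
health_check_port() {
  rest="${1#*:}"                 # drop the protocol prefix
  printf '%s\n' "${rest%%/*}"    # drop any path suffix
}

# Prints the health check target configured on the ELB named in $1.
elb_health_target() {
  aws elb describe-load-balancers \
    --load-balancer-names "$1" \
    --query 'LoadBalancerDescriptions[0].HealthCheck.Target' \
    --output text
}

# Usage (hypothetical load balancer name):
#   health_check_port "$(elb_health_target my-cluster-con)"
```

If the reported port is one that nothing on the masters listens on, the instances stay OutOfService regardless of how healthy the other processes are.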

tomdavidson commented 6 years ago

I am experiencing the same with 1.7.5 and with a new VPC. Given this case and @erie149's, it might not be related to the existing subnet tagging issue #1786.

In my case the variable that I can control is the base domain name:

domain name gui tf works works works broken

The R53 hosted zone has a record set 'sub' of NS records pointing to:

The public hosted zone has four records: the SOA and NS, as well as:

name type value
dev-api A ELB
dev A ELB

The private hosted zone has seven records: the SOA and NS (different from those above), and:

name type value
_etcd-client-ssl._tcp SRV 0 0 2379
_etcd-server-ssl._tcp SRV 0 0 2380
dev-api A ELB
dev A ELB
dev-etcd-0 A 10.0.x.x

All three load balancers have 0 instances in service, and the instances are 'OutOfService', just as described by @brant4test:

$ kubectl cluster-info
Kubernetes master is running at

$ kubectl describe svc
Unable to connect to the server: EOF

$ kubectl cluster-info dump
Unable to connect to the server: EOF

tomdavidson commented 6 years ago

#2243 is my problem: the Tectonic services are failing to install.