coreos / tectonic-installer

Install a Kubernetes cluster the CoreOS Tectonic Way: HA, self-hosted, RBAC, etcd Operator, and more
Apache License 2.0

ELBs are OutOfService - tectonic-1.7.1-tectonic.2 #1761

Open brant4test opened 7 years ago

brant4test commented 7 years ago

Hi, team. I installed tectonic-1.7.1-tectonic.2.tar.gz on AWS with Terraform without any errors, but I cannot access the console, and kubectl cannot connect:

$ kubectl cluster-info
Unable to connect to the server: EOF

After checking the AWS ELBs, I found that all 3 masters behind the Tectonic console ELB are OutOfService.
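For reference, the same health information is available from the AWS CLI; a minimal sketch (assuming the CLI is configured, with `<console-elb-name>` as a placeholder for the installer-created console ELB):

```
# List the classic ELBs, then check instance health on the console ELB
aws elb describe-load-balancers --query 'LoadBalancerDescriptions[].LoadBalancerName'
aws elb describe-instance-health --load-balancer-name <console-elb-name>
```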

After logging in to a master node, it seems no kube* processes are running, and there is one failed unit:

$ ssh core@master
Last login: Thu Aug 24 03:53:05 UTC 2017 from master on pts/0
Container Linux by CoreOS stable (1465.6.0)
Update Strategy: No Reboots
Failed Units: 1
  init-assets.service
core@ip-10-0-42-30 ~ $ journalctl -u init-assets.service
-- Logs begin at Thu 2017-08-24 02:46:58 UTC, end at Thu 2017-08-24 03:54:45 UTC. --
Aug 24 02:47:06 localhost systemd[1]: Starting Download Tectonic Assets...
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: pubkey: prefix: "quay.io/coreos/awscli"
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: key: "https://quay.io/aci-signing-key"
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: gpg key fingerprint is: BFF3 13CD AA56 0B16 A898 7B8F 72AB F5F6 799D 33BC
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Quay.io ACI Converter (ACI conversion signing key) support@quay.io
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Trusting "https://quay.io/aci-signing-key" for prefix "quay.io/coreos/awscli" without fingerprint review.
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Added key for prefix "quay.io/coreos/awscli" at "/etc/rkt/trustedkeys/prefix.d/quay.io/coreos/awscli/bff313cdaa560b16a8987b8f72abf5f
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Downloading signature: 0 B/473 B
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Downloading signature: 473 B/473 B
Aug 24 02:47:12 ip-10-0-42-30 bash[743]: Downloading signature: 473 B/473 B
Aug 24 02:47:22 ip-10-0-42-30 bash[743]: run: Get https://quay-registry.s3.amazonaws.com/sharedimages/3d9c65f1-d97d-4a81-8318-226dd41b9a75/layer?Signature=IwzGK5LfPeMm8BzmnJ
Aug 24 02:47:22 ip-10-0-42-30 systemd[1]: init-assets.service: Main process exited, code=exited, status=254/n/a
Aug 24 02:47:22 ip-10-0-42-30 systemd[1]: Failed to start Download Tectonic Assets.
Aug 24 02:47:22 ip-10-0-42-30 systemd[1]: init-assets.service: Unit entered failed state.
Aug 24 02:47:22 ip-10-0-42-30 systemd[1]: init-assets.service: Failed with result 'exit-code'.
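The unit dies while fetching the awscli ACI; the rkt fetch fails on the S3 download (the `run: Get https://quay-registry.s3.amazonaws.com/...` line), which may point at an egress/NAT problem from the node's subnet. A minimal sketch of what could be checked and retried on the node, assuming only stock Container Linux tooling:

```
# Can the node reach Quay's S3 backend at all? (egress / NAT / proxy check)
curl -sI https://quay-registry.s3.amazonaws.com | head -n 1

# Retry the failed asset download and follow its logs
sudo systemctl restart init-assets.service
journalctl -u init-assets.service -f
```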

ip-10-0-42-30 ~ # netstat -anp|grep 32002
ip-10-0-42-30 ~ # ps -ef|grep api
root      1347  1321  0 03:38 pts/0    00:00:00 grep --colour=auto api
ip-10-0-42-30 ~ # ps -ef|grep kube
root      1484  1473  0 03:53 pts/0    00:00:00 grep --colour=auto kube

What did I miss? Any tips? Thanks!

Cluster layout: 3 x etcd, 3 x master, 4 x worker.

FYI, here is what I've done:

$ terraform plan -var-file=build/${CLUSTER}/terraform.tfvars platforms/aws

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.ignition_systemd_unit.etcd_unzip_tls: Refreshing state...
data.ignition_file.detect-master: Refreshing state...
data.ignition_file.max-user-watches: Refreshing state...
data.ignition_systemd_unit.locksmithd: Refreshing state...
data.ignition_systemd_unit.docker: Refreshing state...
…
volumeMounts:\n            - mountPath: /host/opt/cni/bin\n              name: host-cni-bin\n            - mountPath: /host/etc/cni/net.d\n              name: cni-net-dir\n      volumes:\n        - name: var-run-calico\n          hostPath:\n            path: /var/run/calico\n        - name: host-cni-bin\n          hostPath:\n            path: ${host_cni_bin}\n        - name: cni-net-dir\n          hostPath:\n            path: /etc/kubernetes/cni/net.d\n---\napiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: kube-calico\n  namespace: kube-system\n"
    vars.%:   "<computed>"

Plan: 180 to add, 0 to change, 0 to destroy.

$ terraform apply -var-file=build/${CLUSTER}/terraform.tfvars platforms/aws
data.ignition_file.detect-master: Refreshing state...
data.ignition_systemd_unit.locksmithd: Refreshing state...
data.ignition_systemd_unit.docker: Refreshing state...
…
  vpc_zone_identifier.3627125697: "" => "subnet-739c5317"
  wait_for_capacity_timeout:      "" => "10m"
module.workers.aws_autoscaling_group.workers: Still creating... (40s elapsed)
module.etcd.aws_route53_record.etcd_srv_discover: Still creating... (20s elapsed)
module.etcd.aws_route53_record.etcd_srv_client: Still creating... (20s elapsed)
module.masters.aws_autoscaling_group.masters: Still creating... (10s elapsed)
module.workers.aws_autoscaling_group.workers: Creation complete (ID: platform-workers)
module.etcd.aws_route53_record.etcd_srv_client: Still creating... (30s elapsed)
module.etcd.aws_route53_record.etcd_srv_discover: Still creating... (30s elapsed)
module.masters.aws_autoscaling_group.masters: Still creating... (20s elapsed)
module.etcd.aws_route53_record.etcd_srv_discover: Still creating... (40s elapsed)
module.etcd.aws_route53_record.etcd_srv_client: Still creating... (40s elapsed)
module.masters.aws_autoscaling_group.masters: Still creating... (30s elapsed)
module.etcd.aws_route53_record.etcd_srv_client: Still creating... (50s elapsed)
module.etcd.aws_route53_record.etcd_srv_discover: Still creating... (50s elapsed)
module.masters.aws_autoscaling_group.masters: Still creating... (40s elapsed)
module.etcd.aws_route53_record.etcd_srv_client: Creation complete (ID: Z2NHIYEKX3MAST__etcd-client-ssl._tcp_SRV)
module.etcd.aws_route53_record.etcd_srv_discover: Creation complete (ID: Z2NHIYEKX3MAST__etcd-server-ssl._tcp_SRV)
module.masters.aws_autoscaling_group.masters: Still creating... (50s elapsed)
module.masters.aws_autoscaling_group.masters: Creation complete (ID: platform-masters)

Apply complete! Resources: 180 added, 0 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path:

[screenshot]
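Even though the apply reports success, the created ELBs can be inspected from the saved state; a sketch, with `<elb-address>` as a placeholder since the exact resource addresses vary by installer version:

```
# List state entries for load balancers, then show one (including its health check)
terraform state list | grep -i elb
terraform state show <elb-address>
```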

$ cat build/company/terraform.tfvars 

// The e-mail address used to:
// 1. login as the admin user to the Tectonic Console.
// 2. generate DNS zones for some providers.
// 
// Note: This field MUST be set manually prior to creating the cluster.
tectonic_admin_email = "**********@*****"

// The bcrypt hash of admin user password to login to the Tectonic Console.
// Use the bcrypt-hash tool (https://github.com/coreos/bcrypt-tool/releases/tag/v1.0.0) to generate it.
// 
// Note: This field MUST be set manually prior to creating the cluster.
tectonic_admin_password_hash = "$2a$10$L06I4aRCxIt.uFi7wu1Vp.UDVVzLggIExXM*****************"
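// Aside: besides the linked bcrypt-tool, a hash in the expected `$2a$10$...` format
// can be generated with the pyca/bcrypt Python package. A minimal sketch, assuming
// python3 and the `bcrypt` package are installed:
//   python3 -c "import bcrypt; print(bcrypt.hashpw(b'changeme', bcrypt.gensalt(rounds=10, prefix=b'2a')).decode())"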

// (optional) Extra AWS tags to be applied to created autoscaling group resources.
// This is a list of maps having the keys `key`, `value` and `propagate_at_launch`.
// 
// Example: `[ { key = "foo", value = "bar", propagate_at_launch = true } ]`
// tectonic_autoscaling_group_extra_tags = ""

// Instance size for the etcd node(s). Example: `t2.medium`. Read the [etcd recommended hardware](https://coreos.com/etcd/docs/latest/op-guide/hardware.html) guide for best performance.
tectonic_aws_etcd_ec2_type = "t2.medium"

// (optional) List of additional security group IDs for etcd nodes.
// 
// Example: `["sg-51530134", "sg-b253d7cc"]`
// tectonic_aws_etcd_extra_sg_ids = ""

// The amount of provisioned IOPS for the root block device of etcd nodes.
tectonic_aws_etcd_root_volume_iops = "100"

// The size of the volume in gigabytes for the root block device of etcd nodes.
tectonic_aws_etcd_root_volume_size = "30"

// The type of volume for the root block device of etcd nodes.
tectonic_aws_etcd_root_volume_type = "gp2"

// (optional) List of subnet IDs within an existing VPC to deploy master nodes into.
// Required to use an existing VPC and the list must match the AZ count.
// 
// Example: `["subnet-111111", "subnet-222222", "subnet-333333"]`
// tectonic_aws_external_master_subnet_ids = ""

// (optional) If set, the given Route53 zone ID will be used as the internal (private) zone.
// This zone will be used to create etcd DNS records as well as internal API and internal Ingress records.
// If set, no additional private zone will be created.
// 
// Example: `"Z1ILINNUJGTAO1"`
// tectonic_aws_external_private_zone = ""

// (optional) ID of an existing VPC to launch nodes into.
// If unset a new VPC is created.
// 
// Example: `vpc-123456`
// tectonic_aws_external_vpc_id = ""

// If set to true, create public facing ingress resources (ELB, A-records).
// If set to false, a "private" cluster will be created with an internal ELB only.
tectonic_aws_external_vpc_public = true

// (optional) List of subnet IDs within an existing VPC to deploy worker nodes into.
// Required to use an existing VPC and the list must match the AZ count.
// 
// Example: `["subnet-111111", "subnet-222222", "subnet-333333"]`
// tectonic_aws_external_worker_subnet_ids = ""

// (optional) Extra AWS tags to be applied to created resources.
// tectonic_aws_extra_tags = ""

// (optional) This configures master availability zones and their corresponding subnet CIDRs directly.
// 
// Example:
// `{ eu-west-1a = "10.0.0.0/20", eu-west-1b = "10.0.16.0/20" }`
// tectonic_aws_master_custom_subnets = ""

// Instance size for the master node(s). Example: `t2.medium`.
tectonic_aws_master_ec2_type = "t2.large"

// (optional) List of additional security group IDs for master nodes.
// 
// Example: `["sg-51530134", "sg-b253d7cc"]`
// tectonic_aws_master_extra_sg_ids = ""

// (optional) Name of IAM role to use for the instance profiles of master nodes.
// The name is also the last part of a role's ARN.
// 
// Example:
//  * Role ARN  = arn:aws:iam::123456789012:role/tectonic-installer
//  * Role Name = tectonic-installer
// tectonic_aws_master_iam_role_name = ""

// The amount of provisioned IOPS for the root block device of master nodes.
tectonic_aws_master_root_volume_iops = "100"

// The size of the volume in gigabytes for the root block device of master nodes.
tectonic_aws_master_root_volume_size = "30"

// The type of volume for the root block device of master nodes.
tectonic_aws_master_root_volume_type = "gp2"

// The target AWS region for the cluster.
tectonic_aws_region = "us-east-1"

// Name of an SSH key located within the AWS region. Example: coreos-user.
tectonic_aws_ssh_key = "company"

// Block of IP addresses used by the VPC.
// This should not overlap with any other networks, such as a private datacenter connected via Direct Connect.
tectonic_aws_vpc_cidr_block = "10.0.0.0/16"

// (optional) This configures worker availability zones and their corresponding subnet CIDRs directly.
// 
// Example: `{ eu-west-1a = "10.0.64.0/20", eu-west-1b = "10.0.80.0/20" }`
// tectonic_aws_worker_custom_subnets = ""

// Instance size for the worker node(s). Example: `t2.medium`.
tectonic_aws_worker_ec2_type = "r4.large"

// (optional) List of additional security group IDs for worker nodes.
// 
// Example: `["sg-51530134", "sg-b253d7cc"]`
// tectonic_aws_worker_extra_sg_ids = ""

// (optional) Name of IAM role to use for the instance profiles of worker nodes.
// The name is also the last part of a role's ARN.
// 
// Example:
//  * Role ARN  = arn:aws:iam::123456789012:role/tectonic-installer
//  * Role Name = tectonic-installer
// tectonic_aws_worker_iam_role_name = ""

// The amount of provisioned IOPS for the root block device of worker nodes.
tectonic_aws_worker_root_volume_iops = "100"

// The size of the volume in gigabytes for the root block device of worker nodes.
tectonic_aws_worker_root_volume_size = "30"

// The type of volume for the root block device of worker nodes.
tectonic_aws_worker_root_volume_type = "gp2"

// The base DNS domain of the cluster. It must NOT contain a trailing period. Some
// DNS providers will automatically add this if necessary.
// 
// Example: `openstack.dev.coreos.systems`.
// 
// Note: This field MUST be set manually prior to creating the cluster.
// This applies only to cloud platforms.
tectonic_base_domain = "company.com"

// (optional) The content of the PEM-encoded CA certificate, used to generate Tectonic Console's server certificate.
// If left blank, a CA certificate will be automatically generated.
// tectonic_ca_cert = ""

// (optional) The content of the PEM-encoded CA key, used to generate Tectonic Console's server certificate.
// This field is mandatory if `tectonic_ca_cert` is set.
// tectonic_ca_key = ""

// (optional) The algorithm used to generate tectonic_ca_key.
// The default value is currently recommended.
// This field is mandatory if `tectonic_ca_cert` is set.
// tectonic_ca_key_alg = "RSA"

// [ALPHA] If set to true, calico network policy support will be deployed.
// WARNING: Enabling an alpha feature means that future updates may become unsupported.
// This should only be enabled on clusters that are meant to be short-lived to begin validating the alpha feature.
tectonic_calico_network_policy = false

// The Container Linux update channel.
// 
// Examples: `stable`, `beta`, `alpha`
tectonic_cl_channel = "stable"

// This declares the IP range to assign Kubernetes pod IPs in CIDR notation.
tectonic_cluster_cidr = "10.2.0.0/16"

// The name of the cluster.
// If used in a cloud environment, this will be prepended to `tectonic_base_domain`, resulting in the URL to the Tectonic Console.
// 
// Note: This field MUST be set manually prior to creating the cluster.
// Warning: Special characters in the name like '.' may cause errors on OpenStack platforms due to resource name constraints.
tectonic_cluster_name = "company"

// (optional) This only applies if you use the modules/dns/ddns module.
// 
// Specifies the RFC2136 Dynamic DNS server key algorithm.
// tectonic_ddns_key_algorithm = ""

// (optional) This only applies if you use the modules/dns/ddns module.
// 
// Specifies the RFC2136 Dynamic DNS server key name.
// tectonic_ddns_key_name = ""

// (optional) This only applies if you use the modules/dns/ddns module.
// 
// Specifies the RFC2136 Dynamic DNS server key secret.
// tectonic_ddns_key_secret = ""

// (optional) This only applies if you use the modules/dns/ddns module.
// 
// Specifies the RFC2136 Dynamic DNS server IP/host to register IP addresses to.
// tectonic_ddns_server = ""

// (optional) DNS prefix used to construct the console and API server endpoints.
// tectonic_dns_name = ""

// (optional) The path of the file containing the CA certificate for TLS communication with etcd.
// 
// Note: This works only when used in conjunction with an external etcd cluster.
// If set, the variables `tectonic_etcd_servers`, `tectonic_etcd_client_cert_path`, and `tectonic_etcd_client_key_path` must also be set.
// tectonic_etcd_ca_cert_path = "/dev/null"

// (optional) The path of the file containing the client certificate for TLS communication with etcd.
// 
// Note: This works only when used in conjunction with an external etcd cluster.
// If set, the variables `tectonic_etcd_servers`, `tectonic_etcd_ca_cert_path`, and `tectonic_etcd_client_key_path` must also be set.
// tectonic_etcd_client_cert_path = "/dev/null"

// (optional) The path of the file containing the client key for TLS communication with etcd.
// 
// Note: This works only when used in conjunction with an external etcd cluster.
// If set, the variables `tectonic_etcd_servers`, `tectonic_etcd_ca_cert_path`, and `tectonic_etcd_client_cert_path` must also be set.
// tectonic_etcd_client_key_path = "/dev/null"

// The number of etcd nodes to be created.
// If set to zero, the count of etcd nodes will be determined automatically.
// 
// Note: This is currently only supported on AWS.
tectonic_etcd_count = "3"

// (optional) List of external etcd v3 servers to connect with (hostnames/IPs only).
// Needs to be set if using an external etcd cluster.
// 
// Example: `["etcd1", "etcd2", "etcd3"]`
// tectonic_etcd_servers = ""

// (optional) If set to `true`, TLS-secured communication for self-provisioned etcd will be used.
// 
// Note: If `tectonic_experimental` is set to `true` this variable has no effect, because the experimental self-hosted etcd always uses TLS.
// tectonic_etcd_tls_enabled = true

// If set to true, experimental Tectonic assets are being deployed.
tectonic_experimental = false

// The path to the tectonic license file.
// 
// Note: This field MUST be set manually prior to creating the cluster unless `tectonic_vanilla_k8s` is set to `true`.
tectonic_license_path = "/home/ubuntu/tectonic1.7/build/company/tectonic-license.txt"

// The number of master nodes to be created.
// This applies only to cloud platforms.
tectonic_master_count = "3"

// The path to the pull secret file in JSON format.
// 
// Note: This field MUST be set manually prior to creating the cluster unless `tectonic_vanilla_k8s` is set to `true`.
tectonic_pull_secret_path = "/home/ubuntu/tectonic1.7/build/company/config.json"

// This declares the IP range to assign Kubernetes service cluster IPs in CIDR notation. The maximum size of this IP range is /12.
tectonic_service_cidr = "10.3.0.0/16"
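// Aside: this service range, the pod range above (`tectonic_cluster_cidr`), and the
// VPC block (`tectonic_aws_vpc_cidr_block`) must not overlap. A quick sketch to
// verify, assuming python3 is available:
//   python3 -c 'import ipaddress as I, itertools as T; ns={"vpc":"10.0.0.0/16","pods":"10.2.0.0/16","services":"10.3.0.0/16"}; [print("overlap:",a,b) for (a,x),(b,y) in T.combinations(ns.items(),2) if I.ip_network(x).overlaps(I.ip_network(y))]'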

// The Tectonic statistics collection URL to which to report.
tectonic_stats_url = "https://stats-collector.tectonic.com"

// If set to true, a vanilla Kubernetes cluster will be deployed, omitting any Tectonic assets.
tectonic_vanilla_k8s = false

// The number of worker nodes to be created.
// This applies only to cloud platforms.
tectonic_worker_count = "4"

brant4test commented 7 years ago

Then I tried to install tectonic-1.7.1-tectonic.2 on AWS with the Tectonic Installer GUI, and it got stuck at the last step, "Starting Tectonic console", for more than 4 hours. Then I decided to destroy the cluster.

Before destroying the stack, I also logged in and took a quick look at a master node. Port 32002 is still not listening, and it's the same ELB issue.

So my question is: is tectonic-1.7.1-tectonic.2 deployable at all?

I'll try to evaluate the earlier version, 1.6.8, one last time.

ip-10-0-22-13 ~ # ps -ef|grep api
root      1742  1726  0 05:47 ?        00:00:00 /usr/bin/flock /var/lock/api-server.lock /hyperkube apiserver --bind-address=0.0.0.0 --secure-port=443 --insecure-port=0 --advertise-address=0.0.0.0 --etcd-servers=https://stack-etcd-0.company.com:2379,https://stack-etcd-1.company.com:2379,https://stack-etcd-2.company.com:2379 --etcd-cafile=/etc/kubernetes/secrets/etcd-client-ca.crt --etcd-certfile=/etc/kubernetes/secrets/etcd-client.crt --etcd-keyfile=/etc/kubernetes/secrets/etcd-client.key --etcd-quorum-read=true --storage-backend=etcd3 --allow-privileged=true --service-cluster-ip-range=10.3.0.0/16 --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota --tls-ca-file=/etc/kubernetes/secrets/ca.crt --tls-cert-file=/etc/kubernetes/secrets/apiserver.crt --tls-private-key-file=/etc/kubernetes/secrets/apiserver.key --kubelet-client-certificate=/etc/kubernetes/secrets/apiserver.crt --kubelet-client-key=/etc/kubernetes/secrets/apiserver.key --service-account-key-file=/etc/kubernetes/secrets/service-account.pub --client-ca-file=/etc/kubernetes/secrets/ca.crt --authorization-mode=RBAC --anonymous-auth=false --oidc-issuer-url=https://stack.company.com/identity --oidc-client-id=tectonic-kubectl --oidc-username-claim=email --oidc-groups-claim=groups --oidc-ca-file=/etc/kubernetes/secrets/ca.crt --cloud-provider=aws --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100
root      3419  1742  2 05:50 ?        00:01:41 /hyperkube apiserver --bind-address=0.0.0.0 --secure-port=443 --insecure-port=0 --advertise-address=0.0.0.0 --etcd-servers=https://stack-etcd-0.company.com:2379,https://stack-etcd-1.company.com:2379,https://stack-etcd-2.company.com:2379 --etcd-cafile=/etc/kubernetes/secrets/etcd-client-ca.crt --etcd-certfile=/etc/kubernetes/secrets/etcd-client.crt --etcd-keyfile=/etc/kubernetes/secrets/etcd-client.key --etcd-quorum-read=true --storage-backend=etcd3 --allow-privileged=true --service-cluster-ip-range=10.3.0.0/16 --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota --tls-ca-file=/etc/kubernetes/secrets/ca.crt --tls-cert-file=/etc/kubernetes/secrets/apiserver.crt --tls-private-key-file=/etc/kubernetes/secrets/apiserver.key --kubelet-client-certificate=/etc/kubernetes/secrets/apiserver.crt --kubelet-client-key=/etc/kubernetes/secrets/apiserver.key --service-account-key-file=/etc/kubernetes/secrets/service-account.pub --client-ca-file=/etc/kubernetes/secrets/ca.crt --authorization-mode=RBAC --anonymous-auth=false --oidc-issuer-url=https://stack.company.com/identity --oidc-client-id=tectonic-kubectl --oidc-username-claim=email --oidc-groups-claim=groups --oidc-ca-file=/etc/kubernetes/secrets/ca.crt --cloud-provider=aws --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100
root      5203  5088  0 07:14 pts/1    00:00:00 grep --colour=auto api
ip-10-0-22-13 ~ # ps -ef|grep kube
root      1273     1  3 05:45 ?        00:02:56 /kubelet --kubeconfig=/etc/kubernetes/kubeconfig --require-kubeconfig --cni-conf-dir=/etc/kubernetes/cni/net.d --network-plugin=cni --lock-file=/var/run/lock/kubelet.lock --exit-on-lock-contention --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged --node-labels=node-role.kubernetes.io/master --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --minimum-container-ttl-duration=6m0s --cluster-dns=10.3.0.10 --cluster-domain=cluster.local --client-ca-file=/etc/kubernetes/ca.crt --anonymous-auth=false --cloud-provider=aws
root      1742  1726  0 05:47 ?        00:00:00 /usr/bin/flock /var/lock/api-server.lock /hyperkube apiserver --bind-address=0.0.0.0 --secure-port=443 --insecure-port=0 --advertise-address=0.0.0.0 --etcd-servers=https://stack-etcd-0.company.com:2379,https://stack-etcd-1.company.com:2379,https://stack-etcd-2.company.com:2379 --etcd-cafile=/etc/kubernetes/secrets/etcd-client-ca.crt --etcd-certfile=/etc/kubernetes/secrets/etcd-client.crt --etcd-keyfile=/etc/kubernetes/secrets/etcd-client.key --etcd-quorum-read=true --storage-backend=etcd3 --allow-privileged=true --service-cluster-ip-range=10.3.0.0/16 --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota --tls-ca-file=/etc/kubernetes/secrets/ca.crt --tls-cert-file=/etc/kubernetes/secrets/apiserver.crt --tls-private-key-file=/etc/kubernetes/secrets/apiserver.key --kubelet-client-certificate=/etc/kubernetes/secrets/apiserver.crt --kubelet-client-key=/etc/kubernetes/secrets/apiserver.key --service-account-key-file=/etc/kubernetes/secrets/service-account.pub --client-ca-file=/etc/kubernetes/secrets/ca.crt --authorization-mode=RBAC --anonymous-auth=false --oidc-issuer-url=https://stack.company.com/identity --oidc-client-id=tectonic-kubectl --oidc-username-claim=email --oidc-groups-claim=groups --oidc-ca-file=/etc/kubernetes/secrets/ca.crt --cloud-provider=aws --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100
root      1836  1817  0 05:47 ?        00:00:16 ./hyperkube proxy --kubeconfig=/etc/kubernetes/kubeconfig --proxy-mode=iptables --hostname-override=ip-10-0-22-13.ec2.internal --cluster-cidr=10.2.0.0/16
root      2053  2037  0 05:47 ?        00:00:02 /opt/bin/flanneld --ip-masq --kube-subnet-mgr --iface=10.0.22.13
nobody    2403  2386  0 05:48 ?        00:00:26 ./hyperkube scheduler --leader-elect=true
nobody    2573  2549  0 05:48 ?        00:00:01 ./hyperkube controller-manager --allocate-node-cidrs=true --configure-cloud-routes=false --cluster-cidr=10.2.0.0/16 --root-ca-file=/etc/kubernetes/secrets/ca.crt --service-account-private-key-file=/etc/kubernetes/secrets/service-account.key --leader-elect=true --node-monitor-grace-period=2m --pod-eviction-timeout=220s --cloud-provider=aws
root      2709  2693  0 05:48 ?        00:00:02 /kube-dns --domain=cluster.local. --dns-port=10053 --config-dir=/kube-dns-config --v=2
nobody    2890  2874  0 05:48 ?        00:00:04 /sidecar --v=2 --logtostderr --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
root      3201  3199  0 05:49 ?        00:00:01 stage1/rootfs/usr/lib/ld-linux-x86-64.so.2 stage1/rootfs/usr/bin/systemd-nspawn --boot --notify-ready=yes -Zsystem_u:system_r:svirt_lxc_net_t:s0:c720,c764 -Lsystem_u:object_r:svirt_lxc_file_t:s0:c720,c764 --register=true --link-journal=try-guest --quiet --uuid=7bd366cc-507c-4a1c-961a-dc2151770cc2 --machine=rkt-7bd366cc-507c-4a1c-961a-dc2151770cc2 --directory=stage1/rootfs --bind=/opt/tectonic:/opt/stage2/hyperkube/rootfs/assets:rbind --capability=CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FSETID,CAP_FOWNER,CAP_KILL,CAP_MKNOD,CAP_NET_RAW,CAP_NET_BIND_SERVICE,CAP_SETUID,CAP_SETGID,CAP_SETPCAP,CAP_SETFCAP,CAP_SYS_CHROOT -- --default-standard-output=tty --log-target=null --show-status=0
root      3220  3213  0 05:49 ?        00:00:01 /bin/bash /assets/tectonic.sh /assets/auth/kubeconfig /assets false
root      3419  1742  2 05:50 ?        00:01:42 /hyperkube apiserver --bind-address=0.0.0.0 --secure-port=443 --insecure-port=0 --advertise-address=0.0.0.0 --etcd-servers=https://stack-etcd-0.company.com:2379,https://stack-etcd-1.company.com:2379,https://stack-etcd-2.company.com:2379 --etcd-cafile=/etc/kubernetes/secrets/etcd-client-ca.crt --etcd-certfile=/etc/kubernetes/secrets/etcd-client.crt --etcd-keyfile=/etc/kubernetes/secrets/etcd-client.key --etcd-quorum-read=true --storage-backend=etcd3 --allow-privileged=true --service-cluster-ip-range=10.3.0.0/16 --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota --tls-ca-file=/etc/kubernetes/secrets/ca.crt --tls-cert-file=/etc/kubernetes/secrets/apiserver.crt --tls-private-key-file=/etc/kubernetes/secrets/apiserver.key --kubelet-client-certificate=/etc/kubernetes/secrets/apiserver.crt --kubelet-client-key=/etc/kubernetes/secrets/apiserver.key --service-account-key-file=/etc/kubernetes/secrets/service-account.pub --client-ca-file=/etc/kubernetes/secrets/ca.crt --authorization-mode=RBAC --anonymous-auth=false --oidc-issuer-url=https://stack.company.com/identity --oidc-client-id=tectonic-kubectl --oidc-username-claim=email --oidc-groups-claim=groups --oidc-ca-file=/etc/kubernetes/secrets/ca.crt --cloud-provider=aws --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100
root      5345  5088  0 07:14 pts/1    00:00:00 grep --colour=auto kube
ip-10-0-22-13 ~ # 
ip-10-0-22-13 ~ # ps -ef|grep 32002
root      5730  5088  0 07:15 pts/1    00:00:00 grep --colour=auto 32002
ip-10-0-22-13 ~ # netstat -anp|grep 32002
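Since the apiserver is clearly running here, the next question is whether the console pods behind NodePort 32002 ever got created. A sketch, run on a master and assuming a kubectl binary is available there (the kubeconfig path is the one visible in the kubelet flags above):

```
kubectl --kubeconfig=/etc/kubernetes/kubeconfig get pods -n tectonic-system -o wide
kubectl --kubeconfig=/etc/kubernetes/kubeconfig get svc -n tectonic-system
```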

[screenshots]

snsumner commented 7 years ago

I believe this might be related to this issue: https://github.com/coreos/tectonic-installer/issues/1786

erie149 commented 7 years ago

I have the same issue (ELB not passing health checks on 32002) with both the 1.6.10 and 1.7.3 installers. In fact, I SSHed to a master node and port 32002 is not listening for any requests. The Kubernetes master services (apiserver, controller-manager, proxy, and scheduler) were running via a hyperkube Docker container, but I was not able to make a connection to 32002. My installation also uses a new VPC.

tomdavidson commented 7 years ago

I am experiencing the same with 1.7.5, also with a new VPC. Given this case and @erie149's, it might not be related to the subnet tagging of existing VPCs (#1786).
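For anyone still checking the #1786 angle: the AWS cloud provider of this era keyed on a cluster tag (e.g. `KubernetesCluster`) on subnets, so a subnet's tags are worth inspecting; a sketch, with `<subnet-id>` as a placeholder:

```
aws ec2 describe-subnets --subnet-ids <subnet-id> --query 'Subnets[].Tags'
```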

In my case the variable that I can control is the base domain name:

| domain name | GUI | TF |
| --- | --- | --- |
| domain.com | works | works |
| sub.domain.com | works | broken |

The R53 hosted zone domain.com has a record set 'sub' of NS records delegating to:

The public hosted zone, sub.domain.com, has four records: the SOA and NS, plus:

| name | type | value |
| --- | --- | --- |
| dev-api | A | ELB |
| dev | A | ELB |

The private hosted zone, sub.domain.com, has seven records: the SOA and NS (different from the above), plus:

| name | type | value |
| --- | --- | --- |
| _etcd-client-ssl._tcp | SRV | 0 0 2379 dev-etcd-0.sub.domain.com |
| _etcd-server-ssl._tcp | SRV | 0 0 2380 dev-etcd-0.sub.domain.com |
| dev-api | A | ELB |
| dev | A | ELB |
| dev-etcd-0 | A | 10.0.x.x |
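These records can be sanity-checked with dig from a host inside the VPC (the private zone only answers there); a sketch:

```
dig +short SRV _etcd-client-ssl._tcp.sub.domain.com
dig +short SRV _etcd-server-ssl._tcp.sub.domain.com
dig +short A dev-etcd-0.sub.domain.com
```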

All three load balancers have 0 instances in service, and the instances are 'OutOfService', just as @brant4test described:

$ kubectl cluster-info
Kubernetes master is running at https://dev-api.sub.domain.com:443

$ kubectl describe svc
Unable to connect to the server: EOF

$ kubectl cluster-info dump
Unable to connect to the server: EOF

tomdavidson commented 7 years ago

#2243 is my problem: Tectonic services are failing to install.
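For reference, the Tectonic components are installed by the `tectonic.sh` step visible in the ps output earlier in the thread; assuming the stock unit names, the place to watch that installation stall is on a master node:

```
# bootkube brings up the control plane first, then tectonic.service
# installs the Tectonic components (console, identity, etc.)
journalctl -u bootkube.service --no-pager | tail -n 50
journalctl -u tectonic.service --no-pager | tail -n 50
```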