coreos / tectonic-installer

Install a Kubernetes cluster the CoreOS Tectonic Way: HA, self-hosted, RBAC, etcd Operator, and more
Apache License 2.0

Bootkube failed on startup with status=254 during install on AWS using existing VPC #1552

Closed snsumner closed 7 years ago

snsumner commented 7 years ago

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT

Versions

What happened?

During installation, the bootkube service exited with status=254 and logged the following:

-- Logs begin at Tue 2017-08-01 19:29:47 UTC, end at Tue 2017-08-01 19:48:13 UTC. --
Aug 01 19:30:22 ip-10-84-28-68 systemd[1]: Starting Bootstrap a Kubernetes cluster...
Aug 01 19:30:27 ip-10-84-28-68 bash[1107]: pubkey: prefix: "quay.io/coreos/bootkube"
Aug 01 19:30:27 ip-10-84-28-68 bash[1107]: key: "https://quay.io/aci-signing-key"
Aug 01 19:30:27 ip-10-84-28-68 bash[1107]: gpg key fingerprint is: BFF3 13CD AA56 0B16 A898  7B8F 72AB F5F6 799D 33BC
Aug 01 19:30:27 ip-10-84-28-68 bash[1107]:         Quay.io ACI Converter (ACI conversion signing key) <support@quay.io>
Aug 01 19:30:27 ip-10-84-28-68 bash[1107]: Trusting "https://quay.io/aci-signing-key" for prefix "quay.io/coreos/bootkube" without fingerprint review.
Aug 01 19:30:27 ip-10-84-28-68 bash[1107]: Added key for prefix "quay.io/coreos/bootkube" at "/etc/rkt/trustedkeys/prefix.d/quay.io/coreos/bootkube/bff313cdaa5
Aug 01 19:30:28 ip-10-84-28-68 bash[1107]: Downloading signature:  0 B/473 B
Aug 01 19:30:28 ip-10-84-28-68 bash[1107]: Downloading signature:  473 B/473 B
Aug 01 19:30:28 ip-10-84-28-68 bash[1107]: Downloading signature:  473 B/473 B
Aug 01 19:30:29 ip-10-84-28-68 bash[1107]: run: bad HTTP status code: 500
Aug 01 19:30:29 ip-10-84-28-68 systemd[1]: bootkube.service: Main process exited, code=exited, status=254/n/a
Aug 01 19:30:29 ip-10-84-28-68 systemd[1]: Failed to start Bootstrap a Kubernetes cluster.

We manually restarted the bootkube service on the master node, and it then completed the installation successfully.

What you expected to happen?

We expected bootkube to finish the installation and then stop.

How to reproduce it (as minimally and precisely as possible)?

The customer was installing Tectonic on AWS using their own VPC. Below is the scrubbed terraform.tfvars that was used:

// The e-mail address used to login as the admin user to the Tectonic Console.
// 
// Note: This field MUST be set manually prior to creating the cluster.
tectonic_admin_email = "cobra@company.com"

// The bcrypt hash of admin user password to login to the Tectonic Console.
// Use the bcrypt-hash tool (https://github.com/coreos/bcrypt-tool/releases/tag/v1.0.0) to generate it.
// 
// Note: This field MUST be set manually prior to creating the cluster.

// (optional) Extra AWS tags to be applied to created autoscaling group resources.
// This is a list of maps having the keys `key`, `value` and `propagate_at_launch`.
// 
// Example: `[ { key = "foo", value = "bar", propagate_at_launch = true } ]`
//tectonic_autoscaling_group_extra_tags=[{key="createdBy",value="scott.sumner",propagate_at_launch=true},{key="expirationDate",value="2017-12-31",propagate_at_launch=true}]
tectonic_autoscaling_group_extra_tags = [ { key = "billingcode", value = "ops", propagate_at_launch = true } ]

// Instance size for the etcd node(s). Example: `t2.medium`. Read the [etcd recommended hardware] (https://coreos.com/etcd/docs/latest/op-guide/hardware.html) guide for best performance
tectonic_aws_etcd_ec2_type = "m3.large"

// (optional) List of additional security group IDs for etcd nodes.
// 
// Example: `["sg-51530134", "sg-b253d7cc"]`
// tectonic_aws_etcd_extra_sg_ids = ""

// The amount of provisioned IOPS for the root block device of etcd nodes.
tectonic_aws_etcd_root_volume_iops = "100"

// The size of the volume in gigabytes for the root block device of etcd nodes.
tectonic_aws_etcd_root_volume_size = "30"

// The type of volume for the root block device of etcd nodes.
tectonic_aws_etcd_root_volume_type = "gp2"

// (optional) List of subnet IDs within an existing VPC to deploy master nodes into.
// Required to use an existing VPC and the list must match the AZ count.
// 
// Example: `["subnet-111111", "subnet-222222", "subnet-333333"]`
tectonic_aws_external_master_subnet_ids = ["subnet-14965a62", "subnet-30bb1054"]

// (optional) If set, the given Route53 zone ID will be used as the internal (private) zone.
// This zone will be used to create etcd DNS records as well as internal API and internal Ingress records.
// If set, no additional private zone will be created.
// 
// Example: `"Z1ILINNUJGTAO1"`
tectonic_aws_external_private_zone = "set"

// (optional) ID of an existing VPC to launch nodes into.
// If unset a new VPC is created.
// 
// Example: `vpc-123456`
tectonic_aws_external_vpc_id = "vpc-ff85299b"

// If set to true, create public facing ingress resources (ELB, A-records).
// If set to false, a "private" cluster will be created with an internal ELB only.
tectonic_aws_external_vpc_public = false

// (optional) List of subnet IDs within an existing VPC to deploy worker nodes into.
// Required to use an existing VPC and the list must match the AZ count.
// 
// Example: `["subnet-111111", "subnet-222222", "subnet-333333"]`
tectonic_aws_external_worker_subnet_ids = ["subnet-14965a62", "subnet-30bb1054"]

// (optional) Extra AWS tags to be applied to created resources.
tectonic_aws_extra_tags = {billingcode = "ops"}

// (optional) This configures master availability zones and their corresponding subnet CIDRs directly.
// 
// Example:
// `{ eu-west-1a = "10.0.0.0/20", eu-west-1b = "10.0.16.0/20" }`
// tectonic_aws_master_custom_subnets = ""

// Instance size for the master node(s). Example: `t2.medium`.
tectonic_aws_master_ec2_type = "m3.large"

// (optional) List of additional security group IDs for master nodes.
// 
// Example: `["sg-51530134", "sg-b253d7cc"]`
// tectonic_aws_master_extra_sg_ids = ""

// (optional) Name of IAM role to use for the instance profiles of master nodes.
// The name is also the last part of a role's ARN.
// 
// Example:
//  * Role ARN  = arn:aws:iam::123456789012:role/tectonic-installer
//  * Role Name = tectonic-installer
// tectonic_aws_master_iam_role_name = ""

// The amount of provisioned IOPS for the root block device of master nodes.
tectonic_aws_master_root_volume_iops = "100"

// The size of the volume in gigabytes for the root block device of master nodes.
tectonic_aws_master_root_volume_size = "30"

// The type of volume for the root block device of master nodes.
tectonic_aws_master_root_volume_type = "gp2"

// The target AWS region for the cluster.
tectonic_aws_region = "us-west-2"

// Name of an SSH key located within the AWS region. Example: coreos-user.
tectonic_aws_ssh_key = "ops"

// Block of IP addresses used by the VPC.
// This should not overlap with any other networks, such as a private datacenter connected via Direct Connect.
tectonic_aws_vpc_cidr_block = "10.84.0.0/16"

// (optional) This configures worker availability zones and their corresponding subnet CIDRs directly.
// 
// Example: `{ eu-west-1a = "10.0.64.0/20", eu-west-1b = "10.0.80.0/20" }`
// tectonic_aws_worker_custom_subnets = ""

// Instance size for the worker node(s). Example: `t2.medium`.
tectonic_aws_worker_ec2_type = "m3.large"

// (optional) List of additional security group IDs for worker nodes.
// 
// Example: `["sg-51530134", "sg-b253d7cc"]`
// tectonic_aws_worker_extra_sg_ids = ""

// (optional) Name of IAM role to use for the instance profiles of worker nodes.
// The name is also the last part of a role's ARN.
// 
// Example:
//  * Role ARN  = arn:aws:iam::123456789012:role/tectonic-installer
//  * Role Name = tectonic-installer
// tectonic_aws_worker_iam_role_name = ""

// The amount of provisioned IOPS for the root block device of worker nodes.
tectonic_aws_worker_root_volume_iops = "100"

// The size of the volume in gigabytes for the root block device of worker nodes.
tectonic_aws_worker_root_volume_size = "30"

// The type of volume for the root block device of worker nodes.
tectonic_aws_worker_root_volume_type = "gp2"

// The base DNS domain of the cluster.
// 
// Example: `openstack.dev.coreos.systems`.
// 
// Note: This field MUST be set manually prior to creating the cluster.
// This applies only to cloud platforms.
// private name base domain
tectonic_base_domain = "company.io"

// (optional) The content of the PEM-encoded CA certificate, used to generate Tectonic Console's server certificate.
// If left blank, a CA certificate will be automatically generated.
// tectonic_ca_cert = ""

// (optional) The content of the PEM-encoded CA key, used to generate Tectonic Console's server certificate.
// This field is mandatory if `tectonic_ca_cert` is set.
// tectonic_ca_key = ""

// (optional) The algorithm used to generate tectonic_ca_key.
// The default value is currently recommended.
// This field is mandatory if `tectonic_ca_cert` is set.
// tectonic_ca_key_alg = "RSA"

// The Container Linux update channel.
// 
// Examples: `stable`, `beta`, `alpha`
tectonic_cl_channel = "stable"

// This declares the IP range to assign Kubernetes pod IPs in CIDR notation.
tectonic_cluster_cidr = "10.2.0.0/16"

// The name of the cluster.
// If used in a cloud-environment, this will be prepended to `tectonic_base_domain` resulting in the URL to the Tectonic console.
// 
// Note: This field MUST be set manually prior to creating the cluster.
// Warning: Special characters in the name like '.' may cause errors on OpenStack platforms due to resource name constraints.
tectonic_cluster_name = "eval"

// (optional) DNS prefix used to construct the console and API server endpoints.
// tectonic_dns_name = ""

// (optional) The path of the file containing the CA certificate for TLS communication with etcd.
// 
// Note: This works only when used in conjunction with an external etcd cluster.
// If set, the variables `tectonic_etcd_servers`, `tectonic_etcd_client_cert_path`, and `tectonic_etcd_client_key_path` must also be set.
// tectonic_etcd_ca_cert_path = "/dev/null"

// (optional) The path of the file containing the client certificate for TLS communication with etcd.
// 
// Note: This works only when used in conjunction with an external etcd cluster.
// If set, the variables `tectonic_etcd_servers`, `tectonic_etcd_ca_cert_path`, and `tectonic_etcd_client_key_path` must also be set.
// tectonic_etcd_client_cert_path = "/dev/null"

// (optional) The path of the file containing the client key for TLS communication with etcd.
// 
// Note: This works only when used in conjunction with an external etcd cluster.
// If set, the variables `tectonic_etcd_servers`, `tectonic_etcd_ca_cert_path`, and `tectonic_etcd_client_cert_path` must also be set.
// tectonic_etcd_client_key_path = "/dev/null"

// The number of etcd nodes to be created.
// If set to zero, the count of etcd nodes will be determined automatically.
// 
// Note: This is currently only supported on AWS.
tectonic_etcd_count = "3"

// (optional) List of external etcd v3 servers to connect with (hostnames/IPs only).
// Needs to be set if using an external etcd cluster.
// 
// Example: `["etcd1", "etcd2", "etcd3"]`
// tectonic_etcd_servers = ""

// (optional) If set to `true`, TLS secure communication for self-provisioned etcd will be used.
// 
// Note: If `tectonic_experimental` is set to `true` this variable has no effect, because the experimental self-hosted etcd always uses TLS.
// tectonic_etcd_tls_enabled = true

// If set to true, experimental Tectonic assets are being deployed.
tectonic_experimental = false

// The path to the tectonic licence file.
// 
// Note: This field MUST be set manually prior to creating the cluster unless `tectonic_vanilla_k8s` is set to `true`.
tectonic_license_path = "./build/eval/tectonic-license.txt"

// The number of master nodes to be created.
// This applies only to cloud platforms.
tectonic_master_count = "2"

// The path to the pull secret file in JSON format.
// 
// Note: This field MUST be set manually prior to creating the cluster unless `tectonic_vanilla_k8s` is set to `true`.
tectonic_pull_secret_path = "./build/eval/config.json"

// This declares the IP range to assign Kubernetes service cluster IPs in CIDR notation. The maximum size of this IP range is /12
tectonic_service_cidr = "10.3.0.0/16"

// The Tectonic statistics collection URL to which to report.
tectonic_stats_url = "https://stats-collector.tectonic.com"

// If set to true, a vanilla Kubernetes cluster will be deployed, omitting any Tectonic assets.
tectonic_vanilla_k8s = false

// The number of worker nodes to be created.
// This applies only to cloud platforms.
tectonic_worker_count = "4"
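
The comments in the tfvars above say the VPC CIDR should not overlap with other networks and that the service CIDR may be at most a /12. A minimal sketch of a pre-flight check for those two constraints follows, using the values from this file. The helper names `find_overlaps` and `service_cidr_within_limit` are hypothetical, and comparing the pod and service ranges against the VPC range is an assumption made here for illustration, not a check the installer is known to perform.

```python
import ipaddress
from itertools import combinations

# CIDR values copied from the terraform.tfvars in this report.
CIDRS = {
    "tectonic_aws_vpc_cidr_block": "10.84.0.0/16",
    "tectonic_cluster_cidr": "10.2.0.0/16",
    "tectonic_service_cidr": "10.3.0.0/16",
}


def find_overlaps(cidrs):
    """Return every pair of variable names whose networks overlap (hypothetical helper)."""
    networks = {name: ipaddress.ip_network(value) for name, value in cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


def service_cidr_within_limit(cidr, largest_allowed_prefix=12):
    """True if the range is no larger than a /12, i.e. its prefix length is at least 12."""
    return ipaddress.ip_network(cidr).prefixlen >= largest_allowed_prefix


if __name__ == "__main__":
    print("overlapping pairs:", find_overlaps(CIDRS))
    print("service CIDR within /12 limit:",
          service_cidr_within_limit(CIDRS["tectonic_service_cidr"]))
```

A check like this would not have caught the failure reported here, which happened while downloading the bootkube image, but it helps rule out network configuration as a cause when installing into an existing VPC.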

Anything else we need to know?

We did not rerun the terraform install to see whether the problem occurs repeatedly. We simply restarted bootkube and let it finish the installation so the customer could do their evaluation of Tectonic.

robszumski commented 7 years ago

It looks like the container download was interrupted during the bootstrap process:

run: bad HTTP status code: 500

Re-running the bootstrap process should have resolved the issue.
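
Because the failure looks transient, the manual re-run described above amounts to a retry. A minimal sketch of that idea is shown here as a generic retry-with-backoff wrapper. It is not how bootkube or its systemd unit behaves; `TransientError`, `retry`, and the simulated `flaky_download` are hypothetical names introduced only for this illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for a failure worth retrying, such as an HTTP 500."""


def retry(operation: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation until it succeeds or attempts run out, doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_download() -> str:
        # Simulated download: fails once with a 500, then succeeds.
        state["calls"] += 1
        if state["calls"] == 1:
            raise TransientError("bad HTTP status code: 500")
        return "image fetched"

    # The sleep function is replaced so the demonstration does not actually wait.
    print(retry(flaky_download, sleep=lambda seconds: None))
```

Only errors marked as transient are retried, so a genuine misconfiguration still fails on the first attempt instead of being hidden behind repeated tries.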

snsumner commented 7 years ago

We got this working by simply restarting the bootkube service. We believe this error was caused by a communication issue during the download.