coreos / terraform-aws-kubernetes

Install a Kubernetes cluster the CoreOS Tectonic Way: HA, self-hosted, RBAC, etcd Operator, and more
Apache License 2.0

Cluster unreachable after create #18

Open pms1969 opened 6 years ago

pms1969 commented 6 years ago

I'm trying to set up a POC cluster for a bit of experimentation. I chose the vanilla method (see config at bottom).

At first, I had the same issue as #6. Running plan and apply again sorted it, but now running kubectl cluster-info gives me:

$ kubectl cluster-info
Kubernetes master is running at https://mydomain.com:443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Unable to connect to the server: EOF

Running with dump just returns:

Unable to connect to the server: unexpected EOF

After SSHing to the machine, I found that hyperkube seemed to have restarted a few times. Once it became stable, the logs were full of:

E0209 09:46:08.708370       1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://mydomain.com:443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: EOF

docker ps -a gives:

CONTAINER ID        IMAGE                                                            COMMAND                   CREATED              STATUS                            PORTS               NAMES
9eecc428bfb4        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   About a minute ago   Exited (0) About a minute ago                         quirky_newton
2b204d005d85        quay.io/coreos/hyperkube                                         "/usr/bin/flock /v..."    2 minutes ago        Exited (255) About a minute ago                       k8s_kube-apiserver_bootstrap-kube-apiserver-ip-10-102-39-231.eu-west-1.compute.internal_kube-system_7372638d060e8634f939e7d9638e7fb2_7
22ea417a2d6e        quay.io/coreos/hyperkube                                         "./hyperkube contr..."    20 minutes ago       Up 20 minutes                                         k8s_kube-controller-manager_bootstrap-kube-controller-manager-ip-10-102-39-231.eu-west-1.compute.internal_kube-system_593e5c19268a732b18cf733be361f7ef_0
adbd68ad12c2        quay.io/coreos/hyperkube                                         "./hyperkube sched..."    20 minutes ago       Up 20 minutes                                         k8s_kube-scheduler_bootstrap-kube-scheduler-ip-10-102-39-231.eu-west-1.compute.internal_kube-system_9ed9a738aa21e46d4aa2be533a40fe37_0
67f6b3a5d35b        gcr.io/google_containers/pause-amd64:3.0                         "/pause"                  21 minutes ago       Up 21 minutes                                         k8s_POD_bootstrap-kube-controller-manager-ip-10-102-39-231.eu-west-1.compute.internal_kube-system_593e5c19268a732b18cf733be361f7ef_0
a253488550f7        gcr.io/google_containers/pause-amd64:3.0                         "/pause"                  21 minutes ago       Up 21 minutes                                         k8s_POD_bootstrap-kube-scheduler-ip-10-102-39-231.eu-west-1.compute.internal_kube-system_9ed9a738aa21e46d4aa2be533a40fe37_0
0b6d728cbd27        gcr.io/google_containers/pause-amd64:3.0                         "/pause"                  21 minutes ago       Up 21 minutes                                         k8s_POD_bootstrap-kube-apiserver-ip-10-102-39-231.eu-west-1.compute.internal_kube-system_7372638d060e8634f939e7d9638e7fb2_0
fe3f47a56eb4        quay.io/coreos/bootkube:v0.8.1                                   "/bootkube start -..."    21 minutes ago       Exited (1) About a minute ago                         gallant_noyce
17fbff44936f        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   21 minutes ago       Exited (0) 21 minutes ago                             romantic_babbage
3c0738a79899        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   23 minutes ago       Exited (0) 22 minutes ago                             silly_bhabha
1998998cbb32        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   23 minutes ago       Exited (0) 23 minutes ago                             lucid_hopper
1c1014cc46e4        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   23 minutes ago       Exited (1) 23 minutes ago                             vigilant_swartz
aac978cfe59b        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   23 minutes ago       Exited (1) 23 minutes ago                             priceless_curran
6cd84c94bbe2        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   23 minutes ago       Exited (1) 23 minutes ago                             dazzling_kowalevski
c334f6942bdf        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   23 minutes ago       Exited (1) 23 minutes ago                             sharp_ride
57351683101c        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   23 minutes ago       Exited (1) 23 minutes ago                             peaceful_bassi
2067d4c77259        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   23 minutes ago       Exited (1) 23 minutes ago                             adoring_wing
ff330a2822ef        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/detect-master.sh"       23 minutes ago       Exited (0) 23 minutes ago                             hardcore_spence

Getting the logs of the 2b204 container:

core@ip-10-102-39-231 ~ $ docker logs 2b204d005d85
I0209 09:50:03.083580       5 server.go:114] Version: v1.8.4+coreos.0
I0209 09:50:03.084027       5 cloudprovider.go:59] --external-hostname was not specified. Trying to get it from the cloud provider.
I0209 09:50:03.084119       5 aws.go:847] Building AWS cloudprovider
I0209 09:50:03.084176       5 aws.go:810] Zone not specified in configuration file; querying AWS metadata service
I0209 09:50:03.312930       5 tags.go:76] AWS cloud filtering on ClusterID: myclusterid
I0209 09:50:03.319499       5 aws.go:847] Building AWS cloudprovider
I0209 09:50:03.319593       5 aws.go:810] Zone not specified in configuration file; querying AWS metadata service
I0209 09:50:03.395295       5 tags.go:76] AWS cloud filtering on ClusterID: myclusterid
I0209 09:50:04.006413       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.007071       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
W0209 09:50:04.007977       5 admission.go:66] PersistentVolumeLabel admission controller is deprecated. Please remove this controller from your configuration files and scripts.
I0209 09:50:04.009754       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.013321       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.014111       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.014711       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.015425       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.016160       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.016988       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.017710       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.020156       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.022998       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.025027       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.026320       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.027219       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.028469       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.029282       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.030056       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.030701       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.031359       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.045911       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.046704       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.047484       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.048272       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.048979       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.049801       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.050593       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.051364       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.052163       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.052917       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.056419       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.058839       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.061129       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.064863       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.065823       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.066573       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.067287       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.068001       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.068732       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.069383       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.070143       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.070879       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.073015       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.073705       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.074413       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.075173       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.075909       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.076577       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.077304       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.078184       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.078976       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:04.079694       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
W0209 09:50:04.086404       5 genericapiserver.go:311] Skipping API batch/v2alpha1 because it has no resources.
W0209 09:50:04.098126       5 genericapiserver.go:311] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
[restful] 2018/02/09 09:50:04 log.go:33: [restful/swagger] listing is available at https://{public-ip}/swaggerapi
[restful] 2018/02/09 09:50:04 log.go:33: [restful/swagger] https://{public-ip}/swaggerui/ is mapped to folder /swagger-ui/
[restful] 2018/02/09 09:50:04 log.go:33: [restful/swagger] listing is available at https://{public-ip}/swaggerapi
[restful] 2018/02/09 09:50:04 log.go:33: [restful/swagger] https://{public-ip}/swaggerui/ is mapped to folder /swagger-ui/
I0209 09:50:04.839076       5 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0209 09:50:07.354052       5 serve.go:85] Serving securely on 0.0.0.0:443
I0209 09:50:07.354245       5 apiservice_controller.go:112] Starting APIServiceRegistrationController
I0209 09:50:07.354268       5 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
I0209 09:50:07.354290       5 controller.go:84] Starting OpenAPI AggregationController
I0209 09:50:07.354414       5 crd_finalizer.go:242] Starting CRDFinalizer
I0209 09:50:07.354436       5 available_controller.go:192] Starting AvailableConditionController
I0209 09:50:07.354446       5 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
I0209 09:50:07.354479       5 crdregistration_controller.go:112] Starting crd-autoregister controller
I0209 09:50:07.354489       5 controller_utils.go:1041] Waiting for caches to sync for crd-autoregister controller
I0209 09:50:07.354502       5 customresource_discovery_controller.go:152] Starting DiscoveryController
I0209 09:50:07.354518       5 naming_controller.go:277] Starting NamingConditionController
I0209 09:50:33.470964       5 trace.go:76] Trace[1055602432]: "Create /api/v1/namespaces/kube-system/pods" (started: 2018-02-09 09:50:23.470487469 +0000 UTC) (total time: 10.00043381s):
Trace[1055602432]: [10.00043381s] [10.000316949s] END
I0209 09:50:38.355775       5 trace.go:76] Trace[2073516846]: "Create /api/v1/namespaces" (started: 2018-02-09 09:50:08.355292443 +0000 UTC) (total time: 30.000458182s):
Trace[2073516846]: [30.000458182s] [30.000379983s] END
E0209 09:50:38.356286       5 client_ca_hook.go:78] Timeout: request did not complete within allowed duration
I0209 09:50:45.212831       5 trace.go:76] Trace[916872897]: "Create /api/v1/nodes" (started: 2018-02-09 09:50:15.212253831 +0000 UTC) (total time: 30.000553967s):
Trace[916872897]: [30.000553967s] [30.000416454s] END
I0209 09:50:50.964274       5 trace.go:76] Trace[1282210152]: "Create /api/v1/nodes" (started: 2018-02-09 09:50:20.963753931 +0000 UTC) (total time: 30.000496712s):
Trace[1282210152]: [30.000496712s] [30.000346264s] END
I0209 09:50:51.218273       5 trace.go:76] Trace[2066235820]: "Create /api/v1/nodes" (started: 2018-02-09 09:50:21.217643067 +0000 UTC) (total time: 30.00059804s):
Trace[2066235820]: [30.00059804s] [30.000444768s] END
I0209 09:50:51.406412       5 trace.go:76] Trace[2013168133]: "Create /api/v1/nodes" (started: 2018-02-09 09:50:21.405856313 +0000 UTC) (total time: 30.000531953s):
Trace[2013168133]: [30.000531953s] [30.000399377s] END
E0209 09:51:07.370854       5 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Endpoints: the server was unable to return a response in the time allotted, but may still be processing the request (get endpoints)
E0209 09:51:07.371962       5 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/apiextensions-apiserver/pkg/client/informers/internalversion/factory.go:61: Failed to list *apiextensions.CustomResourceDefinition: the server was unable to return a response in the time allotted, but may still be processing the request (get customresourcedefinitions.apiextensions.k8s.io)
E0209 09:51:07.372021       5 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:73: Failed to list *rbac.ClusterRoleBinding: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterrolebindings.rbac.authorization.k8s.io)
E0209 09:51:07.372134       5 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:73: Failed to list *api.ResourceQuota: the server was unable to return a response in the time allotted, but may still be processing the request (get resourcequotas)
E0209 09:51:07.373788       5 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Service: the server was unable to return a response in the time allotted, but may still be processing the request (get services)
E0209 09:51:07.373796       5 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/client/informers/internalversion/factory.go:61: Failed to list *apiregistration.APIService: the server was unable to return a response in the time allotted, but may still be processing the request (get apiservices.apiregistration.k8s.io)
E0209 09:51:07.391639       5 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:73: Failed to list *storage.StorageClass: the server was unable to return a response in the time allotted, but may still be processing the request (get storageclasses.storage.k8s.io)
E0209 09:51:07.391741       5 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Namespace: the server was unable to return a response in the time allotted, but may still be processing the request (get namespaces)
E0209 09:51:07.391794       5 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:73: Failed to list *api.Secret: the server was unable to return a response in the time allotted, but may still be processing the request (get secrets)
E0209 09:51:07.391835       5 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:73: Failed to list *rbac.Role: the server was unable to return a response in the time allotted, but may still be processing the request (get roles.rbac.authorization.k8s.io)
E0209 09:51:07.392139       5 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:73: Failed to list *api.ServiceAccount: the server was unable to return a response in the time allotted, but may still be processing the request (get serviceaccounts)
E0209 09:51:07.392196       5 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:73: Failed to list *rbac.ClusterRole: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterroles.rbac.authorization.k8s.io)
E0209 09:51:07.392865       5 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:73: Failed to list *api.LimitRange: the server was unable to return a response in the time allotted, but may still be processing the request (get limitranges)
E0209 09:51:07.393203       5 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:73: Failed to list *rbac.RoleBinding: the server was unable to return a response in the time allotted, but may still be processing the request (get rolebindings.rbac.authorization.k8s.io)
E0209 09:51:08.355816       5 storage_rbac.go:172] unable to initialize clusterroles: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterroles.rbac.authorization.k8s.io)
I0209 09:51:08.357182       5 trace.go:76] Trace[1216105872]: "Create /api/v1/namespaces" (started: 2018-02-09 09:50:38.356848784 +0000 UTC) (total time: 30.000315483s):
Trace[1216105872]: [30.000315483s] [30.000273123s] END
E0209 09:51:08.357487       5 client_ca_hook.go:78] Timeout: request did not complete within allowed duration
F0209 09:51:08.357509       5 hooks.go:133] PostStartHook "ca-registration" failed: unable to initialize client CA configmap: timed out waiting for the condition
core@ip-10-102-39-231 ~ $
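
The repeated ServerName deprecation warnings bury the handful of lines that matter in the output above. As a minimal sketch (the helper name `klog_errors` is made up for illustration, and it assumes the usual klog prefix of a severity letter followed by month and day, as in `E0209`), the error and fatal lines can be pulled out like this:

```shell
# Keep only error (E) and fatal (F) lines from klog-formatted output.
# Each klog line starts with a severity letter plus MMDD, e.g. "E0209 ".
# klog_errors is a hypothetical helper name, not part of any tool.
klog_errors() {
  grep -E '^[EF][0-9]{4} '
}

# Usage on the master (2>&1 because the container most likely logs to stderr):
#   docker logs 2b204d005d85 2>&1 | klog_errors
```

Applied to the log above, this would leave the client_ca_hook timeouts, the reflector "Failed to list" errors, the storage_rbac error, and the final fatal PostStartHook line.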

My config looks like this:

module "kubernetes" {
  source = "coreos/kubernetes/aws"

  tectonic_aws_assets_s3_bucket_name = "${var.s3_asset_bucket}"
  tectonic_aws_etcd_ec2_type = "t2.small"
  tectonic_aws_etcd_root_volume_iops = "100"
  tectonic_aws_etcd_root_volume_size = "30"
  tectonic_aws_etcd_root_volume_type = "gp2"
  tectonic_aws_external_private_zone = "${data.aws_route53_zone.private_zone.zone_id}"
  tectonic_aws_master_ec2_type = "t2.medium"
  tectonic_aws_master_root_volume_iops = "100"
  tectonic_aws_master_root_volume_size = "30"
  tectonic_aws_master_root_volume_type = "gp2"
  tectonic_aws_private_endpoints = true
  tectonic_aws_profile = "${var.aws_profile}"
  tectonic_aws_public_endpoints = true
  tectonic_aws_region = "${var.primary_region}"
  tectonic_aws_ssh_key = "${var.keypair_name}"
  tectonic_aws_vpc_cidr_block = "${data.external.cidr.result.value}"
  tectonic_aws_worker_ec2_type = "t2.medium"
  tectonic_aws_worker_root_volume_iops = "100"
  tectonic_aws_worker_root_volume_size = "30"
  tectonic_aws_worker_root_volume_type = "gp2"
  tectonic_base_domain = "${var.domain}"
  tectonic_cluster_name = "${var.cluster_name}"
  tectonic_container_linux_channel = "stable"
  tectonic_container_linux_version = "latest"
  tectonic_etcd_count = "0"
  tectonic_etcd_tls_enabled = true
  tectonic_license_path = ""
  tectonic_master_count = "1"
  tectonic_networking = "calico"
  tectonic_pull_secret_path = ""
  tectonic_tls_validity_period = "26280"
  tectonic_vanilla_k8s = true
  tectonic_worker_count = "3"
  tectonic_admin_email = "${var.admin_email}"
  tectonic_admin_password = "${data.external.admin_password.result.value}"
}

Any idea what might have gone wrong? Any help appreciated.

NB: logs sanitised.

pms1969 commented 6 years ago

Adding some additional information:

Thinking that the size of the etcd nodes might be the problem, and that I'd need a bit more room on the workers, I changed the config so that the etcd nodes were t2.medium and the workers were t2.2xlarge, then reapplied. That didn't seem to work, so I terminated the master node. Now the master is completely fubar'ed.

It's stuck running 2 containers over and over:

core@ip-10-102-6-98 ~ $ docker ps -a
CONTAINER ID        IMAGE                                                            COMMAND                   CREATED              STATUS                          PORTS               NAMES
84e0ab2a0e52        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   6 seconds ago        Exited (0) 5 seconds ago                            practical_davinci
f04141809f33        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/detect-master.sh"       7 seconds ago        Exited (0) 5 seconds ago                            xenodochial_euler
8624c670a694        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   18 seconds ago       Exited (0) 17 seconds ago                           naughty_swartz
ae231ea3c901        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/detect-master.sh"       20 seconds ago       Exited (0) 18 seconds ago                           vigilant_minsky
e51e885ab635        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   31 seconds ago       Exited (0) 29 seconds ago                           sleepy_brattain
b0322b040e24        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/detect-master.sh"       32 seconds ago       Exited (0) 30 seconds ago                           eloquent_pasteur
700b08b69b90        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   43 seconds ago       Exited (0) 42 seconds ago                           hungry_franklin
0fb019b4eed1        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/detect-master.sh"       44 seconds ago       Exited (0) 42 seconds ago                           blissful_lalande
d584d41f958c        quay.io/coreos/awscli:025a357f05242fdad6a81e8a6b520098aa65a600   "/bin/bash -c '\n  ..."   55 seconds ago       Exited (0) 54 seconds ago                           amazing_engelbart

There's no output from detect-master, but it is returning true. The log from the other container is:

core@ip-10-102-6-98 ~ $ docker logs e51e885ab635
download: s3://mys3bucket/assets.zip to tmp/mys3bucket+assets.zip

So it's downloading the assets; it's just that whatever script is running isn't getting that message.

pms1969 commented 6 years ago

UPDATE:

The last problem was due to the assets.zip file being corrupted. Not sure how that happened.
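
For anyone who suspects the same thing, here is a minimal sketch of checking a downloaded archive before the node tries to use it. It assumes `unzip` is installed wherever the check runs (it may not be on the master itself), and `zip_ok` and the `/tmp/assets.zip` path are hypothetical names chosen for illustration:

```shell
# Succeed only if the archive passes unzip's built-in integrity test (-t).
# Assumes `unzip` is installed; zip_ok is a hypothetical helper name.
zip_ok() {
  unzip -tq "$1" >/dev/null 2>&1
}

# Usage, with the sanitised bucket name from the log above:
#   aws s3 cp s3://mys3bucket/assets.zip /tmp/assets.zip
#   zip_ok /tmp/assets.zip && echo "archive looks intact" || echo "archive is corrupt or unreadable"
```

A failing check only says the archive cannot be read as a zip; it does not explain how it got corrupted.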

I've since recreated the cluster, and the same initial problem persists. The hyperkube container (the one running flock) exits with:

F0209 12:09:50.976234       5 hooks.go:133] PostStartHook "ca-registration" failed: unable to initialize client CA configmap: timed out waiting for the condition

The other hyperkube continues to error repeatedly with:

E0209 12:13:42.552673       1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://mydomain.com:443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: EOF

Apologies for the running commentary.

pcockwell commented 6 years ago

Also having this exact issue. Is there any additional information on this?