Multitude of TLS_cert-related errors when doing a fresh deployment

apetresc commented 7 years ago

I'm trying out the module from the new Terraform registry. My configuration is below, with sensitive parts redacted:

module "kubernetes" {
  source = "coreos/kubernetes/aws"

  tectonic_admin_email = "adrian.petrescu@rubikloud.com"
  tectonic_admin_password_hash = "<redacted>"
  tectonic_aws_ssh_key = "rubikloud-master"
  tectonic_base_domain = "rubikloudcorp.com"
  tectonic_cluster_name = "k8test"

  tectonic_vanilla_k8s = true
  tectonic_aws_private_endpoints = false
  tectonic_aws_external_private_zone = "<redacted>"

  tectonic_autoscaling_group_extra_tags = [ .. some tags ..  ]

  tectonic_aws_extra_tags {
    .. some tags ..
  }
}

Running terraform apply fails after creating a few dozen resources, with the following errors:

Error applying plan:

10 error(s) occurred:

* module.kubernetes.module.kube_certs.tls_private_key.kube_ca: 1 error(s) occurred:

* tls_private_key.kube_ca: unexpected EOF
* module.kubernetes.module.identity_certs.tls_cert_request.identity_server: 1 error(s) occurred:

* tls_cert_request.identity_server: unexpected EOF
* module.kubernetes.module.etcd_certs.tls_private_key.etcd_client: 1 error(s) occurred:

* tls_private_key.etcd_client: unexpected EOF
* module.kubernetes.module.etcd_certs.tls_private_key.etcd_peer: 1 error(s) occurred:

* tls_private_key.etcd_peer: unexpected EOF
* module.kubernetes.module.ingress_certs.tls_cert_request.ingress: 1 error(s) occurred:

* tls_cert_request.ingress: unexpected EOF
* module.kubernetes.module.etcd_certs.tls_cert_request.etcd_server: 1 error(s) occurred:

* tls_cert_request.etcd_server: unexpected EOF
* module.kubernetes.module.etcd_certs.tls_self_signed_cert.etcd_ca: unexpected EOF
* module.kubernetes.module.identity_certs.tls_cert_request.identity_client: connection is shut down
* module.kubernetes.module.kube_certs.tls_cert_request.apiserver: connection is shut down
* module.kubernetes.module.kube_certs.tls_cert_request.kubelet: connection is shut down

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

I tried setting tectonic_etcd_tls_enabled = false, but the same errors occurred.

It feels like I'm supposed to provide some sort of cert for etcd, but the documentation seems to imply that the module creates this cert itself. I can't find much else to help diagnose the problem.

Let me know if there's any additional information I can provide. Thanks!

robszumski commented 7 years ago

Which version of Terraform were you using with this attempt?

apetresc commented 7 years ago

@robszumski The latest, 0.10.8.

apetresc commented 7 years ago

By the way, I just tested again to see whether anything had changed in Terraform over the last 20 days that would fix this. Nope, it fails with exactly the same errors.

squat commented 7 years ago

@apetresc, can you please try once more, but with a tagged version of the module? Otherwise, I've noticed that the version chosen isn't always the latest. Please try the v1.7.5 release and post the output of terraform apply.

apetresc commented 7 years ago

Okay, so I changed the source line to:

source = "git::https://github.com/coreos/terraform-aws-kubernetes.git?ref=1.7.5-tectonic.1"

Then I cleared out .terraform/modules, re-ran terraform get, and re-ran terraform apply. I got different errors this time, although they still appear to be related to the TLS certs:

Error: Error applying plan:

10 error(s) occurred:

* module.kubernetes.module.bootkube.data.template_file.kubeconfig: data.template_file.kubeconfig: failed to render : 4:11: unknown variable accessed: cluster_name
* module.kubernetes.module.kube_certs.tls_cert_request.apiserver: 1 error(s) occurred:

* tls_cert_request.apiserver: unexpected EOF
* module.kubernetes.module.etcd_certs.tls_private_key.etcd_server: 1 error(s) occurred:

* tls_private_key.etcd_server: unexpected EOF
* module.kubernetes.module.kube_certs.local_file.kubelet_crt: Resource 'tls_locally_signed_cert.kubelet' not found for variable 'tls_locally_signed_cert.kubelet.cert_pem'
* module.kubernetes.module.ingress_certs.tls_locally_signed_cert.ingress: Resource 'tls_cert_request.ingress' not found for variable 'tls_cert_request.ingress.cert_request_pem'
* module.kubernetes.module.kube_certs.tls_locally_signed_cert.kubelet: connection is shut down
* module.kubernetes.module.etcd_certs.tls_locally_signed_cert.etcd_client: connection is shut down
* module.kubernetes.module.ingress_certs.tls_cert_request.ingress: connection is shut down
* module.kubernetes.module.identity_certs.tls_locally_signed_cert.identity_server: connection is shut down
* module.kubernetes.module.identity_certs.tls_locally_signed_cert.identity_client: connection is shut down

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Am I doing something wrong this time? I was still using the exact same config from above.

aknuds1 commented 7 years ago

I get the same errors when I disable tectonic_aws_private_endpoints while tectonic_aws_public_endpoints is enabled. I'm using the latest Tectonic Installer from GitHub and Terraform v0.10.8.

s-urbaniak commented 7 years ago

Found the root cause of the issue:

https://github.com/coreos/tectonic-installer/blob/fe127b8/platforms/aws/tectonic.tf#L12 and https://github.com/coreos/tectonic-installer/blob/fe127b8/platforms/aws/tectonic.tf#L31 reference module.dns.api_internal_fqdn and module.dns.ingress_internal_fqdn, but these are not generated if tectonic_aws_private_endpoints = false (which disables internal DNS zones).

We need ternaries here that fall back to the external FQDNs when the internal zone is disabled:

diff --git a/platforms/aws/tectonic.tf b/platforms/aws/tectonic.tf
index 2973e07a..239f3ed3 100644
--- a/platforms/aws/tectonic.tf
+++ b/platforms/aws/tectonic.tf
@@ -9,7 +9,7 @@ module "kube_certs" {
   ca_cert_pem        = "${var.tectonic_ca_cert}"
   ca_key_alg         = "${var.tectonic_ca_key_alg}"
   ca_key_pem         = "${var.tectonic_ca_key}"
-  kube_apiserver_url = "https://${module.dns.api_internal_fqdn}:443"
+  kube_apiserver_url = "https://${var.tectonic_aws_private_endpoints ? module.dns.api_internal_fqdn : module.dns.api_external_fqdn}:443"
   service_cidr       = "${var.tectonic_service_cidr}"
   validity_period    = "${var.tectonic_tls_validity_period}"
 }
@@ -28,7 +28,7 @@ module "etcd_certs" {
 module "ingress_certs" {
   source = "../../modules/tls/ingress/self-signed"

-  base_address    = "${module.dns.ingress_internal_fqdn}"
+  base_address    = "${var.tectonic_aws_private_endpoints ? module.dns.ingress_internal_fqdn : module.dns.ingress_external_fqdn}"
   ca_cert_pem     = "${module.kube_certs.ca_cert_pem}"
   ca_key_alg      = "${module.kube_certs.ca_key_alg}"
   ca_key_pem      = "${module.kube_certs.ca_key_pem}"

I will set up a PR for this one. Thanks for reporting this issue!
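As a hedged sketch (not part of the fix described in this thread), the same fallback could be computed once in a locals block and reused, instead of repeating the ternary in each module. This assumes Terraform 0.10.3 or later (where locals are available) and that module.dns exposes the api_external_fqdn and ingress_external_fqdn outputs referenced in the diff above; the local names api_fqdn and ingress_fqdn are hypothetical.

```hcl
# Hypothetical refactor (sketch only): choose internal or external FQDNs once.
# Assumes Terraform >= 0.10.3 (locals) and the module.dns outputs used in the diff.
locals {
  # Use the internal zone's records only when private endpoints are enabled;
  # otherwise fall back to the records in the public zone.
  api_fqdn     = "${var.tectonic_aws_private_endpoints ? module.dns.api_internal_fqdn : module.dns.api_external_fqdn}"
  ingress_fqdn = "${var.tectonic_aws_private_endpoints ? module.dns.ingress_internal_fqdn : module.dns.ingress_external_fqdn}"
}

module "kube_certs" {
  # ...source and other arguments unchanged...
  kube_apiserver_url = "https://${local.api_fqdn}:443"
}

module "ingress_certs" {
  source = "../../modules/tls/ingress/self-signed"

  base_address = "${local.ingress_fqdn}"
  # ...other arguments unchanged...
}
```

One caveat: in Terraform versions before 0.12, both branches of a conditional are evaluated, so every output referenced in either branch must still be defined by module.dns, even when its value is empty.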

squat commented 7 years ago

@apetresc, we have just merged a PR upstream for this bug. It was introduced during a recent TLS refactor. I am porting the fix to our 1.7.9 release branch; it will be available in an upcoming patch release, 1.7.9+tectonic.2. I will update this issue once that is ready.

apetresc commented 7 years ago

Perfect. Thank you so much :)

squat commented 6 years ago

1.7.9+tectonic.2 is out. Closing this :)

coreos / terraform-aws-kubernetes

Multitude of TLS_cert-related errors when doing a fresh deployment #5