coreos / tectonic-installer

Install a Kubernetes cluster the CoreOS Tectonic Way: HA, self-hosted, RBAC, etcd Operator, and more
Apache License 2.0

Terraform module cannot deploy into existing Terraform-managed AWS VPC #1966

Open kerin opened 6 years ago

kerin commented 6 years ago

What keywords did you search in tectonic-installer issues before filing this one?

terraform existing vpc value of 'count' cannot be computed

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

What happened?

When attempting to pass subnet IDs from an existing VPC Terraform module's output into the Tectonic module's tectonic_aws_external_worker_subnet_ids and tectonic_aws_external_master_subnet_ids variables, terraform plan fails with:

* module.kubernetes.module.vpc.data.aws_subnet.external_worker: data.aws_subnet.external_worker: value of 'count' cannot be computed
* module.kubernetes.module.vpc.data.aws_subnet.external_master: data.aws_subnet.external_master: value of 'count' cannot be computed

What you expected to happen?

Tectonic terraform resources should successfully consume computed subnet IDs from other Terraform modules.

How to reproduce it (as minimally and precisely as possible)?

In the module {} invocation, reference the output of another Terraform module:

module "kubernetes" {
  source = "coreos/kubernetes/aws"
  tectonic_aws_external_master_subnet_ids = "${module.aws_vpc.private_subnet_ids}"
  tectonic_aws_external_worker_subnet_ids = "${module.aws_vpc.private_subnet_ids}"
  ...
}

# modules/aws_vpc/outputs.tf:

output "private_subnet_ids" {
    value = ["${aws_subnet.private.*.id}"]
}

Anything else we need to know?

This seems to be related to these existing upstream Terraform issues (computed values from interpolation functions such as length() not handled in count attributes): hashicorp/terraform#12570 hashicorp/terraform#10462
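
The failing pattern reduces to roughly this (illustrative variable and resource names, not the actual installer code):

variable "subnet_ids" {
  type = "list"
}

# Fails at plan time whenever var.subnet_ids is itself computed,
# e.g. wired up from another module's output:
data "aws_subnet" "example" {
  count = "${length(var.subnet_ids)}"
  id    = "${var.subnet_ids[count.index]}"
}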

A possible fix for this is to have separate variables for number of subnets in tectonic-installer/modules/aws/vpc/existing-vpc.tf instead of using length():

data "aws_subnet" "external_worker" {
  count = "${var.external_vpc_id == "" ? 0 : var.num_external_worker_subnets}"
  id    = "${var.external_worker_subnets[count.index]}"
}

data "aws_subnet" "external_master" {
  count = "${var.external_vpc_id == "" ? 0 : var.num_external_master_subnets}"
  id    = "${var.external_master_subnets[count.index]}"
}
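
For this to work, the module would also need matching variable declarations (a sketch; the values must be supplied as literals, not computed):

variable "num_external_worker_subnets" {
  description = "Number of existing worker subnets; must be a literal value"
  default     = 0
}

variable "num_external_master_subnets" {
  description = "Number of existing master subnets; must be a literal value"
  default     = 0
}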
kerin commented 6 years ago

Playing around with this some more, it seems there are problems with count attributes and existing VPCs all over the place:

* module.kubernetes.module.vpc.aws_subnet.master_subnet: aws_subnet.master_subnet: value of 'count' cannot be computed
* module.kubernetes.module.vpc.aws_subnet.worker_subnet: aws_subnet.worker_subnet: value of 'count' cannot be computed
* module.kubernetes.module.vpc.aws_route_table.default: aws_route_table.default: value of 'count' cannot be computed
* module.kubernetes.module.vpc.aws_route_table.private_routes: aws_route_table.private_routes: value of 'count' cannot be computed
* module.kubernetes.module.vpc.aws_internet_gateway.igw: aws_internet_gateway.igw: value of 'count' cannot be computed
kerin commented 6 years ago

And yet more info: even with literal values, the same error is raised:

module "kubernetes" {
  source = "coreos/kubernetes/aws"
  tectonic_aws_external_master_subnet_ids = ["subnet-fe5727b7", "subnet-f7c471ac", "subnet-07a1c860"]
  tectonic_aws_external_worker_subnet_ids = ["subnet-fe5727b7", "subnet-f7c471ac", "subnet-07a1c860"]
  ...
}

$ terraform plan
...
* module.kubernetes.module.vpc.data.aws_subnet.external_master: data.aws_subnet.external_master: value of 'count' cannot be computed
* module.kubernetes.module.vpc.data.aws_subnet.external_worker: data.aws_subnet.external_worker: value of 'count' cannot be computed
thetaris-slack commented 6 years ago

Hitting the same.

me@my-MBP ~/d/s/t/p/aws> terraform plan -var-file=~/devel/sre/tectonic/tectonic-installer/darwin/clusters/jarvice_2017-09-12_02-10-55/terraform.tfvars -target module.workers ~/devel/sre/tectonic/tectonic-installer/darwin/clusters/jarvice_2017-09-12_02-10-55/platforms/aws
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.aws_ami.coreos_ami: Refreshing state...
Error running plan: 2 error(s) occurred:

* module.vpc.aws_subnet.master_subnet: aws_subnet.master_subnet: value of 'count' cannot be computed
* module.vpc.aws_subnet.worker_subnet: aws_subnet.worker_subnet: value of 'count' cannot be computed
sudhaponnaganti commented 6 years ago

@alexsomesan - Can you review this issue as well?

alexsomesan commented 6 years ago

@kerin @thetaris-slack Please post the tfvars files that produce this behaviour. By all means, redact any sensitive values out of them.

sudhaponnaganti commented 6 years ago

@kerin @thetaris-slack - Any update on the tfvars that @alexsomesan asked for? It looks like no such issues have been reported in later RCs. Please confirm that you are not seeing these anymore and we will close this. We are getting close to the 1.7.5 release, so we want to confirm.

sudhaponnaganti commented 6 years ago

@kerin @thetaris-slack - We are getting close to closing the validation cycle in preparation for the RC. Can you please review and report back so we can take action? Otherwise this will miss the Tectonic 1.7.5 release.

kerin commented 6 years ago

@sudhaponnaganti I'm still seeing this issue, I'm afraid. My module instantiation, against RC5, looks like this:

module "kubernetes" {
  source = "github.com/coreos/terraform-aws-kubernetes?ref=3f2d460"

  tectonic_admin_email = "test@example.com"
  tectonic_admin_password_hash = "$2a$10$e.3uNYwNtckQPvgAGji1i.cf0prDAQWtiOlScXi1teW0/yuWZiKV6"
  tectonic_aws_etcd_ec2_type = "t2.medium"
  tectonic_aws_etcd_root_volume_iops = "100"
  tectonic_aws_etcd_root_volume_size = "30"
  tectonic_aws_etcd_root_volume_type = "gp2"
  tectonic_aws_external_master_subnet_ids = ["${module.aws_vpc.private_subnet_ids}"]
  tectonic_aws_external_vpc_id = "${module.aws_vpc.vpc_id}"
  tectonic_aws_external_worker_subnet_ids = ["${module.aws_vpc.private_subnet_ids}"]
  tectonic_aws_master_custom_subnets = "${module.aws_vpc.private_subnet_cidrs}"
  tectonic_aws_master_ec2_type = "t2.medium"
  tectonic_aws_master_root_volume_iops = "100"
  tectonic_aws_master_root_volume_size = "30"
  tectonic_aws_master_root_volume_type = "gp2"
  tectonic_aws_region = "eu-west-1"
  tectonic_aws_ssh_key = "${module.aws_ec2.ssh_key_name}"
  tectonic_aws_vpc_cidr_block = "10.0.0.0/16"
  tectonic_aws_worker_custom_subnets = "${module.aws_vpc.private_subnet_cidrs}"
  tectonic_aws_worker_ec2_type = "t2.medium"
  tectonic_aws_worker_root_volume_iops = "100"
  tectonic_aws_worker_root_volume_size = "30"
  tectonic_aws_worker_root_volume_type = "gp2"
  tectonic_base_domain = "mojanalytics.xyz"
  tectonic_cluster_name = "kerin"
  tectonic_etcd_count = "0"
  tectonic_experimental = false
  tectonic_license_path = ""
  tectonic_master_count = "1"
  tectonic_pull_secret_path = ""
  tectonic_vanilla_k8s = true
  tectonic_worker_count = "3"
}

The module outputs referenced above are defined as:

output "vpc_id" {
    value = "${aws_vpc.main.id}"
}

output "private_subnet_ids" {
    value = ["${aws_subnet.private.*.id}"]
}

output "private_subnet_cidrs" {
    value = "${zipmap(aws_subnet.private.*.id, aws_subnet.private.*.cidr_block)}"
}

output "ssh_key_name" {
  value = "${aws_key_pair.default_instance_key.key_name}"
}

On terraform plan this results in:

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.terraform_remote_state.base: Refreshing state...
data.ignition_systemd_unit.locksmithd: Refreshing state...
data.ignition_systemd_unit.etcd_unzip_tls: Refreshing state...
data.ignition_systemd_unit.locksmithd: Refreshing state...
data.ignition_file.detect_master: Refreshing state...
data.template_file.bootkube_service: Refreshing state...
data.template_file.bootkube_sh: Refreshing state...
data.template_file.tectonic_path: Refreshing state...
data.template_file.docker_dropin: Refreshing state...
data.template_file.bootkube_path_unit: Refreshing state...
data.template_file.tx_off: Refreshing state...
data.template_file.azure_udev_rules: Refreshing state...
data.template_file.max_user_watches: Refreshing state...
data.template_file.s3_puller: Refreshing state...
data.template_file.max_user_watches: Refreshing state...
data.template_file.installer_kubelet_env: Refreshing state...
data.template_file.tx_off: Refreshing state...
data.template_file.s3_puller: Refreshing state...
data.template_file.azure_udev_rules: Refreshing state...
data.template_file.installer_kubelet_env: Refreshing state...
data.template_file.docker_dropin: Refreshing state...
data.template_file.tectonic_service: Refreshing state...
data.template_file.tectonic: Refreshing state...
data.template_file.tectonic_rkt: Refreshing state...
data.template_file.flannel: Refreshing state...
data.ignition_systemd_unit.bootkube_service: Refreshing state...
data.ignition_systemd_unit.docker_dropin: Refreshing state...
data.ignition_systemd_unit.tx_off: Refreshing state...
data.ignition_file.max_user_watches: Refreshing state...
data.ignition_file.azure_udev_rules: Refreshing state...
data.ignition_systemd_unit.bootkube_path_unit: Refreshing state...
data.ignition_systemd_unit.tectonic_path: Refreshing state...
data.ignition_file.s3_puller: Refreshing state...
data.ignition_file.max_user_watches: Refreshing state...
data.ignition_file.installer_kubelet_env: Refreshing state...
data.ignition_systemd_unit.tx_off: Refreshing state...
data.ignition_file.s3_puller: Refreshing state...
data.ignition_file.installer_kubelet_env: Refreshing state...
data.ignition_systemd_unit.tectonic_service: Refreshing state...
data.ignition_file.azure_udev_rules: Refreshing state...
data.ignition_systemd_unit.docker_dropin: Refreshing state...
data.aws_caller_identity.current: Refreshing state...
data.aws_region.current: Refreshing state...
data.aws_route53_zone.tectonic_ext: Refreshing state...
data.aws_availability_zones.azs: Refreshing state...
data.aws_ami.ubuntu_xenial: Refreshing state...
data.aws_iam_policy_document.kubernetes_masters_aws_iam_role_policy: Refreshing state...
data.aws_iam_policy_document.kubernetes_assume_role_policy: Refreshing state...
data.aws_iam_policy_document.kubernetes_nodes_aws_iam_role_policy: Refreshing state...
data.ignition_systemd_unit.locksmithd[0]: Refreshing state...
data.ignition_systemd_unit.locksmithd[2]: Refreshing state...
data.ignition_systemd_unit.locksmithd[1]: Refreshing state...
data.ignition_file.node_hostname[1]: Refreshing state...
data.ignition_file.node_hostname[0]: Refreshing state...
data.ignition_file.node_hostname[2]: Refreshing state...
data.ignition_systemd_unit.etcd3[0]: Refreshing state...
data.ignition_systemd_unit.etcd3[2]: Refreshing state...
data.ignition_systemd_unit.etcd3[1]: Refreshing state...
data.aws_ami.coreos_ami: Refreshing state...
data.aws_availability_zones.azs: Refreshing state...
data.aws_ami.coreos_ami: Refreshing state...
data.aws_ami.coreos_ami: Refreshing state...
Error refreshing state: 2 error(s) occurred:

* module.kubernetes.module.vpc.data.aws_subnet.external_master: data.aws_subnet.external_master: value of 'count' cannot be computed
* module.kubernetes.module.vpc.data.aws_subnet.external_worker: data.aws_subnet.external_worker: value of 'count' cannot be computed

(obviously I can't provide the literal values from my module outputs, as this error prevents those resources from being created in the first place)

sudhaponnaganti commented 6 years ago

Thanks @kerin for the input

wethinkagile commented 6 years ago

Having the same. I tried the workaround from @kerin without success. Any quick fix would be highly appreciated, as I can't scale up.

terraform plan -var-file ../../build/jarvice_2017-09-12_02-10-55/terraform.tfvars  -target module.workers platforms/aws
1 error(s) occurred:

* module root:
  module vpc.root: 2 error(s) occurred:

* resource 'data.aws_subnet.external_master' count: unknown variable referenced: 'num_external_master_subnets'. define it with 'variable' blocks
* resource 'data.aws_subnet.external_worker' count: unknown variable referenced: 'num_external_worker_subnets'. define it with 'variable' blocks
wethinkagile commented 6 years ago

tfvars:

cat ../../build/jarvice_2017-09-12_02-10-55/terraform.tfvars
{
  "tectonic_admin_email": "me@me.de",
  "tectonic_admin_password_hash": "supersecretmegagood",
  "tectonic_aws_etcd_ec2_type": "m3.large",
  "tectonic_aws_etcd_root_volume_size": 32,
  "tectonic_aws_etcd_root_volume_type": "gp2",
  "tectonic_aws_extra_tags": {
    "Name": "jarvice"
  },
  "tectonic_aws_master_custom_subnets": {
    "eu-west-1a": "10.0.0.0/20",
    "eu-west-1b": "10.0.16.0/20",
    "eu-west-1c": "10.0.32.0/20"
  },
  "tectonic_aws_master_ec2_type": "m3.large",
  "tectonic_aws_master_root_volume_size": 32,
  "tectonic_aws_master_root_volume_type": "gp2",
  "tectonic_aws_region": "eu-west-1",
  "tectonic_aws_ssh_key": "tectonic",
  "tectonic_aws_vpc_cidr_block": "10.0.0.0/16",
  "tectonic_aws_worker_custom_subnets": {
    "eu-west-1a": "10.0.48.0/20",
    "eu-west-1b": "10.0.64.0/20",
    "eu-west-1c": "10.0.80.0/20"
  },
  "tectonic_aws_worker_ec2_type": "m3.large",
  "tectonic_aws_worker_root_volume_size": 32,
  "tectonic_aws_worker_root_volume_type": "gp2",
  "tectonic_base_domain": "mydomain.com",
  "tectonic_cluster_cidr": "10.2.0.0/16",
  "tectonic_cluster_name": "jarvice",
  "tectonic_dns_name": "jarvice",
  "tectonic_etcd_count": 1,
  "tectonic_experimental": false,
  "tectonic_kube_apiserver_service_ip": "10.3.0.1",
  "tectonic_kube_dns_service_ip": "10.3.0.10",
  "tectonic_kube_etcd_service_ip": "10.3.0.15",
  "tectonic_license_path": "./license.txt",
  "tectonic_master_count": 3,
  "tectonic_pull_secret_path": "./pull_secret.json",
  "tectonic_service_cidr": "10.3.0.0/16",
  "tectonic_worker_count": 1
}

I want to scale up to 3 workers.
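
That is, the only change I am applying is bumping the worker count in the tfvars above:

"tectonic_worker_count": 3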

wethinkagile commented 6 years ago

~/d/s/t/tectonic> terraform plan -var-file ../../build/jarvice_2017-09-12_02-10-55/terraform.tfvars  -target module.workers platforms/aws
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.template_file.s3_puller: Refreshing state...
data.template_file.docker_dropin: Refreshing state...
data.ignition_systemd_unit.locksmithd: Refreshing state...
data.template_file.max_user_watches: Refreshing state...
data.template_file.installer_kubelet_env: Refreshing state...
data.ignition_file.s3_puller: Refreshing state...
data.ignition_file.max_user_watches: Refreshing state...
data.ignition_systemd_unit.docker_dropin: Refreshing state...
data.ignition_file.installer_kubelet_env: Refreshing state...
data.aws_ami.coreos_ami: Refreshing state...

------------------------------------------------------------------------
Error running plan: 2 error(s) occurred:

* module.vpc.aws_subnet.worker_subnet: aws_subnet.worker_subnet: value of 'count' cannot be computed
* module.vpc.aws_subnet.master_subnet: aws_subnet.master_subnet: value of 'count' cannot be computed
johnwards commented 6 years ago

Hello, exact same error at my end. I was just running a plan, as I needed to scale up my cluster, and have done nothing other than run the command from the documentation.

ssunagari commented 6 years ago

Also running into this "value of 'count' cannot be computed" error with a pretty bare tf file (all referenced variables are declared in a separate variables.tf):

resource "aws_route53_zone" "k8s" {
  name = "k8s.${var.environment}.${var.domain}"
}

module "k8s" {
  source = "github.com/coreos/terraform-aws-kubernetes?ref=1.8.4-tectonic.3"

  tectonic_aws_region = "us-west-2"
  tectonic_admin_email = "aws+${var.environment}@${var.domain}"
  tectonic_admin_password = "${var.tectonic_admin_password}"
  tectonic_aws_ssh_key = "${var.ssh_key_name}"
  tectonic_base_domain = "${aws_route53_zone.k8s.name}"
  tectonic_cluster_name = "tectonic-${var.environment}"
  tectonic_vanilla_k8s = true
}

I am getting the following error on a terraform plan:

Error: Error refreshing state: 5 error(s) occurred:

* module.k8s.module.etcd.data.ignition_systemd_unit.locksmithd: data.ignition_systemd_unit.locksmithd: value of 'count' cannot be computed
* module.k8s.module.ignition_masters.data.template_file.initial_advertise_peer_urls: data.template_file.initial_advertise_peer_urls: value of 'count' cannot be computed
* module.k8s.module.etcd.data.ignition_file.node_hostname: data.ignition_file.node_hostname: value of 'count' cannot be computed
* module.k8s.module.ignition_masters.data.template_file.advertise_client_urls: data.template_file.advertise_client_urls: value of 'count' cannot be computed
* module.k8s.module.ignition_masters.data.template_file.etcd_names: data.template_file.etcd_names: value of 'count' cannot be computed

Mostly at random I picked data.ignition_file.node_hostname and tried to trace its provenance:

https://github.com/coreos/tectonic-installer/blob/1.8.4-tectonic.3/modules/aws/etcd/ignition.tf#L17

count = "${length(var.external_endpoints) == 0 ? var.instance_count : 0}"

https://github.com/coreos/terraform-aws-kubernetes/blob/1.8.4-tectonic.3/main.tf#L75-L77

  external_endpoints      = "${compact(var.tectonic_etcd_servers)}"
  extra_tags              = "${var.tectonic_aws_extra_tags}"
  instance_count          = "${length(data.template_file.etcd_hostname_list.*.id)}"

Our tectonic_etcd_servers is the default [], so length(external_endpoints) is 0 and count falls through to instance_count, which takes us to etcd_hostname_list:

https://github.com/coreos/terraform-aws-kubernetes/blob/master/tectonic.tf#L2

count = "${var.tectonic_self_hosted_etcd != "" ? 0 : var.tectonic_etcd_count > 0 ? var.tectonic_etcd_count : length(data.aws_availability_zones.azs.names) == 5 ? 5 : 3}"

tectonic_self_hosted_etcd and tectonic_etcd_count are the defaults of "" and 0, so I think this falls through to the final expression: length(data.aws_availability_zones.azs.names) == 5 ? 5 : 3
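
Spelling that ternary out with our defaults (my own reading, not verified):

# tectonic_self_hosted_etcd = ""  ->  "" != "" is false  ->  take the right branch
# tectonic_etcd_count       = 0   ->  0 > 0 is false     ->  take the right branch
# leaving: count = length(data.aws_availability_zones.azs.names) == 5 ? 5 : 3

So the only remaining input is the availability zones data source; if that cannot be resolved at plan time, the count becomes computed, which Terraform 0.10/0.11 rejects.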

Either way I'm not sure why it thinks it can't compute count.

Terraform 0.10.8, though I have also tried 0.11.2 to no avail.

ssunagari commented 6 years ago

Self-followup: my issue above went away when I did a targeted apply of the route53 resource first, then a full plan/apply. I have subsequently realized, though, that I was confused about how the base domain gets used, and am going to redo our setup without the additional route53_zone anyway.
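
Concretely, the sequence that worked:

$ terraform apply -target aws_route53_zone.k8s
$ terraform plan
$ terraform apply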

tomdavidson commented 6 years ago

@ssunagari so what was the actual problem? (btw, the language used around the dns zone has been confusing to me too)

wethinkagile commented 6 years ago

We are still seeing this problem on a fresh default install, even with tectonic-1.8.9 and its bundled terraform v0.10.7 binary. We are not doing any Route53 shenanigans like ssunagari, just a full AWS install and upscale from the vanilla docs with vanilla binaries, no special sauce. We have been trying anew and failing for 8 months. This issue can be reproduced very easily, and we are in desperate need of a solution.

A hardcoded instance count in some file would also be a solution for us, if only somebody would finally shed some light on this ticket, but I don't feel comfortable tinkering with the deep-level logic of the AWS Terraform modules for Kubernetes.
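
For example, something like this in modules/aws/vpc/existing-vpc.tf is what I have in mind (an untested sketch, assuming three worker subnets):

data "aws_subnet" "external_worker" {
  # hardcoded literal instead of the computed expression
  count = 3
  id    = "${var.external_worker_subnets[count.index]}"
}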

As it stands, we will likely move away from Tectonic, because this is a career-destroying bug.

ssunagari commented 6 years ago

Apologies, did not see the above comments until now.

  1. I still don't know what the actual problem is, per se. I can only repeat what I said above: doing terraform apply -target aws_route53_zone.k8s by itself, followed by the full terraform apply, succeeded.

  2. To elaborate on the DNS name aspect: the ultimate FQDN of the cluster endpoints/API is formed by concatenating tectonic_cluster_name.tectonic_base_domain. So if I wanted k8s.foo.com, it's enough to set tectonic_cluster_name to k8s and tectonic_base_domain to foo.com (the tectonic module uses a data source to query for the Route53 zone for foo.com); see the sketch after this list. I didn't need a separate aws_route53_zone resource at all.

  3. So in our case the problem went away when we removed the external dependency. Obviously this is not going to be a useful workaround in most cases (if this was even the same root cause to begin with).
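
In other words, the shape we ended up with looks roughly like this (hypothetical values, other arguments elided):

module "k8s" {
  ...
  tectonic_cluster_name = "k8s"       # cluster endpoints become k8s.foo.com
  tectonic_base_domain  = "foo.com"   # an existing Route53 zone, found via a data source
}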

So unfortunately I don't have any magic insights to offer -- just a tectonic/k8s n00b fumbling my way through this :-/

wethinkagile commented 6 years ago

@ssunagari I have no idea what you are talking about; we have no problems with Route53, base domains, cluster names, or any zones. Let me kindly suggest you open a new thread for your issue. Our issue is the same as the thread creator's: the worker count can't be scaled up. The line in question is this:

count = "${var.external_vpc_id == "" ? 0 : var.num_external_worker_subnets}

We contacted Tectonic on numerous channels on numerous occasions over the last half year; they only pointed us to GitHub or to opening our wallet. Since an unscalable Kubernetes cluster is the very opposite of the raison d'être of any microservice-oriented architecture in the first place, we are now moving to kops and wish the Tectonic project the best of luck; it looks like they are going to need plenty of it. Overall, the experience with Tectonic (Enterprise) and Quay (Enterprise), their support, and the quality of their products has been a very bad one, and I can only strongly suggest everybody think twice before choosing them. Very disappointed!