Error evaluating cluster_mode when var.cluster_mode_enabled = true

rewt commented 4 years ago

Found a bug? Maybe our Slack Community can help.

Describe the Bug

Module fails to create Clustered Redis elasticache resource with error:

module.subnets.aws_nat_gateway.default[1]: Still creating... [1m50s elapsed]
module.subnets.aws_nat_gateway.default[0]: Still creating... [1m50s elapsed]
module.subnets.aws_nat_gateway.default[0]: Creation complete after 1m57s [id=]
module.subnets.aws_nat_gateway.default[1]: Creation complete after 1m57s [id=]
module.subnets.aws_route.default[1]: Creating...
module.subnets.aws_route.default[0]: Creating...
module.subnets.aws_route.default[0]: Creation complete after 1s [id=]
module.subnets.aws_route.default[1]: Creation complete after 1s [id=]

Error: Either `number_cache_clusters` or `cluster_mode` must be set

  on ../../main.tf line 81, in resource "aws_elasticache_replication_group" "default":
  81: resource "aws_elasticache_replication_group" "default" {

Expected Behavior

Successful creation of Clustered Redis resource when var.cluster_mode_enabled=true

Steps to Reproduce

Steps to reproduce the behavior:

Clone repo
Update cluster_mode_enabled default=true
Update cluster_mode_replicas_per_node_group default=1
Update cluster_mode_num_node_groups default=3

Screenshots

If applicable, add screenshots or logs to help explain your problem.

Environment (please complete the following information):

Mac OS
Terraform v0.12.23
+ provider.aws v2.53.0
+ provider.local v1.4.0
+ provider.null v2.1.2
+ provider.template v2.1.2

Additional Context

plan.txt Add any other context about the problem here.

rewt commented 4 years ago

https://github.com/cloudposse/terraform-aws-elasticache-redis/blob/9904a81caa17bbf6abf7e3b82fdfa0ac7aa1215a/main.tf#L88

The issue is that aws_elasticache_replication_group cannot have both cluster_mode and number_cache_clusters as specified here.

Gowiem commented 4 years ago

@rewt I'm seeing the same. Did you get that figured out?

rewt commented 4 years ago

@rewt I'm seeing the same. Did you get that figured out?

I just used my own code for the time being.

Gowiem commented 4 years ago

Gotcha.

For others that come across this issue, this seems to be a recent change in AWS APIs or the underlying AWS Provider which causes this problem. I attempted to fix this in my own fork here, but I wasn't able to get that to work so I instead swapped over to not using cluster_mode_enabled and increasing my cluster_size which provides enough resilience for my current project.

samsullivan commented 4 years ago

@Gowiem I took a crack at it and it's working for me: https://github.com/cloudposse/terraform-aws-elasticache-redis/pull/65. Using null instead of 0 removed the need for setting depends_on.

Also, it looked like we also needed to get rid of availability_zones (maybe it's worth the extra work to implement preferred_availability_zones?).

Gowiem commented 4 years ago

@samsullivan Aha null > 0 did it. Interesting. Thanks for pointing that out and fixing via PR. Hopefully that helps out others if they require a cluster!

I would assume preferred_availability_zones would be desirable since you would want your redis servers co-located in the same AZs as your application servers. Probably worth opening another bug for that enhancement.

samsullivan commented 4 years ago

@Gowiem I don't think it's entirely necessary, since the underlying resource seems intelligent enough. When doing:

cluster_mode_enabled                 = true
cluster_mode_num_node_groups         = length(local.azs)
cluster_mode_replicas_per_node_group = 1

I got 1 node/AZ when passing 1 subnet/AZ. I think preferred_availability_zones is necessary for more unique setups like most nodes in a specific AZ, etc., and therefore should be out of scope of the bugfix to creating Redis in cluster mode.

Would appreciate getting that PR merged, unsure if @rewt were able to give this a try successfully or not.

cloudposse / terraform-aws-elasticache-redis