Open jmchuster opened 4 years ago
Oh, weird, sorry! Sounds like a bug. Looks like we presume two availability zones in which to create subnets:
What happens if you specify us-west-2a,us-west-2a
? Is that a reasonable workaround?
Another possible workaround, could you suspend the AZRebalance scaling process? This wouldn't stick around between updates, though.
Otherwise, sounds like a bug to be fixed in the CloudFormation, somehow.
Otherwise, sounds like a bug to be fixed in the CloudFormation, somehow.
Unfortunately it can't 😢 There is no way to vary CloudFormation templates based on the length of input. It's extremely frustrating.
@lox can we list a few and make them conditional on the presence of the az?
This is how I approached it in an experiment @sj26:
https://github.com/buildkite/elastic-ci-stack-for-aws-ecs/blob/2cf72edd0ba9e40335f58b48320ea94d94e3c464/templates/vpc/template.yaml#L9 https://github.com/buildkite/elastic-ci-stack-for-aws-ecs/blob/2cf72edd0ba9e40335f58b48320ea94d94e3c464/templates/vpc/template.yaml#L56-L66
I've been petitioning AWS for Fn::Size
for 3+ years 😢
... oh my
I think I'm going too deep https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference-transform.html https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-macros.html#template-macros-lambda-interface
... I think we'll raise a ticket with AWS and ask them how to do it, which might at least add to "petitioning AWS for Fn::Size"
Tried setting the availability zones to us-west-2a,us-west-2a
,
the subnet update failed with
CIDR Block must change if Availability Zone is changed and VPC ID is not changed
Sounds like the underlying issue is the rebalancing across AZs, correct? @sj26 i believe there was some internal discussion around using a lambda to update that setting on the ALB? It’s not exposed via Cloudformation.
It sounds the answer is "use a lambda". AWS said:
Having said that, In order to iterate over a list of "AvailabilityZones", and produce a resource for each element in the list, You can use a Lambda function to create a macro to transform your template based on the list of "AvailabilityZones" provided by your customer (preferred), or build your template dynamically - there are several libraries that can help you do this, like troposphere and the AWS CDK.
Or use a lambda to do as @lox suggests, to suspend AZRebalance.
@jmchuster ah, that sounds like an error from an update. Could you try destroying the stack entirely and re-creating it? I think that workaround is the easiest and most promising.
Another alternative workaround would be to run a custom version of the stack, editing the YAML to remove the second subnet.
If none of the workarounds works then someone industrious might like to create a cloudformation macro for our stack like this article:
It could look at the list of AvailabilityZones and create an appropriate number of appropriately configured Subnet resources.
I'm hesitant to do this because of the added complexity and maintenance burden, but it might be our only option to do it "properly".
Another suggested workaround from AWS:
As a workaround, you could write your template to support a maximum number of Availability zones (2 in this case), and then have each Availability zone passed in as a separate parameter. If a customer wants to use only 1 Availability zone, you could instruct the customer to leave the second one blank. Then, you could use conditional statements to make the second Availability zone optional, so that it's only created if a second Availability zone is passed in.
However, this would result in a pretty ugly template with redundant information. And does not allow for more Availability zones than you have set as the maximum, so if a customer wants to use 3 Availability zones and your template only allows for 2, the 3rd one would not be created.
I don't love it, but I don't hate it either.
I think I prefer the approach I took in that linked CFN template once you mix in things like public vs private subnets and NAT Gateway redundancy.
Yeah I think I do too, @lox.
Not sure when we'll get to this, but we should roll something like that in.
Meanwhile, @jmchuster, did you try re-creating your stack with the same az twice? Does it work?
So i ended up just modifying the template to have a param to choose 1 or 2 AvailabilityZones.
AvailabilityZoneCount:
Type: Number
Description: Optional - Number of subnets that are created from AZs (chosen from !GetAZs or AvailabilityZones parameter if provided)
Default: 2
MinValue: 1
MaxValue: 2
so added the conditions
CreateSubnet0:
!And
- { Condition: CreateVpcResources }
- !Or
- !Equals [ 1, !Ref AvailabilityZoneCount ]
- !Equals [ 2, !Ref AvailabilityZoneCount ]
CreateSubnet1:
!And
- { Condition: CreateVpcResources }
- !Equals [ 2, !Ref AvailabilityZoneCount ]
swapped them out as the Condition
for Subnet[01]
and Subnet[01]Routes
and then for the AgentAutoScaleGroup
VPCZoneIdentifier: !If
- CreateVpcResources
- !If
- CreateSubnet1
- [ !Ref Subnet0, !Ref Subnet1 ]
- [ !Ref Subnet0 ]
- !Ref Subnets
So, presumably, you could add options for 3 or 4 AZs as well and keep on nesting !Ifs following the same pattern, up to AWS's practical limit of 5.
I had looked into maybe setting the SuspendProcesses
based on a param, but it looks like AWS's intention is that it be used only as a temporary state, rather than as a property of the auto-scaling group, seeing as they only provide it under UpdatePolicy as something you set and then automatically unset as part of an update. I'm sure there's an argument to be made that this thinking makes sense for Terminate
but less so for AZRebalance
.
Supporting different AZ (and public/private) configurations is still something we're keen to support at some point. In the interim, the new v5.0.0 release disables AZRebalancing which should address at least one of the reasons why single AZ deployments are sometimes desirable.
Another note here: I believe it might be possible to run in a single AZ (or 3+) by creating the VPC and subnets manually (or via terraform, or your preferred AWS tools), then passing the subnet(s) IDs into the elastic stack via the Subnets
stack parameter.
I found an AWS VPC template that takes the same approach as https://github.com/buildkite/elastic-ci-stack-for-aws/issues/700#issuecomment-679705989 and exposes a NumberOfAZs
parameter along with a plethora of conditions.
Weird as that approach is, I’ve warmed to it as a pragmatic answer to the problem that doesn’t involve using Lambdas to rewrite the template live 🙈
I reckon support for 2, 3, and 4 AZs to bring parity with that Quick Start would be good.
A nice customer facing benefit in supporting more AZs is to broaden the number of spot pools that are available to the Auto Scaling group. More pools, better chance of spot allocation, less chance of spot interruption.
If i try to set the value for
AvailabilityZones
to a single zone, sayus-west-2a
, it fails withI only need a single availability zone here, because the auto-scale group will try rebalancing instances across zones, killing my long-running jobs (24 hours +).