Open rgoltz opened 1 year ago
If I were to approach my EC2 Security Group Ingress/Egress Policies and Groups in such a way where I define the Security Groups within one Stack, and use another Stack to define all of my policies, my architecture as described would be prone to errors if duplicate rules happen to be inserted and removed.
Notifying the user that a duplicate rule was specified is significantly better than having a silent failure occur which is the current behavior. A "Soft" failure notification in the Create Complete Stack Event is more friendly to other organizations who may of inadvertently added duplicate rules to their EC2 Security Group.
Even if you apply duplicate rules by accident into an EC2 Security Group, you would at least be informed of a duplicate rule so that you can address the policy accordingly. +1 to this PFR
+1 for this PFR. A create and an update should yield the same results in CloudFormation. If I deploy the template to a brand new stack with the duplicate rules, not updating an existing stack, it deploys correctly with all security group rules. When I try to update an existing stack with the same template that just succeeded on a new deploy, it removes the rules with the duplicate descriptions, causing a discrepancy between create & update actions in CF with the same template.
Name of the resource
AWS::EC2::SecurityGroup
Resource name
No response
Description
Starting point:
This issue is a follow up (aka. mitigation) of #612 as an PFR to request an extension of CloudFormation Stack-Events while creating or updating an
AWS::EC2::SecurityGroup
AND having a duplicatedSecurityGroupIngress
(or Egress) defintion in your CloudFormation template. We try to explain and balance the scope of the PFR.Before we start, let's sum up the details and current behavior of CloudFormation. Please also check the part "Addtional backgroup for consideration" to understand the current impact of AWS users run into this issue.
We create a new stack. In a real-life scenario it could be an update of an existing stack and -for sure- there are often much more Ingress-Defintion in such templates.
Results:
CloudFormation stack-event just reporting "created successful". CloudFormation appears to accept it silently:
SecurityGroup is created. A SecurityGroup doesn't allow duplicates (hence here EntryA is missing in the resource).
Issues:
Even you defined (accidentally) three Ingress rules in CFN template, there are only two [= EntryA is missing]. Let's check CloudTail for more information - There are three API-requests from CloudFormation:
While checking the Events, we see that "EntryA" was rejected during AuthorizeSecurityGroupIngress API-call with a return of Client.InvalidPermission.Duplicate:
Scope of the PFR
[ ] CloudFormation could validate the template before send the three API-calls. It's debatable if this in an template validation error to define such duplicate Ingress/Egress rule in YAML or JSON. We are aware that such change is not expectable, since it would change the current behaviour for other AWS users.
[ ] CloudFormation should inform that just removing this duplicate entry from template and update again, could cause bigger issues - see details below.
Addtional backgroup for consideration
You might ask yourself, why we need this?
Stacks with SecurityGroups are an ongoing config-set of rules, which are extended based on different requests. It is very likely that at some point somebody requesting to add an Ingress, even this IP/Port is already part of the template.
If you (or your coworker one year later with with the best of intentions) remove duplicated SecurityGroupIngress/Egress rule, you will run into the details of #612. So, let's follow up here based on the example We check the SecurityGroup resource and see that "EntryA" is not part of the SecurityGroup - Hence we remove this config from CloudFormation template:
We update the existing stack with this new template. In case we now checking the SecurityGroup again (since we doesn't expect any changes we might not even do this check), we see that also "EntryB" is not part of the SecurityGroup anymore! - Hence, we lost the connectivity for 172.24.186.0/23-port8080, which could cause a huge impact to business. Furthermore, our CloudFormation/IaC still show us this rule for 172.24.186.0/23-port8080. Here the Ingress rules of our SecurityGroup:
It seems that just the combination of Protocol/Port/Ip (as a diff on CloudFormation template level) identifies the rule to delete:
Other Details
Once you are not aware of all this bloody details above, it's going to be a long night of debugging! ;-) 🥂