aws-cloudformation / cloudformation-coverage-roadmap

The AWS CloudFormation Public Coverage Roadmap
https://aws.amazon.com/cloudformation/
Creative Commons Attribution Share Alike 4.0 International
1.11k stars 54 forks source link

AWS::EC2::SecurityGroup - PFR - provide error + information about duplicate Ingress/Egress rules via CFN stack-events #1532

Open rgoltz opened 1 year ago

rgoltz commented 1 year ago

Name of the resource

AWS::EC2::SecurityGroup

Resource name

No response

Description

Starting point:

This issue is a follow up (aka. mitigation) of #612 as an PFR to request an extension of CloudFormation Stack-Events while creating or updating an AWS::EC2::SecurityGroup AND having a duplicated SecurityGroupIngress (or Egress) defintion in your CloudFormation template. We try to explain and balance the scope of the PFR.

Before we start, let's sum up the details and current behavior of CloudFormation. Please also check the part "Addtional backgroup for consideration" to understand the current impact of AWS users run into this issue.

We create a new stack. In a real-life scenario it could be an update of an existing stack and -for sure- there are often much more Ingress-Defintion in such templates.

AWSTemplateFormatVersion: 2010-09-09
Parameters:
  VpcId:
    Type: String
    Description: 'Enter the VPC ID where the Security Groups will be created'
Resources:
  TestSg:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Test duplicate ingress
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 8080
          ToPort: 8080
          CidrIp: 172.24.186.0/23
          Description: EntryA 
        - IpProtocol: tcp
          FromPort: 8080
          ToPort: 8080
          CidrIp: 172.24.186.0/23
          Description: EntryB (Duplicate) # = it's the same like EntryA
        - IpProtocol: tcp
          FromPort: 9090
          ToPort: 9090
          CidrIp: 192.168.0.1/32
          Description: EntryX

Results:

Issues:

Scope of the PFR

Timestamp Logical ID Status Status reson
2023-02-19 10:25:03 UTC+0100 rogo-secgroup-stack CREATE_COMPLETE -
2023-02-19 10:25:00 UTC+0100 TestSg CREATE_COMPLETE Received response status from AuthorizeSecurityGroupIngress with Client.InvalidPermission.Duplicate: "the specified rule \"peer: 172.24.186.0/23, TCP, from port: 8080, to port: 8080, ALLOW\" already exists". You might defined a duplicate rule. (RequestId: 123-same-request-Id789)
2023-02-19 10:24:59 UTC+0100 TestSg CREATE_IN_PROGRESS Resource creation Initiated
2023-02-19 10:24:54 UTC+0100 TestSg CREATE_IN_PROGRESS -
2023-02-19 10:24:51 UTC+0100 rogo-secgroup-stack CREATE_IN_PROGRESS User Initiated

Addtional backgroup for consideration

You might ask yourself, why we need this?

Stacks with SecurityGroups are an ongoing config-set of rules, which are extended based on different requests. It is very likely that at some point somebody requesting to add an Ingress, even this IP/Port is already part of the template.

If you (or your coworker one year later with with the best of intentions) remove duplicated SecurityGroupIngress/Egress rule, you will run into the details of #612. So, let's follow up here based on the example We check the SecurityGroup resource and see that "EntryA" is not part of the SecurityGroup - Hence we remove this config from CloudFormation template:

AWSTemplateFormatVersion: 2010-09-09
Parameters:
  VpcId:
    Type: String
    Description: 'Enter the VPC ID where the Security Groups will be created'
Resources:
  TestSg:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Test duplicate ingress
      VpcId: !Ref VpcId
      SecurityGroupIngress:
          # -->
          # we removed EntryA since this rule is not part of the SecGroup
          # <--
        - IpProtocol: tcp
          FromPort: 8080
          ToPort: 8080
          CidrIp: 172.24.186.0/23
          Description: EntryB (Duplicate) # = it's the same Ingress-definition like EntryA
          # keep EntryB, since it's part of the SecGroup as shown in Console
        - IpProtocol: tcp
          FromPort: 9090
          ToPort: 9090
          CidrIp: 192.168.0.1/32
          Description: EntryX

We update the existing stack with this new template. In case we now checking the SecurityGroup again (since we doesn't expect any changes we might not even do this check), we see that also "EntryB" is not part of the SecurityGroup anymore! - Hence, we lost the connectivity for 172.24.186.0/23-port8080, which could cause a huge impact to business. Furthermore, our CloudFormation/IaC still show us this rule for 172.24.186.0/23-port8080. Here the Ingress rules of our SecurityGroup: step2-resource_GITHUB

It seems that just the combination of Protocol/Port/Ip (as a diff on CloudFormation template level) identifies the rule to delete: step2-cloudTrail_GITHUB

Other Details

Once you are not aware of all this bloody details above, it's going to be a long night of debugging! ;-) 🥂

ZalmnS commented 1 year ago

If I were to approach my EC2 Security Group Ingress/Egress Policies and Groups in such a way where I define the Security Groups within one Stack, and use another Stack to define all of my policies, my architecture as described would be prone to errors if duplicate rules happen to be inserted and removed.

Notifying the user that a duplicate rule was specified is significantly better than having a silent failure occur which is the current behavior. A "Soft" failure notification in the Create Complete Stack Event is more friendly to other organizations who may of inadvertently added duplicate rules to their EC2 Security Group.

Even if you apply duplicate rules by accident into an EC2 Security Group, you would at least be informed of a duplicate rule so that you can address the policy accordingly. +1 to this PFR

moutkat commented 2 months ago

+1 for this PFR. A create and an update should yield the same results in CloudFormation. If I deploy the template to a brand new stack with the duplicate rules, not updating an existing stack, it deploys correctly with all security group rules. When I try to update an existing stack with the same template that just succeeded on a new deploy, it removes the rules with the duplicate descriptions, causing a discrepancy between create & update actions in CF with the same template.