Closed guidola closed 5 months ago
TL;DR - a workaround for updating a Transit Gateway Route Table Association with CloudFormation, using a Custom Resource
For anyone who stumbles into this issue, I've managed to create a workaround for it and I thought that it might help others.
Before we dive into the workaround, it's important to understand how CloudFormation works and why the issue happens in the first place. According to the AWS::EC2::TransitGatewayRouteTableAssociation documentation, updating any of the resource properties requires a replacement of the resource.
In CloudFormation, when a resource needs to be replaced, the new resource is created first, and the old one is deleted only at the end of the stack update (UPDATE_COMPLETE_CLEANUP_IN_PROGRESS). As a Transit Gateway attachment can only be associated with one route table at a time, creating the new association while the old one still exists causes the EC2 service to emit an error (as shown in @guidola's post).
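To make the ordering problem concrete, here is a toy model in Python. It is not CloudFormation or EC2 code: the `Attachment` class and both `replace_*` functions are hypothetical names that only mimic the one-association-per-attachment rule described above.

```python
class AssociationError(Exception):
    """Raised when an attachment is already associated with a route table."""


class Attachment:
    """Toy stand-in for a Transit Gateway attachment (hypothetical, not an AWS API)."""

    def __init__(self, route_table_id=None):
        self.route_table_id = route_table_id

    def associate(self, route_table_id):
        # Mimics the rule that an attachment can have only one association.
        if self.route_table_id is not None:
            raise AssociationError(f"already associated with {self.route_table_id}")
        self.route_table_id = route_table_id

    def disassociate(self):
        self.route_table_id = None


def replace_create_then_delete(attachment, new_table_id):
    """Default replacement order: create the new resource, clean up the old one later."""
    attachment.associate(new_table_id)  # raises, the old association still exists
    # The old association would only be removed here, during cleanup.


def replace_delete_then_create(attachment, new_table_id):
    """Order the workaround enforces: remove the old association first."""
    attachment.disassociate()
    attachment.associate(new_table_id)


if __name__ == "__main__":
    att = Attachment("tgw-rtb-old")
    try:
        replace_create_then_delete(att, "tgw-rtb-new")
    except AssociationError as err:
        print("create-then-delete failed:", err)

    replace_delete_then_create(att, "tgw-rtb-new")
    print("delete-then-create result:", att.route_table_id)
```

The first call fails for the same reason the stack update does, while the second order succeeds because the attachment is free by the time the new association is created.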
In order to overcome this issue, a "destroy-then-create" operation has to be performed on the resource, which is not supported natively by CloudFormation. The workaround performs this operation by invoking a Custom Resource (a Lambda function with a Python runtime) which verifies that the target route table exists, looks up the attachment's current association and, if the attachment is associated with a different route table, disassociates it and waits for the disassociation to complete.
To avoid errors during the cleanup process, the UpdateReplacePolicy and DeletionPolicy of the TGW association resource are set to Retain, so CloudFormation will not attempt to delete the "old" association. In addition, to ensure the proper order of execution (custom resource -> TGW route table association), the AWS::EC2::TransitGatewayRouteTableAssociation resource declares a dependency (DependsOn) on the Custom Resource.
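The decision the custom resource has to make can be sketched as a small pure function. This is only an illustration of the logic described above; `table_to_disassociate` and its parameters are hypothetical names and are not part of the template.

```python
def table_to_disassociate(request_type, current_table_id, desired_table_id):
    """Return the route table ID to disassociate first, or None if nothing has to be done."""
    if request_type not in ("Create", "Update"):
        return None  # e.g. Delete requests leave the association untouched
    if current_table_id is None:
        return None  # the attachment has no association yet
    if current_table_id == desired_table_id:
        return None  # already associated with the desired route table
    return current_table_id


if __name__ == "__main__":
    print(table_to_disassociate("Update", "tgw-rtb-old", "tgw-rtb-new"))
    print(table_to_disassociate("Update", "tgw-rtb-new", "tgw-rtb-new"))
```

Only the case where the attachment is currently associated with a different route table leads to a disassociation; in every other case the custom resource can report success immediately.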
The CloudFormation snippet below implements the above workaround. It has been tested on stack creation and update, but it's advised to test it on your own before applying it in production.
Note that there are some placeholders in the template - replace them with references to your own resources.
---
Resources:
  DeleteTGWAssociationWhenTableIDChangesRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
        Version: "2012-10-17"
      ManagedPolicyArns:
        - Fn::Join:
            - ""
            - - "arn:"
              - Ref: AWS::Partition
              - ":iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
      Policies:
        - PolicyDocument:
            Statement:
              - Action:
                  - ec2:DescribeTransitGatewayAttachments
                  - ec2:DescribeTransitGatewayRouteTables
                  - ec2:DisassociateTransitGatewayRouteTable
                Effect: Allow
                Resource: "*"
            Version: "2012-10-17"
          PolicyName: AllowDisassociateTransitGatewayRouteTable
  DeleteTGWAssociationWhenTableIDChangesFunction:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          import boto3
          import urllib3
          import json
          import time

          try:
              import botostubs  # optional, only used for the type comment below
          except ImportError:
              pass


          def cfn_send(event, context, responseStatus, responseData, physicalResourceId=None, noEcho=False, reason=None):
              # Send the result to the pre-signed URL provided by CloudFormation
              http = urllib3.PoolManager()
              responseUrl = event['ResponseURL']

              print(responseUrl)

              responseBody = {}
              responseBody['Status'] = responseStatus
              responseBody['Reason'] = reason if reason else 'See the details in CloudWatch Log Stream: ' + context.log_stream_name
              responseBody['PhysicalResourceId'] = physicalResourceId or context.log_stream_name
              responseBody['StackId'] = event['StackId']
              responseBody['RequestId'] = event['RequestId']
              responseBody['LogicalResourceId'] = event['LogicalResourceId']
              responseBody['NoEcho'] = noEcho
              responseBody['Data'] = responseData

              json_responseBody = json.dumps(responseBody)

              print("Response body:\n" + json_responseBody)

              headers = {
                  'content-type': '',
                  'content-length': str(len(json_responseBody))
              }

              try:
                  response = http.request('PUT', responseUrl, body=json_responseBody.encode('utf-8'), headers=headers)
                  print("Status code:", response.status)
              except Exception as e:
                  print("send(..) failed executing http.request(..): " + str(e))


          def lambda_handler(event, context):
              print(event)
              event_props = event.get('ResourceProperties', {})

              try:
                  client = boto3.client("ec2")  # type: botostubs.EC2
                  tgw_route_table_id = event_props["tgw_route_table_id"]
                  tgw_attachment_id = event_props["tgw_attachment_id"]

                  if event["RequestType"] in ["Create", "Update"]:
                      # Fail early if the target route table cannot be found
                      if not client.describe_transit_gateway_route_tables(TransitGatewayRouteTableIds=[tgw_route_table_id])["TransitGatewayRouteTables"]:
                          raise Exception(f"Transit Gateway Route Table ID {tgw_route_table_id} does not exist or cannot be found!")

                      association = client.describe_transit_gateway_attachments(
                          TransitGatewayAttachmentIds=[tgw_attachment_id]
                      )["TransitGatewayAttachments"][0].get("Association")

                      # Disassociate only when the attachment points at a different route table
                      if association and association["TransitGatewayRouteTableId"] != tgw_route_table_id:
                          client.disassociate_transit_gateway_route_table(
                              TransitGatewayRouteTableId=association["TransitGatewayRouteTableId"],
                              TransitGatewayAttachmentId=tgw_attachment_id
                          )
                          # Wait for attachment to disassociate, pausing between polls
                          while client.describe_transit_gateway_attachments(TransitGatewayAttachmentIds=[tgw_attachment_id])["TransitGatewayAttachments"][0].get("Association"):
                              time.sleep(1)

                  return cfn_send(event, context, responseStatus="SUCCESS", responseData=None, physicalResourceId=None)
              except Exception as err:
                  print(str(err))
                  return cfn_send(event, context, responseStatus="FAILED", responseData=None, reason=str(err))
      Role:
        Fn::GetAtt:
          - DeleteTGWAssociationWhenTableIDChangesRole
          - Arn
      Description:
        This function provides a workaround for changing the association of a Transit Gateway Attachment's
        route table association, due to CloudFormation limitations
      FunctionName: tgw-route-table-disassociate-helper
      Handler: index.lambda_handler
      MemorySize: 128
      Runtime: python3.8
      Timeout: 10
    DependsOn:
      - DeleteTGWAssociationWhenTableIDChangesRole
  DeleteTGWAssociationWhenTableIDChanges:
    Type: AWS::CloudFormation::CustomResource
    Properties:
      ServiceToken:
        Fn::GetAtt:
          - DeleteTGWAssociationWhenTableIDChangesFunction
          - Arn
      tgw_route_table_id:
        Ref: <TGWRouteTable resource>
      tgw_attachment_id:
        Ref: <TGWAttachment resource>
    UpdateReplacePolicy: Delete
    DeletionPolicy: Delete
  TGWRouteTableAssociation:
    Type: AWS::EC2::TransitGatewayRouteTableAssociation
    Properties:
      TransitGatewayAttachmentId:
        Ref: <TGWAttachment resource>
      TransitGatewayRouteTableId:
        Ref: <TGWRouteTable resource>
    DependsOn:
      - DeleteTGWAssociationWhenTableIDChanges
    UpdateReplacePolicy: Retain
    DeletionPolicy: Retain
Here's a snippet of the Lambda function code in a more readable way - I've taken the send() function from the cfnresponse python module and embedded it in the function, as I wanted to have the ability to see the actual error in CloudFormation in case there was any, instead of searching in CloudWatch logs.
import boto3
import urllib3
import json
import time

try:
    import botostubs  # optional, only used for the type comment below
except ImportError:
    pass


def cfn_send(event, context, responseStatus, responseData, physicalResourceId=None, noEcho=False, reason=None):
    # Send the result to the pre-signed URL provided by CloudFormation
    http = urllib3.PoolManager()
    responseUrl = event['ResponseURL']

    print(responseUrl)

    responseBody = {}
    responseBody['Status'] = responseStatus
    responseBody['Reason'] = reason if reason else 'See the details in CloudWatch Log Stream: ' + context.log_stream_name
    responseBody['PhysicalResourceId'] = physicalResourceId or context.log_stream_name
    responseBody['StackId'] = event['StackId']
    responseBody['RequestId'] = event['RequestId']
    responseBody['LogicalResourceId'] = event['LogicalResourceId']
    responseBody['NoEcho'] = noEcho
    responseBody['Data'] = responseData

    json_responseBody = json.dumps(responseBody)

    print("Response body:\n" + json_responseBody)

    headers = {
        'content-type': '',
        'content-length': str(len(json_responseBody))
    }

    try:
        response = http.request('PUT', responseUrl, body=json_responseBody.encode('utf-8'), headers=headers)
        print("Status code:", response.status)
    except Exception as e:
        print("send(..) failed executing http.request(..): " + str(e))


def lambda_handler(event, context):
    print(event)
    event_props = event.get('ResourceProperties', {})

    try:
        client = boto3.client("ec2")  # type: botostubs.EC2
        tgw_route_table_id = event_props["tgw_route_table_id"]
        tgw_attachment_id = event_props["tgw_attachment_id"]

        if event["RequestType"] in ["Create", "Update"]:
            # Fail early if the target route table cannot be found
            if not client.describe_transit_gateway_route_tables(TransitGatewayRouteTableIds=[tgw_route_table_id])["TransitGatewayRouteTables"]:
                raise Exception(f"Transit Gateway Route Table ID {tgw_route_table_id} does not exist or cannot be found!")

            association = client.describe_transit_gateway_attachments(
                TransitGatewayAttachmentIds=[tgw_attachment_id]
            )["TransitGatewayAttachments"][0].get("Association")

            # Disassociate only when the attachment points at a different route table
            if association and association["TransitGatewayRouteTableId"] != tgw_route_table_id:
                client.disassociate_transit_gateway_route_table(
                    TransitGatewayRouteTableId=association["TransitGatewayRouteTableId"],
                    TransitGatewayAttachmentId=tgw_attachment_id
                )
                # Wait for attachment to disassociate, pausing between polls
                while client.describe_transit_gateway_attachments(TransitGatewayAttachmentIds=[tgw_attachment_id])["TransitGatewayAttachments"][0].get("Association"):
                    time.sleep(1)

        return cfn_send(event, context, responseStatus="SUCCESS", responseData=None, physicalResourceId=None)
    except Exception as err:
        print(str(err))
        return cfn_send(event, context, responseStatus="FAILED", responseData=None, reason=str(err))
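One thing worth considering: the wait loop has no upper bound, while the function in the template has a Timeout of 10 seconds. If the Lambda is killed before it sends a response, CloudFormation is left waiting for an answer that never arrives. Below is a minimal sketch of a bounded wait that raises instead, so the handler's `except` branch can report FAILED. The name `wait_until_disassociated`, its default values (8 seconds is an assumption chosen to stay under the 10-second Timeout) and the `FakeClient` test double are hypothetical and not part of the workaround itself.

```python
import time


def wait_until_disassociated(client, attachment_id, timeout=8.0, interval=1.0, sleep=time.sleep):
    """Poll until the attachment reports no association, or raise TimeoutError.

    Returns the number of describe calls that were made.
    """
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        polls += 1
        attachment = client.describe_transit_gateway_attachments(
            TransitGatewayAttachmentIds=[attachment_id]
        )["TransitGatewayAttachments"][0]
        if not attachment.get("Association"):
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{attachment_id} is still associated after {timeout}s")
        sleep(interval)


class FakeClient:
    """Test double that reports an association for the first `busy_polls` calls."""

    def __init__(self, busy_polls):
        self.busy_polls = busy_polls
        self.calls = 0

    def describe_transit_gateway_attachments(self, TransitGatewayAttachmentIds):
        self.calls += 1
        attachment = {"TransitGatewayAttachmentId": TransitGatewayAttachmentIds[0]}
        if self.calls <= self.busy_polls:
            attachment["Association"] = {"TransitGatewayRouteTableId": "tgw-rtb-old"}
        return {"TransitGatewayAttachments": [attachment]}


if __name__ == "__main__":
    polls = wait_until_disassociated(FakeClient(busy_polls=2), "tgw-attach-example", sleep=lambda s: None)
    print("disassociated after", polls, "polls")
```

In the handler, the `while` loop could be replaced with a call to this helper using the real EC2 client; whether 8 seconds is long enough for a disassociation in your environment is something to verify, and the function's Timeout may need to be raised accordingly.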
I hope that someone finds it useful :)
The issue has been resolved. The new resource schema now enforces delete_then_create when it comes to update/replacement.
This issue has been fixed. The resource now supports update by delete_then_create. Below is a testing stack that successfully performed resource update.
Closing the issue as the fix has been pushed
2. Scope of request
AWS::EC2::TransitGatewayRouteTableAssociation fails on an UPDATE operation when the existing association has to be replaced by a newly defined/modified resource, e.g. when changing the route table a transit gateway attachment is associated with.
3. Expected behavior
The existing TransitGatewayRouteTableAssociation should be removed and replaced by its new definition.
4. Suggest specific test cases
5. Helpful Links to speed up research and evaluation
6. Category (required) - Will help with tagging and be easier to find by other users to +1
Networking & Content (VPC, Route53, API GW,...)
7. Any additional context (optional)