Closed ckamps closed 4 weeks ago
Thanks @ckamps for the details on the bug and the steps to reproduce the behavior. Please allow us to look into it, and we will get back to you on this issue.
It looks like the following change addresses the problem I was encountering. Since the state machine step Enable TGW Attachment Propagations already handles ResourceBusyException, this step will be automatically retried when this exception is thrown.
diff --git a/source/lambda/tgw_vpc_attachment/lib/handlers/tgw_vpc_attachment_handler.py b/source/lambda/tgw_vpc_attachment/lib/handlers/tgw_vpc_attachment_handler.py
index dfec102..9813056 100644
--- a/source/lambda/tgw_vpc_attachment/lib/handlers/tgw_vpc_attachment_handler.py
+++ b/source/lambda/tgw_vpc_attachment/lib/handlers/tgw_vpc_attachment_handler.py
@@ -632,9 +632,12 @@ class TransitGatewayVPCAttachments:
# if the return list is empty the API to enable tgw rt propagation will be skipped.
for tgw_route_table_id in propagation_route_tables:
self.logger.info(f"Enabling RT: {tgw_route_table_id} Propagation To Tgw Attachment")
- self.hub_ec2_client.enable_transit_gateway_route_table_propagation(
+ response = self.hub_ec2_client.enable_transit_gateway_route_table_propagation(
tgw_route_table_id,
self.event.get("TransitGatewayAttachmentId"))
+
+ if response.get("Error") == "IncorrectState":
+ raise ResourceBusyException
self._create_tag(
self.event.get("VpcId"),
This change is similar to what was already implemented to force a retry when calling _add_subnet_to_tgw_attachment(self) and _remove_subnet_from_tgw_attachment(self) and encountering an IncorrectState response:
response = self.spoke_ec2_client.remove_subnet_from_tgw_attachment(
    self.event.get("TransitGatewayAttachmentId"),
    self.event.get('SubnetId'),
)
if response.get("Error") == "IncorrectState":
    raise ResourceBusyException
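To illustrate why raising the exception leads to a retry and also keeps the status tag from being written too early, here is a minimal, self-contained sketch. It is not the solution's actual code: FakeHubEc2Client, enable_propagations, run_with_retry, and the example IDs are hypothetical stand-ins, and the sketch assumes the EC2 client wrapper returns {"Error": "IncorrectState"} instead of raising, as the snippet above suggests. The retry loop only approximates what the state machine does for ResourceBusyException (no wait between attempts).

```python
class ResourceBusyException(Exception):
    """Stand-in for the exception the state machine step retries on."""


class FakeHubEc2Client:
    """Hypothetical stub: reports IncorrectState for the first `busy_calls` calls."""

    def __init__(self, busy_calls):
        self.busy_calls = busy_calls
        self.calls = 0

    def enable_transit_gateway_route_table_propagation(self, route_table_id, attachment_id):
        self.calls += 1
        if self.calls <= self.busy_calls:
            # Assumed wrapper behavior: the error code is returned, not raised.
            return {"Error": "IncorrectState"}
        return {"Propagation": {"State": "enabled"}}


def enable_propagations(client, route_table_ids, attachment_id, tags):
    """Enable propagation for each route table; tag only after all calls succeed."""
    for route_table_id in route_table_ids:
        response = client.enable_transit_gateway_route_table_propagation(
            route_table_id, attachment_id)
        if response.get("Error") == "IncorrectState":
            # Raising here skips the tagging below and signals "retry me".
            raise ResourceBusyException(f"{attachment_id} is not ready yet")
    tags["VPCPropagation"] = "propagation enabled"  # placeholder tag value


def run_with_retry(step, max_attempts=3):
    """Rough approximation of a retry policy on ResourceBusyException."""
    for attempt in range(1, max_attempts + 1):
        try:
            step()
            return attempt
        except ResourceBusyException:
            if attempt == max_attempts:
                raise


client = FakeHubEc2Client(busy_calls=2)
tags = {}
attempts = run_with_retry(
    lambda: enable_propagations(client, ["tgw-rtb-example"], "tgw-attach-example", tags))
print(attempts, tags)
```

In this sketch the first two attempts hit the busy attachment and write no tag; the third succeeds and only then records the tag. That ordering is the point: the status tag should reflect a propagation that was actually enabled.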
Thank you @ckamps for the details on the fix. We are testing the fix you provided in our environment and plan to include it in an upcoming release.
@ckamps, would it be possible for you to try out the changes in the referenced PR in your environment and confirm whether they resolve the issue for you?
Describe the bug
When using CloudFormation to create a VPC with three subnets that include the tag with key Attach-to-tgw, the Network Orchestration automation does not consistently create a propagation for the TGW attachment. Sometimes the expected propagation is created, and sometimes it is not. Unlike the propagation, the association appears to be consistently created for the other TGW route table.
The symptom is similar to https://github.com/aws-solutions/network-orchestration-for-aws-transit-gateway/issues/1, but that issue applied to associations and was apparently fixed in 3.0.0.
When the propagation is not created, I see the following error:
"An error occurred (IncorrectState) when calling the EnableTransitGatewayRouteTablePropagation operation: tgw-attach-014... is in invalid state"
In the Lambda log:
It's misleading that the VPC's tags include the VPCPropagation message shown below given that the propagation wasn't successful.
To Reproduce
Since the error occurs intermittently, I have not yet determined how to consistently cause the error to occur. Typically, 1 out of every 3 or so attempts to create my stack including its VPCs and subnets will encounter this issue.
Create a stack using a CloudFormation template that creates three subnets in succession, including the addition of the tag with key Attach-to-tgw.
In my environment, I have:
Associate-with and Propagate-to tags as shown above
Attach-to-tgw key and no value
Expected behavior
STNOStatus-VPCPropagation when propagation is not successful.
Please complete the following information about the solution:
To get the version of the solution, you can look at the description of the created CloudFormation stack. For example, "(SO0058) - The AWS CloudFormation template. Version v1.0.0".