david-midlink closed this issue 1 year ago
Hello @david-midlink, and thank you for opening an issue with the Landing Zone Accelerator team!
I would like to make sure I fully understand the error you're receiving. Based on your issue description and error messages, it sounds as though a new Transit Gateway route table propagation is being created for an attachment or route table that no longer exists in the environment. Is that accurate?
If so, the underlying configuration that creates that resource needs to be removed from your configuration files, not just from your environment. Removing resources outside of the LZA configuration files causes your environment to drift from the state described in the CloudFormation templates, which can lead to errors like this one. More details can be found in "Configuration file best practices" in our documentation.
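One way to confirm this kind of drift is CloudFormation drift detection. The Python sketch below is a minimal illustration, assuming you pass in a boto3 CloudFormation client (boto3.client("cloudformation")) and a stack name; the helper name deleted_or_modified_resources is hypothetical and not part of LZA. It starts drift detection, waits for it to finish, and lists the resources CloudFormation reports as DELETED or MODIFIED. Resource types that don't support drift detection won't appear, so an empty result doesn't prove the stack is in sync.

```python
import time
from typing import Any


def deleted_or_modified_resources(
    client: Any, stack_name: str, poll_seconds: float = 5.0
) -> list[dict[str, str]]:
    """List stack resources that CloudFormation reports as DELETED or MODIFIED."""
    # Start drift detection for every resource in the stack.
    detection_id = client.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]

    # Poll until detection is no longer in progress. DETECTION_FAILED can still
    # leave results for the resources that were checked, so read them anyway.
    while True:
        status = client.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    # Page through per-resource drift results, filtered to out-of-band deletes/edits.
    drifts: list[dict[str, Any]] = []
    token = None
    while True:
        kwargs: dict[str, Any] = {
            "StackName": stack_name,
            "StackResourceDriftStatusFilters": ["DELETED", "MODIFIED"],
        }
        if token:
            kwargs["NextToken"] = token
        page = client.describe_stack_resource_drifts(**kwargs)
        drifts.extend(page["StackResourceDrifts"])
        token = page.get("NextToken")
        if not token:
            break

    return [
        {
            "logical_id": d["LogicalResourceId"],
            "type": d["ResourceType"],
            "status": d["StackResourceDriftStatus"],
        }
        for d in drifts
    ]
```

Usage would look like deleted_or_modified_resources(boto3.client("cloudformation"), "<your-network-associations-stack-name>"); look up the exact stack name in the CloudFormation console for the affected account and region.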
To resolve the issue, remove the propagation configurations from your network-config.yaml configuration file. Based on the error messages, the propagations should be defined under your VPC named EndpointsVpc, in the routeTablePropagations property of the TransitGatewayAttachmentConfig for that VPC.
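As a rough illustration of where these entries live, the sketch below walks a network configuration that has already been parsed into a Python dict (for example with PyYAML's yaml.safe_load) and lists the propagations defined for one VPC. Only routeTablePropagations comes from the guidance above; the key names vpcs, name, and transitGatewayAttachments are assumptions about LZA's VPC configuration shape and should be checked against the LZA configuration reference. The function name find_propagations is hypothetical.

```python
from typing import Any


def find_propagations(network_config: dict[str, Any], vpc_name: str) -> list[tuple[str, str]]:
    """Return (attachment name, route table name) pairs propagated for one VPC.

    Assumed shape: vpcs[].transitGatewayAttachments[].routeTablePropagations[].
    """
    found: list[tuple[str, str]] = []
    for vpc in network_config.get("vpcs", []) or []:
        if vpc.get("name") != vpc_name:
            continue
        # Each Transit Gateway attachment may list route tables it propagates to.
        for attachment in vpc.get("transitGatewayAttachments", []) or []:
            for route_table in attachment.get("routeTablePropagations", []) or []:
                found.append((attachment.get("name", "<unnamed>"), route_table))
    return found
```

Any pairs returned for EndpointsVpc are the entries to delete from network-config.yaml before releasing a new pipeline run. The attachment and route table names you see will be whatever your own configuration defines.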
I hope this information is helpful, and please feel free to respond back if this doesn't solve your issue. Thanks!
Hello @awsclemj,
Indeed, that is correct, but unfortunately it can't be removed. I initially reconfigured the YAML file and cleared the entire network from it. When the pipeline got stuck, I tried to delete the resources the pipeline couldn't remove on its own.
Unfortunately, none of these efforts worked. The resources no longer exist, and CloudFormation is unable to reconcile that. I also attempted to delete the stack for the network associations in the network account, hoping that a redeployment would resolve the issue.
However, the stack remains stuck with an "Internal failure" error that I'm unable to resolve. I've opened a case with AWS, which was escalated to their internal CloudFormation team, but there has been no resolution so far.
I'm open to any further suggestions. I'm also curious whether this is tied to LZA or is purely a CloudFormation issue.
Could the discrepancy be due to a case sensitivity issue? The name of the account is "CorpIT," but the logs display it as "CorpIt" with a lowercase 't'. I couldn't find any reference to this in the LZA code.
Hi @david-midlink,
Thank you for the added context. I can't comment on what could be causing the Internal Error, since I can't see the logs for the underlying service. I think opening the support case is the correct path, since AWS Support can work directly with the service team on the issue.
I don't believe the lowercase 't' is the issue. Our solution uses a PascalCase parser to generate resource names, which is likely why the name appears as "CorpIt" in the logs.
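To illustrate how "CorpIT" can surface as "CorpIt", here is a hypothetical approximation of a PascalCase conversion in Python. It is not LZA's actual parser: it assumes the converter splits words at lowercase-to-uppercase boundaries and then capitalizes only the first letter of each word, which turns the acronym "IT" into "It".

```python
import re


def pascal_case(name: str) -> str:
    """Hypothetical PascalCase conversion (an approximation, not LZA's code)."""
    # Split between a lowercase letter or digit and an uppercase letter: "CorpIT" -> "Corp IT".
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    # Split an acronym from a following capitalized word: "ITAdmin" -> "IT Admin".
    spaced = re.sub(r"([A-Z])([A-Z][a-z])", r"\1 \2", spaced)
    # Treat any run of non-alphanumeric characters as a word separator.
    words = re.split(r"[^A-Za-z0-9]+", spaced)
    # Capitalize the first letter of each word and lower-case the rest.
    return "".join(word[:1].upper() + word[1:].lower() for word in words if word)
```

With this approximation, pascal_case("CorpIT") returns "CorpIt" while pascal_case("EndpointsVpc") is unchanged, so a casing difference like this in generated resource names is cosmetic rather than a sign of a mismatched account.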
I'm curious whether you have done a full pipeline run (i.e., manually releasing a change) since removing the resources from the YAML file. LZA shouldn't try to create the resources if they are no longer in your configuration. Simply retrying the failed stage would reuse the older configuration file, since the new configuration wouldn't have been pulled in by the initial Source stage, so that could be the cause of the issue.
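The difference between retrying a stage and releasing a change can be sketched with the AWS SDK for Python. This is a minimal sketch, assuming a boto3 CodePipeline client (boto3.client("codepipeline")) and that the LZA pipeline uses the name AWSAccelerator-Pipeline (verify it in your console); the helper names release_change and source_revisions are hypothetical. start_pipeline_execution starts a new execution whose Source stage pulls the latest configuration, while get_pipeline_execution shows which source revisions an existing execution is pinned to, which is what a stage retry within that execution reuses.

```python
from typing import Any

# Assumed LZA pipeline name; confirm it in the CodePipeline console.
DEFAULT_PIPELINE = "AWSAccelerator-Pipeline"


def release_change(client: Any, pipeline_name: str = DEFAULT_PIPELINE) -> str:
    """Start a new execution (like 'Release change') and return its ID.

    A new execution re-runs the Source stage, so it picks up the latest
    committed configuration files.
    """
    response = client.start_pipeline_execution(name=pipeline_name)
    return response["pipelineExecutionId"]


def source_revisions(client: Any, pipeline_name: str, execution_id: str) -> list[tuple[str, str]]:
    """Return (artifact name, revision ID) pairs an existing execution is pinned to.

    Retrying a failed stage stays inside this execution, so it reuses these
    revisions instead of fetching the latest configuration.
    """
    execution = client.get_pipeline_execution(
        pipelineName=pipeline_name, pipelineExecutionId=execution_id
    )["pipelineExecution"]
    return [
        (rev.get("name", ""), rev.get("revisionId", ""))
        for rev in execution.get("artifactRevisions", [])
    ]
```

Comparing source_revisions for the stuck execution against your latest configuration commit is a quick way to tell whether a retry is still running an old configuration.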
Hello @awsclemj
Unfortunately, nothing worked, whether I ran the pipeline with the network configuration or without it (keeping only the minimum required to pass validation).
As of now, AWS has not responded to my case. However, for some reason, after leaving the environment untouched for about three days, I attempted to delete the stack again, and it succeeded. I then redeployed my entire network configuration, and it worked. So, honestly, I have no idea what happened.
Hello @david-midlink, and thanks for following up!
I am glad to hear you are now unblocked in your environment. I will go ahead and close out this issue, but please don't hesitate to open another issue with us or AWS Support should you run into pipeline execution issues going forward.
Thank you for your interest and support of the LZA!
FYI, it looks like I am hitting a very similar issue. Does deleting the stack or retrying help?
Describe the bug
While executing the pipeline and attempting to add a tgw-attachment to the Transit Gateway route table during propagation (Network_Associations), the process fails. The failure is caused by a resource that no longer exists, which leaves the operation stuck.
Everything worked smoothly until I created an account named "CorpIT"; for that account, the pipeline ran fine up to the propagation stage. When it failed, I attempted to remove it, but CloudFormation became unresponsive. Despite deleting everything, the system still seems to recognize it for no apparent reason.
To Reproduce
Even after deleting all components associated with the network configuration and running the pipeline again, the issue persists.
Expected behavior
Propagation should work as it did for all other accounts.
Please complete the following information about the solution: