The nested AD-with-PKI-ADStack- keeps failing on DC2

service-workbench-collaborations commented 9 months ago

I tried to deploy into a new VPC using the default CIDRs and IPs in us-east-1. With 3 combinations of AZs (us-east-1a/us-east-1c, us-east-1a/us-east-1b, and us-east-1a/us-east-1b), I kept getting the same error: DomainController2, CREATE_FAILED, Received FAILURE signal with UniqueId i-################ :(

kordell9151 commented 7 months ago

Have you found a solution to this issue? We are seeing a similar problem. We are trying to deploy into an existing VPC, and the first EC2 instance is created successfully. However, when CloudFormation creates the second EC2 instance, it never continues. The instance is created successfully, but for some reason CloudFormation doesn't recognize it and eventually times out.

saraiva82 commented 6 months ago

I found a few issues by looking at the CloudWatch logs from my runs. The error CREATE_FAILED, Received FAILURE signal with UniqueId i-################ does not tell you much. Neither does the error that caused it, which comes from the nested CloudFormation stack that this stack spawns. The CloudWatch logs for the microsoft-pki-TwoTierCAStack (the nested stack that is spawned when you run this template) are where the real errors show up. I don't know if you had the same issues I had, but this is what I encountered and what I did to work around it.

My first run threw some errors because I had only added my service account to the AWS Delegated Enterprise Certificate Authority Administrators group and not to the AWS Delegated Administrators group (both are required).

With that done, I started getting this error: Computer 'EC2AMAZ-XXXXXX' was successfully joined to the new domain 'mydomain.com', but renaming it to 'ENTCA1' failed with the following error message: The account already

This is because CloudFormation only rolls back the AWS resources and not the AD changes it makes. So you need to delete ENTCA1 from both the computers OU and the DNS records. Once this was done, I ran it again and got the following error:

Connecting to remote server ENTCA1 failed with the following error message : The WinRM client sent a request to an HTTP server and got a response saying the requested HTTP URL was not available. This is usually returned by a HTTP server that does not support the WS-Management protocol. For more information, see the about_Remote_Troubleshooting Help topic. + CategoryInfo : OpenError: (ENTCA1:String) [], PSRemotingTransportException + FullyQualifiedErrorId : URLNotAvailable,PSSessionStateBroken

Now, I believe this is the main cause of most of the issues here, and I think AWS needs to amend the requirements. However, before I go over the workaround, some cleanup needs to be done or it will not work. If you got this error, and remembering that CloudFormation will not revert any Windows changes, you will need to do two things:

One, delete ENTCA1 from both the computers OU and the DNS records.
Two, and this one I found by luck: even though ENTCA1 failed to finish configuring and was removed, the Root CA cert and key had already been added to the domain SYSVOL. So it will not matter if you fix the WinRM issue; the enterprise CA will keep failing because it cannot become subordinate to ORCA1 while the cert in SYSVOL is from your previous failed run. Therefore you need to delete it. In File Explorer, go to \\$DomainDNSName\SYSVOL\$DomainDNSName\Policies\PkiRootCA. You can delete the whole PkiRootCA directory, or just go inside and delete the cert and key, which is what I did. The directory is a temporary one made by the PowerShell module: https://github.com/aws-ia/cfn-ps-microsoft-pki/blob/main/scripts/Modules/Module-Pki/Module-Pki.psm1
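The SYSVOL cleanup in the second step could be scripted roughly as below. This is only a sketch: the function names are made up for illustration, and it assumes everything left in the PkiRootCA folder is from the failed run. It defaults to a dry run so you can see what would be removed first.

```python
import tempfile
from pathlib import Path
from typing import List


def pki_root_ca_unc(domain_dns_name: str) -> str:
    """Build the UNC path of the temporary PkiRootCA folder in SYSVOL."""
    d = domain_dns_name
    return f"\\\\{d}\\SYSVOL\\{d}\\Policies\\PkiRootCA"


def remove_leftovers(folder: Path, dry_run: bool = True) -> List[str]:
    """Return the names of files in the folder; delete them unless dry_run."""
    leftovers = sorted(p for p in folder.iterdir() if p.is_file())
    if not dry_run:
        for path in leftovers:
            path.unlink()
    return [p.name for p in leftovers]


if __name__ == "__main__":
    # 'mydomain.com' is the placeholder domain from the error message above.
    print(pki_root_ca_unc("mydomain.com"))
    # Demonstrate on a scratch folder, not on a real SYSVOL share.
    with tempfile.TemporaryDirectory() as scratch:
        print(remove_leftovers(Path(scratch)))
```

On a domain-joined Windows machine you would pass Path(pki_root_ca_unc(...)) to remove_leftovers, first with the default dry run and then with dry_run=False.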
Now that AD is restored as close as possible to the baseline required before running CloudFormation again, we can fix the issue with WinRM. First, you need to add the ability to make Group Policy changes to it. With AWS Directory Service you will not see Configuration\Policies\Administrative Templates\Windows Components\Windows Remote Management\WinRM Service in the editor, even though you can see the setting in some of the AWS-made policies. You can get the templates from Microsoft at https://learn.microsoft.com/en-us/troubleshoot/windows-client/group-policy/create-and-manage-central-store by finding the version of Windows that you need. Remember that Server 2019 translates to Windows 10, so just look for the build that you have; AWS Directory Service is currently on 2019. I grabbed whatever matched the server that AWS spun up for me to manage the AD, which was build 1809 (but I do not think it matters as long as it is one of the Windows 10 versions). Install the MSI on your dxxxx-managementInstance and take note of where you installed it; it defaults to C:\Program Files (x86)\Microsoft Group Policy. Then follow these instructions, as it is the same process: https://docs.aws.amazon.com/whitepapers/latest/access-workspaces-with-access-cards/install-the-group-policy-administrative-template-files-for-the-workspaces-streaming-protocol-wsp.html All of the ADMX files go into SYSVOL under \\morpheus.rviolet.com\SYSVOL\morpheus.rviolet.com\Policies\PolicyDefinitions, and the ADML files go into the matching language folder in that same location, for example en-US (if the folder exists, copy just the ADML files into it; if it does not, you can just copy the whole directory).
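The copy into the central store can be sketched as follows. The function name and its parameters are hypothetical; source is assumed to be whichever PolicyDefinitions folder the MSI unpacked, and central_store the PolicyDefinitions folder in SYSVOL.

```python
import shutil
from pathlib import Path
from typing import List


def copy_templates(source: Path, central_store: Path, language: str = "en-US") -> List[str]:
    """Copy *.admx into the store and *.adml into its language subfolder."""
    copied = []
    central_store.mkdir(parents=True, exist_ok=True)
    for admx in sorted(source.glob("*.admx")):
        shutil.copy2(admx, central_store / admx.name)
        copied.append(admx.name)

    language_source = source / language
    if language_source.is_dir():
        # Create the language folder if missing, otherwise just add files to it.
        language_target = central_store / language
        language_target.mkdir(exist_ok=True)
        for adml in sorted(language_source.glob("*.adml")):
            shutil.copy2(adml, language_target / adml.name)
            copied.append(f"{language}/{adml.name}")
    return copied
```

It returns the list of files it copied, so you can compare that against what shows up in the Group Policy editor afterwards.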
Now that this is completed, open the group policy on the OU where your computers are located. The policy, called winrm, should be under the OU that has your NetBIOS name. Make the following changes:

Computer Configuration\Policies\Administrative Templates\Windows Components\Windows Remote Management\WinRM Service\Allow CredSSP authentication: set it to Not Configured (it will currently say Enabled).
Computer Configuration\Policies\Administrative Templates\Windows Components\Windows Remote Management\WinRM Service\Allow remote server management through WinRM: add to the IPv6 filter (I'm not sure this is needed, but it should not hurt); you can remove it after the CloudFormation run completes. I added to the IPv4 filter as well, as I was tired of running this CloudFormation template. I reverted to the original filters afterwards.
Also, just to be sure, though I did not think it necessary as this is an SSL thing, I enabled Computer Configuration\Policies\Administrative Templates\Windows Components\Windows Remote Management\WinRM Service\Turn On Compatibility HTTP Listener, as again I had been running this thing for two days while debugging it.
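Before re-running the template, one rough way to check whether anything answers on the WinRM HTTP endpoint is a plain HTTP probe like the one below. This is my own sketch, not something from the Quick Start: it assumes the default WinRM HTTP port 5985 and the /wsman path, the function names are made up, and it only reports the HTTP status without attempting a real WS-Management session.

```python
import urllib.error
import urllib.request
from typing import Optional


def wsman_url(host: str, port: int = 5985) -> str:
    """Build the WinRM HTTP endpoint URL (5985 is the default HTTP port)."""
    return f"http://{host}:{port}/wsman"


def probe_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status the endpoint answers with, or None if unreachable."""
    # Skip any configured proxy, since the target is an internal host.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        return None
```

For example, probe_status(wsman_url("ENTCA1")) returning 404 would be consistent with the "requested HTTP URL was not available" error above, None means nothing was reachable at all, and any other status suggests something is listening at that path.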

After all that was done I ran it again and everything worked.

Hopefully this will work for you. I highly recommend looking in CloudWatch for the errors, as they will be in /aws/Quick_Start/whatever_you_named_this and are most likely related to Windows and not CloudFormation.
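If you would rather pull those errors from a script than click through the console, something like this sketch could work. It assumes a boto3 CloudWatch Logs client is passed in; the function name and the default filter pattern are my own choices, and the log group name is whatever you named yours.

```python
from typing import Any, Iterator, Tuple


def find_error_events(
    logs_client: Any, log_group: str, pattern: str = "error"
) -> Iterator[Tuple[str, str]]:
    """Yield (log stream name, message) for events matching the filter pattern.

    CloudWatch Logs filter patterns are case sensitive, so try a few
    variants ("Error", "failed", ...) if nothing comes back.
    """
    paginator = logs_client.get_paginator("filter_log_events")
    for page in paginator.paginate(logGroupName=log_group, filterPattern=pattern):
        for event in page.get("events", []):
            yield event["logStreamName"], event["message"].strip()


# Example use (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("logs", region_name="us-east-1")
#   group = "/aws/Quick_Start/whatever_you_named_this"
#   for stream, message in find_error_events(client, group):
#       print(stream, message)
```

Taking the client as a parameter keeps the function easy to try against a stub before pointing it at a real account.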

aws-ia / cfn-ps-microsoft-pki

The nested AD-with-PKI-ADStack- keeps failing on DC2 #13