Azure / ArcEnabledServersGroupPolicy

Guidance and sample code to perform at-scale onboarding of servers to Arc via Group Policy
MIT License

Azure Arc-enabled Servers - DeployGPO.ps1 Script fails with "C:\<path>\DeployGPO.ps1 : Exception calling "ProtectBase64" with "2" argument(s): "Encryption failed." #41

Open EightFortyEight opened 5 months ago

EightFortyEight commented 5 months ago

I am trying to deploy Azure Arc for a client and am attempting to enroll machines at scale following the "Connect machines at scale using Group Policy" KB. I have configured all of the prerequisites and completed the Azure setup portion of the scripts.

I am having issues getting the DeployGPO.ps1 script to complete. The script gets through the GPO portion successfully, but hangs at the encryption section and eventually fails with "C:\DeployGPO.ps1 : Exception calling "ProtectBase64" with "2" argument(s): "Encryption failed."". The only place I can find mention of this error is in this GitHub comment thread. I have tried all of the solutions in that comment thread, as well as in the linked related thread, without success. I have also confirmed that all .NET updates are installed and that no dependencies are being blocked by either the corporate firewall or Windows Firewall.

In my case, the issue seems to be environment related and specific to the "$encryptedSecret = [DpapiNgUtil]::ProtectBase64($descriptor, $ServicePrincipalSecret)" line in the PowerShell script. There are no issues importing the required module, which defines DpapiNgUtil. I isolated this section of the script with the required variables and ran it successfully in a lab environment, so I know it isn't a bug or mistake in the script itself. Running that same, confirmed-working portion of the script anywhere in the customer environment, even on fresh Windows Server 2022 VMs that have not yet been joined to the domain, fails with the same "Encryption failed" error described above. Installing the agent manually on the VMs works, but is not feasible given the number of VMs in the environment.
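One way to get more detail than the generic message is to walk the exception chain around the isolated call. PowerShell wraps exceptions thrown by .NET methods in a MethodInvocationException, so a more specific underlying error, if there is one, may sit in InnerException. The following is a minimal sketch, assuming the module that defines DpapiNgUtil has already been imported and that $descriptor and $ServicePrincipalSecret are set as in DeployGPO.ps1. Get-ExceptionChain is a hypothetical helper name, not something from the repository.

```powershell
# Hypothetical helper: prints every exception in the chain, outermost first.
function Get-ExceptionChain {
    param([Parameter(Mandatory)][System.Exception]$Exception)

    $ex = $Exception
    while ($null -ne $ex) {
        # HResult is shown in hex because Windows error codes are usually looked up that way.
        '{0}: {1} (HResult 0x{2:X8})' -f $ex.GetType().FullName, $ex.Message, $ex.HResult
        $ex = $ex.InnerException
    }
}

try {
    # Assumption: [DpapiNgUtil], $descriptor and $ServicePrincipalSecret already exist in this session.
    $encryptedSecret = [DpapiNgUtil]::ProtectBase64($descriptor, $ServicePrincipalSecret)
    "Encryption succeeded ($($encryptedSecret.Length) characters)."
}
catch {
    # "Encryption failed." may be wrapping a more specific inner error.
    Get-ExceptionChain -Exception $_.Exception
}
```

Whether the inner exception carries a useful status code depends on how the wrapper class reports failures, so the output may add nothing beyond the original message.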

Any help is greatly appreciated. This is in a vCenter environment on Dell hosts if we need to get down to host-level troubleshooting. Thank you in advance!

Borgquite commented 5 months ago

As some points of information:

  • Presumably you've checked the Windows Event Logs for any additional information on the failure?
  • Can you run this tool on your server and reset everything back to defaults, in case your security team has changed any ciphers or hashes?
  • The ProtectBase64 and UnprotectBase64 functions in this script are wrappers for NCryptProtectSecret and NCryptUnprotectSecret; you may want this information when searching for error codes.
  • Once you have tried the suggestions in the linked comments (which you have), StackOverflow indicates that one potential cause of NCryptProtectSecret failing is the domain functional level, so check that.
  • Also from StackOverflow: make sure the 'CNG Key Isolation' service is running.
  • Also make sure that you're running the DeployGPO.ps1 script in an elevated PowerShell prompt, with Domain Admin privileges.
  • Maybe look into any issues relating to Credential Guard?

EightFortyEight commented 5 months ago

Apologies for the late reply, other priorities arose.

Thank you for all of the added information, I will have to give that tool a try and research more into the NCryptProtectSecret and NCryptUnprotectSecret functions. I will also need to look into issues with Credential Guard.

Regarding the other points, I checked the client's domain functional level and found that it was set to 2012 R2. It has since been raised to 2016, with no effect. I can also confirm that I am running the script from an elevated PowerShell window, and I just confirmed that the CNG Key Isolation service is running, so we are all set there. The account I am using is a domain admin.
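For anyone repeating these checks, they can be sketched in PowerShell roughly as follows. This is not part of DeployGPO.ps1. It assumes that 'KeyIso' is the service name behind 'CNG Key Isolation' and it identifies Domain Admins by the group's well-known relative ID 512. Test-HasDomainAdminSid is a hypothetical helper name.

```powershell
# Hypothetical helper: true if any SID in the list is a Domain Admins SID (RID 512).
function Test-HasDomainAdminSid {
    param([string[]]$Sid)
    [bool]($Sid | Where-Object { $_ -match '^S-1-5-21-\d+-\d+-\d+-512$' })
}

# Identity and token of the current PowerShell session.
$identity  = [Security.Principal.WindowsIdentity]::GetCurrent()
$principal = [Security.Principal.WindowsPrincipal]::new($identity)

[pscustomobject]@{
    # Assumption: 'KeyIso' is the short service name for 'CNG Key Isolation'.
    KeyIsoStatus  = (Get-Service -Name KeyIso).Status
    # True only when the session is running elevated.
    IsElevated    = $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
    # True when the session token contains the Domain Admins group.
    IsDomainAdmin = Test-HasDomainAdminSid -Sid $identity.Groups.Value
}
```

All three values passing only rules these items out; it does not show that the encryption call will succeed.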

Once I have time to review the IIS tool, Credential Guard, and any errors relating to the above functions, I will report back. Thank you for the help!

Borgquite commented 5 months ago

@EightFortyEight Great - have you rebooted your domain controllers after updating the domain functional level?

Borgquite commented 5 months ago

@EightFortyEight Also to be sure I'd advise checking your forest functional level too, then rebooting DCs. They are not the same thing. You can check them both under 'Active Directory Domains and Trusts':

  • Domain functional level - select each domain, right click -> 'Raise Domain Functional Level...'
  • Forest functional level - select 'Active Directory Domains and Trusts', right click -> 'Raise Forest Functional Level...'

Then for safety, reboot all DCs. See if this helps.

You can check the outcome with PowerShell - all the DomainModes should be 'Windows2016Domain'

Get-ADForest | Select-Object Name, ForestMode | Format-List
(Get-ADForest).Domains | Get-ADDomain | Select-Object DnsRoot, DomainMode | Format-List

EightFortyEight commented 5 months ago

Good point, I did not check the forest level and only changed the domain level. I will get that sorted today. And yes, I did reboot the domain controllers after the change, and I confirmed the domain functional level afterwards.

Thank you for the follow up! I should have an answer on the forest level change results today.

EDIT: No change after upping the forest level. Both the forest and domain functional levels are set to 2016. I'll look into the other options suggested earlier.

dallinfuell commented 3 months ago

Hi EightFortyEight,

Any luck on getting this resolved? I'm currently in a very similar situation. I've followed this thread on the topic and have made it to the same point as this last comment and was curious if you were able to get something working. Thanks in advance!

EightFortyEight commented 3 months ago

Hey! I found a workaround, but unfortunately not a solution to the issues outlined here. I currently have a Microsoft support ticket open and have done a screenshare/provided logs, however I've had no progress there. If they get back with a solution I will report back here.

What I ended up doing as a workaround was using the "Basic script" deployment method to generate the script and then deploying that via GPO. The downside is that it removes the verification section for servers being added to Azure Arc, so it's less secure, but it works. Once all of the VMs are added, or a solution is found for the issues described here, I'll update the GPO.

EightFortyEight commented 3 months ago

So to clarify on that, go to Azure Arc > Azure Arc resources > Machines > Add/Create > Add machine > Add multiple servers, and then in the "Download and run script" section select "Basic script" instead of "Group policy". Then just deploy that as a startup PowerShell script.
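Because a startup script runs on every boot, it may be worth wrapping the generated script so that machines which already have the agent skip it. The following is a rough sketch of that idea only. Invoke-ArcOnboardingIfNeeded is a hypothetical name, the agent path in the usage comment assumes the default install location, and the share path is a placeholder to fill in.

```powershell
# Hypothetical wrapper: runs the portal-generated onboarding script only when
# the Connected Machine agent executable is not already present.
function Invoke-ArcOnboardingIfNeeded {
    param(
        [Parameter(Mandatory)][string]$AgentPath,   # expected location of azcmagent.exe
        [Parameter(Mandatory)][string]$ScriptPath   # the "Basic script" downloaded from the portal
    )

    if (Test-Path -LiteralPath $AgentPath) {
        return 'Skipped: agent already installed'
    }

    # Send the script's own output to the host so it does not mix with the return value.
    & $ScriptPath | Out-Host
    'Ran onboarding script'
}

# Example call from the GPO startup script (paths are assumptions, adjust as needed):
# Invoke-ArcOnboardingIfNeeded `
#     -AgentPath  "$env:ProgramFiles\AzureConnectedMachineAgent\azcmagent.exe" `
#     -ScriptPath '\\<server>\<share>\OnboardingScript.ps1'
```

Note that this only checks whether the agent is installed, not whether it is connected, so a machine with a failed earlier onboarding would still be skipped.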

EightFortyEight commented 1 month ago

Just a follow up: I am still working with Microsoft support on the issue.