Azure / ArcEnabledServersGroupPolicy

Guidance and sample code to perform at-scale onboarding of servers to Arc via Group Policy
MIT License

ServicePrincipal Secret Not Being Decrypted #21

Open TCTX365 opened 1 year ago

TCTX365 commented 1 year ago

We have been testing the 1.06 script version. We are getting errors in the logs that the client secret is wrong/bad and cannot authenticate.

Testing the secret decryption function in EnableAzureArc.ps1, we get the following error:

Could not fetch service principal secret: System.Management.Automation.MethodInvocationException: Exception calling "UnprotectBase64" with "1" argument(s): "The specified data could not be decrypted. " ---> System.Security.Cryptography.CryptographicException: The specified data could not be decrypted.
   at DpapiNgUtil.Unprotect(Byte[] protectedData)
   at DpapiNgUtil.UnprotectBase64(String input)
   at CallSite.Target(Closure , CallSite , Type , Object )
   --- End of inner exception stack trace ---
   at System.Management.Automation.ExceptionHandlingOps.CheckActionPreference(FunctionContext funcContext, Exception exception)
   at System.Management.Automation.Interpreter.ActionCallInstruction`2.Run(InterpretedFrame frame)
   at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
   at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
False


TCTX365 commented 1 year ago

Is anyone monitoring these issue reports?

AustinMack commented 1 year ago

When testing, you need to run as the SYSTEM account. PsExec (https://docs.microsoft.com/sysinternals/downloads/psexec) can start a PowerShell session as SYSTEM:

.\PsExec.exe -i -d -s -accepteula C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe

TCTX365 commented 1 year ago

Tried this afternoon a couple different ways with the psexec tool. Even tried with a brand new SPN/Secret.


From the azcmagent log right after install:

time="2023-10-17T13:24:40-05:00" level=debug msg="Failed to acquire authorization token from SPN" Application Id=XXXXXXXXXXXXXXXXXXXXXX Error="ClientSecretCredential: unable to resolve an endpoint: server response error: context deadline exceeded"

time="2023-10-17T13:24:40-05:00" level=error msg="Failed to obtain access token"

Same messages as before.

AustinMack commented 1 year ago

Thanks for the update. Looks like it decrypted and now there is a new message. For the message "context deadline exceeded" you may want to review https://learn.microsoft.com/en-us/answers/questions/1008622/failed-to-connect-azure-arc-from-on-premises-linux
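
The two failures quoted so far point at different causes: the DPAPI-NG error means the secret could not be decrypted, while "context deadline exceeded" means the agent timed out while reaching the token endpoint. The following is a minimal Python sketch, assuming the log line layout quoted above, of how such lines could be sorted into those two buckets. The function names and category labels are made up for illustration and are not part of azcmagent.

```python
import re

# Matches key="quoted value" or key=bareword pairs, as in the quoted log lines.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_log_line(line: str) -> dict:
    """Split one azcmagent-style log line into a dict of its fields."""
    fields = {}
    for key, quoted, bare in _PAIR.findall(line):
        fields[key] = quoted or bare
    return fields


def classify(text: str) -> str:
    """Return a rough category for an error message (labels are illustrative)."""
    lowered = text.lower()
    if "could not be decrypted" in lowered:
        return "decryption"
    if "context deadline exceeded" in lowered:
        return "network"
    return "unknown"


line = (
    'time="2023-10-17T13:24:40-05:00" level=debug '
    'msg="Failed to acquire authorization token from SPN" '
    'Application Id=XXXX '
    'Error="ClientSecretCredential: unable to resolve an endpoint: '
    'server response error: context deadline exceeded"'
)
fields = parse_log_line(line)
print(fields["level"], classify(fields["Error"]))
```

A "network" result suggests checking proxy and firewall reachability before regenerating the secret.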

TCTX365 commented 1 year ago

We have a Unified case open and are working with an MS engineer to troubleshoot the issue further. We have valid ServicePrincipal credentials, and the SPN is configured in the Azure portal with the onboarding role permission. The secret is not expired.

AustinMack commented 1 year ago

Please email AustinM@microsoft.com the case number. thanks

tphan94 commented 1 year ago

Hi all, FWIW we ran into the same issue with an invalid SPN secret: the GPO ran and the Azure Connected Machine agent installed, but it failed to connect/onboard to Azure. We are 101% confident that the secret is valid because onboarding works fine using the "multiple-server" script with the same secret generated from the Azure Arc portal. This is a Windows environment.

awillows commented 1 year ago

The encryption is done using the following steps (you can see this in the DeployGPO script).

# Encrypting the ServicePrincipalSecret to be decrypted only by the Domain Controllers and the Domain Computers security groups

$DomainComputersSID = "SID=" + $DomainComputersSID
$DomainControllersSID = "SID=" + $DomainControllersSID
$descriptor = @($DomainComputersSID, $DomainControllersSID) -join " OR "

$encryptedSecret = [DpapiNgUtil]::ProtectBase64($descriptor, $ServicePrincipalSecret)

If you try and decrypt on a machine that is not a member of those groups in the domain used during deployment, it will fail with the error "The specified data could not be decrypted."
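
The rule described above (decryption succeeds only if the caller belongs to at least one group named in the descriptor) can be modelled in a few lines. This Python sketch only illustrates the membership logic; it does not call DPAPI-NG. The domain SID is a placeholder, and the function names are hypothetical.

```python
def allowed_sids(descriptor: str) -> set:
    """Extract the SIDs from a descriptor of the form 'SID=a OR SID=b'."""
    sids = set()
    for term in descriptor.split(" OR "):
        term = term.strip()
        if not term.startswith("SID="):
            raise ValueError(f"unsupported descriptor term: {term!r}")
        sids.add(term[len("SID="):])
    return sids


def may_decrypt(descriptor: str, caller_sids: set) -> bool:
    """True if the caller holds at least one SID named in the descriptor."""
    return not allowed_sids(descriptor).isdisjoint(caller_sids)


# Placeholder domain SID; 515 and 516 are the well-known RIDs for
# Domain Computers and Domain Controllers.
DOMAIN = "S-1-5-21-1111111111-2222222222-3333333333"
descriptor = f"SID={DOMAIN}-515 OR SID={DOMAIN}-516"

print(may_decrypt(descriptor, {f"{DOMAIN}-515"}))        # member of Domain Computers
print(may_decrypt(descriptor, {"S-1-5-21-9-9-9-515"}))   # computer in another domain
```

In this model, a machine from another domain, or an account that is in neither group, falls through to the "could not be decrypted" case.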

endreigesund commented 11 months ago


Thanks @awillows for this. This also solved my problem with read-only domain controllers not registering. It appears they are not part of the groups included for the encryption, nor of the ACLs on the deployment folders. I needed to modify the script to include the SID of the "Read-only Domain Controllers" group.
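
As a rough illustration of that change, the Python sketch below builds a descriptor string that also names the Read-only Domain Controllers group. It assumes the group SIDs are formed from the domain SID plus the well-known RIDs (515, 516, 521); the domain SID is a placeholder and `build_descriptor` is a hypothetical helper, not a function from DeployGPO.ps1. The folder ACL change mentioned above is not covered.

```python
# Well-known relative identifiers (RIDs) for the domain groups involved.
GROUP_RIDS = {
    "Domain Computers": 515,
    "Domain Controllers": 516,
    "Read-only Domain Controllers": 521,
}


def build_descriptor(domain_sid: str, groups: list) -> str:
    """Join one 'SID=<domain>-<rid>' term per group with ' OR '."""
    terms = [f"SID={domain_sid}-{GROUP_RIDS[group]}" for group in groups]
    return " OR ".join(terms)


# Placeholder domain SID, for illustration only.
DOMAIN = "S-1-5-21-1111111111-2222222222-3333333333"

descriptor = build_descriptor(
    DOMAIN,
    ["Domain Computers", "Domain Controllers", "Read-only Domain Controllers"],
)
print(descriptor)
```

The resulting string has the same "SID=... OR SID=..." shape that the DeployGPO snippet passes to ProtectBase64, with one extra term.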

1231mahmann commented 11 months ago

In addition to the solution posted by @awillows , this exact same error occurred to us after we ran DeployGPO.ps1 on a Windows Server Core OS (no GUI).

GPO '[MSFT] Azure Arc Servers Onboarding20231220023454' was successfully created in Domain yourdomain.com ...

GPO Setting were successfully imported.
Open GPO Management Console and Check for '[MSFT] Azure Arc Servers Onboarding20231220023454' Group policy
The Group Policy setting could not be imported:
Program 'gpmc.msc' failed to run: Class not registered
At C:\ArcDeployGPO\DeployGPO.ps1:281 char:5
+     gpmc.msc
+     ~~~~~~~~.

The error it's throwing looks like a red herring: the script is just trying to run gpmc.msc on line 281, in the same try/catch block as Import-GPO, so it doesn't seem like there should be any real problem. However, every client then repeatedly failed to onboard with the same cryptography error as above, and running the script again on a DC with a GUI fixed it for us.
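
The reason the message looks misleading is that two unrelated steps share one error handler. The Python sketch below is an illustration of that pattern only, not the script's actual code; the function names are made up, and the messages are borrowed from the output quoted above.

```python
def deploy_shared_handler(import_gpo, open_console) -> str:
    """Both steps share one handler, so any failure is blamed on the import."""
    try:
        import_gpo()
        open_console()
    except Exception as exc:
        return f"The Group Policy setting could not be imported: {exc}"
    return "GPO settings were successfully imported."


def deploy_separate_handlers(import_gpo, open_console) -> str:
    """Each step has its own handler, so the report names the step that failed."""
    try:
        import_gpo()
    except Exception as exc:
        return f"The Group Policy setting could not be imported: {exc}"
    try:
        open_console()
    except Exception as exc:
        return f"GPO imported, but the console could not be opened: {exc}"
    return "GPO settings were successfully imported."


def import_ok():
    pass  # stands in for a successful Import-GPO


def console_missing():
    # stands in for gpmc.msc being unavailable on Server Core
    raise RuntimeError("Program 'gpmc.msc' failed to run: Class not registered")


print(deploy_shared_handler(import_ok, console_missing))
print(deploy_separate_handlers(import_ok, console_missing))
```

With the shared handler, the output claims the import failed even though only the console launch did.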

Brian-Ketterer-BBC commented 10 months ago

I am having the exact same issue. My machines were members of the Domain Computers group when the GPO was created and still are.

Borgquite commented 7 months ago

Exactly the same issue. We have found that this only affects Server 2012 / 2012 R2 servers (the exact servers we need to connect to Azure Arc to deploy ESUs). The following command can be used to test:

PsExec.exe \\<servername> -s C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -Command "Import-Module '<path to report server share>\AzureArcDeploy\AzureArcDeployment.psm1'; $encryptedSecret = Get-Content '<path to report server share>\AzureArcDeploy\encryptedServicePrincipalSecret'; [DpapiNgUtil]::UnprotectBase64($encryptedSecret)"

We have a whole series of servers, all in the Domain Computers group. On Server 2019/2022 the above command returns the secret. On Server 2012 / 2012 R2:

Exception calling "UnprotectBase64" with "1" argument(s): "The specified data
could not be decrypted.
"
At line:1 char:266
+ ... icePrincipalSecret'; [DpapiNgUtil]::UnprotectBase64($encryptedSecret)
+                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : CryptographicException

Borgquite commented 7 months ago

Might have found a solution - make sure that the following registry keys are set on all affected devices - the domain controller where you ran the script, as well as the servers where you'll be deploying it:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v2.0.50727]
"SystemDefaultTlsVersions"=dword:00000001
"SchUseStrongCrypto"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v4.0.30319]
"SystemDefaultTlsVersions"=dword:00000001
"SchUseStrongCrypto"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v2.0.50727]
"SystemDefaultTlsVersions"=dword:00000001
"SchUseStrongCrypto"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319]
"SystemDefaultTlsVersions"=dword:00000001
"SchUseStrongCrypto"=dword:00000001

You should probably restart the DC, and the relevant servers, before trying again. I think SchUseStrongCrypto may be affecting which cipher DpapiNgUtil uses, possibly resulting in the error where the settings are mismatched.

https://learn.microsoft.com/en-us/dotnet/framework/network-programming/tls#systemdefaulttlsversions
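
For anyone who would rather script this than import a .reg file, here is a minimal Python sketch that prints the equivalent `reg add` command lines for the four keys and two values listed above. It only generates text and does not touch the registry. Running the printed commands from an elevated prompt is an assumption about how you would apply them, and `reg_add_commands` is a hypothetical helper name.

```python
# The four .NET Framework keys from the .reg snippet above (HKLM hive).
KEYS = [
    r"HKLM\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v2.0.50727",
    r"HKLM\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v4.0.30319",
    r"HKLM\SOFTWARE\Microsoft\.NETFramework\v2.0.50727",
    r"HKLM\SOFTWARE\Microsoft\.NETFramework\v4.0.30319",
]

# Both values are set to DWORD 1 under every key.
VALUES = ("SystemDefaultTlsVersions", "SchUseStrongCrypto")


def reg_add_commands() -> list:
    """Return one 'reg add' command line per key/value pair."""
    return [
        f'reg add "{key}" /v {name} /t REG_DWORD /d 1 /f'
        for key in KEYS
        for name in VALUES
    ]


commands = reg_add_commands()
for command in commands:
    print(command)
```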