alan-turing-institute / data-safe-haven

https://data-safe-haven.readthedocs.io
BSD 3-Clause "New" or "Revised" License
61 stars 15 forks source link

SHM Deployment - Configure DC script fails #948

Closed ecemdalgic closed 3 years ago

ecemdalgic commented 3 years ago

:scroll: Description

Setup_SHM_DC.ps1 script is failing due to several runtime exceptions during importing artifacts from blob storage. After the script fails, the resource groups (..._DC,..._ARTIFACTS, etc) is created. However, "C:/Installation" path is not available on DC1 machine.

:strawberry: Desired behaviour

A short description of what the desired behaviour should be in order for this issue to be closed.

:camera: Screenshots

image

:recycle: To reproduce

Using PowerShell 7.1.2 Run "Connect-AzAccount" and sign in to Azure Portal Navigate to "deployment\safe_haven_management_environment\setup" directory Run ".\Setup_SHM_DC.ps1 -shmId "

martintoreilly commented 3 years ago

The first error in the screenshot complains about not being able to convert the string defining the bloc storage URI into a URI, complaining that the hostname could not be parsed. I see there are some extra double quotes around the SHM artifact blob storage account name and container name in the error message. Are these just an artifact of the way the error message is constructed or could these be why the URI cannot be parsed?

JimMadge commented 3 years ago

@edwardchalstrey1 @tomaslaz Did either of you encounter anything like this when you deployed your SHMs?

martintoreilly commented 3 years ago

@ecemdalgic Can I confirm that your SHM deployment went smoothly the first time round and you only got this issue when trying to re-run the deployment? If this came up only on re-running the deployment, can you say what (if any) resources where still deployed from the original SHM deployment?

martintoreilly commented 3 years ago

The error about not being able to find drive C feels separate from the URI parsing issue but it's a while since I've looked at the SHM DC deployment scripts myself.

ecemdalgic commented 3 years ago

@martintoreilly indeed, it went smoothly the first time round. Nothing persisted as i deleted everything and used a different "shmId" so that the resources won't coincides.

ecemdalgic commented 3 years ago

After the exceptions, i added a couple of printouts and even change the way we get $blobUrl (using different styles of concatenation etc.) but didn't help.

ecemdalgic commented 3 years ago

Yesterday I spent some time on the issue and I guess the root cause lies within how parameters were passed (Invoke-RemoteScript) to the Import_Artifacts.ps1 script. Since they’re passed with double quotes (I proved it with write-outs), it causes the issue with the $blobUrl variable and so on. I managed to replace double quotes with empty string inside Import_Artifacts.ps1, which partially solved the problem. I say partially, because there are other function calls with the same parameter passing style and they also throw exceptions. I'm now running the .\Setup_SHM_DC.ps1 from scratch to get the logs - will send them within 30 mins or so.

Here are the logs: part-1_log part-2_log part-3_log

The specific log that prints the passed parameters ishere:

image

edwardchalstrey1 commented 3 years ago

I have also run into this error - see top comment #944

martintoreilly commented 3 years ago

Yesterday I spent some time on the issue and I guess the root cause lies within how parameters were passed (Invoke-RemoteScript) to the Import_Artifacts.ps1 script. Since they’re passed with double quotes (I proved it with write-outs), it causes the issue with the $blobUrl variable and so on. I managed to replace double quotes with empty string inside Import_Artifacts.ps1, which partially solved the problem. I say partially, because there are other function calls with the same parameter passing style and they also throw exceptions. I'm now running the .\Setup_SHM_DC.ps1 from scratch to get the logs - will send them within 30 mins or so.

@ecemdalgic I'm pretty sure the reason we explicitly wrap parameter passed to Invoke-RemoteScript as double quoted strings is that we previously had some issues where string parameter values with spaces in them were causing parameter parsing to fail (see this open Azure powershell issue that suggests wrapping arguments to Invoke-AzureRmVMRunCommand (now renamed Invoke-AzVmRunCommand and called by our Invoke-RemoteScript helper function.

Either this has behaviour has changed in some newer version of powershell, or it seems like we're somehow wrapping our arguments for Import_Artifacts.ps1 in double quotes twice.

martintoreilly commented 3 years ago

@ecemdalgic @edwardchalstrey1 Please could one of you double check the contents of the fields in the @params hashtable in Setup_SHM_DC.ps1 before being passed to Invoke-RemoteScript and their values in the $Parameters input parameter and $params hashtable within Invoke-RemoteScript in Deployments.ps1 to see if they are wrapped in one or two sets of double quotes?

If they are wrapped in a single set of double quotes all the way to storage within the $params hashtable within Invoke-RemoteScript in Deployments.ps1, then my guess would be that either the nested hashtable is causing an issues (i.e. passing @params that contains Parameters as a hashtable entry rather than as the top level hashtable), or Invoke-AzVmRunCommand is now wrapping the values of the Parameters hashtable with double quotes itself.

martintoreilly commented 3 years ago

I'm pretty sure we've deployed multiple SHMs successfully since the Invoke-RemoteScript wrapper function was introduced and that the parameters for the remote call to Import_Artifacts.ps1 were wrapped in double quotes at the time (though can't 100% confirm as we did some moving of files, so I can't follow the history on GitHub cleanly. If anyone has better git-fu for tracking changes to Setup_SHM_DC.ps1 across renames and can pinpoint when we switched to calling Invoke-RemoteScript rather than making a direct call to Invoke-AzVmRunCommand, I can confirm this).

martintoreilly commented 3 years ago

~@edwardchalstrey1 @tomaslaz Do you know what commits you did your initial "join the team" full deployment from (or dates you did these if you deployed from the head of master at the time)?~

[Edit: No need now. I checked the July 2020 pen test deployment - see next comment]

martintoreilly commented 3 years ago

I can confirm that the successful full deployment for the July 2020 penetration test was made from commit 5f3ee4f (see deployment log in PR #754), and that we were invoking ImportArtifacts.ps1 remotely via a call to Invoke-RemoteScript and wrapping the values of parameters in the @params hashtable passed to Invoke-RemoteScript with double quotes (direct link to code snippet in Setup_SHM_DC.ps1 for that commit).

edwardchalstrey1 commented 3 years ago

more recently I deployed SHM (see #854) so taking the first commit and running the diff with the release commit git diff e73a404 76fa6427 deployment/safe_haven_management_environment/setup/Setup_SHM_DC.ps1 I get at the relevant part of the diff:

$scriptPath = Join-Path $PSScriptRoot ".." "remote" "create_dc" "scripts" "Import_Artifacts.ps1" -Resolve
@@ -180,8 +161,7 @@ $params = @{
     storageContainerName   = "`"$storageContainerName`""
     sasToken               = "`"$artifactSasToken`""
 }
-$result = Invoke-RemoteScript -Shell "PowerShell" -ScriptPath $scriptPath -VMName $config.dc.vmName -ResourceGroupName $config.dc.rg -Parameter $params
-Write-Output $result.Value
+$null = Invoke-RemoteScript -Shell "PowerShell" -ScriptPath $scriptPath -VMName $config.dc.vmName -ResourceGroupName $config.dc.rg -Parameter $params

So this shows that we were using double quotes then too

edwardchalstrey1 commented 3 years ago

I'm now investigating the changes in function Invoke-RemoteScript (working in hackmd):

git diff e73a404 76fa6427 deployment/common/Deployments.psm1

@@ -1194,19 +1147,23 @@ function Invoke-RemoteScript {
         $Script | Out-File -FilePath $tmpScriptFile.FullName
         $ScriptPath = $tmpScriptFile.FullName
     }
-    # Setup the remote command
-    if ($Shell -eq "PowerShell") {
-        $commandId = "RunPowerShellScript"
-    } else {
-        $commandId = "RunShellScript"
-    }
     # Run the remote command
-    if ($Parameter) {
-        $result = Invoke-AzVMRunCommand -Name $VMName -ResourceGroupName $ResourceGroupName -CommandId $commandId -ScriptPath $ScriptPath -Parameter $Parameter
-        $success = $?
-    } else {
-        $result = Invoke-AzVMRunCommand -Name $VMName -ResourceGroupName $ResourceGroupName -CommandId $commandId -ScriptPath $ScriptPath
-        $success = $?
+    $params = @{}
+    if ($Parameter) { $params["Parameter"] = $Parameter }
+    $params["CommandId"] = ($Shell -eq "PowerShell") ? "RunPowerShellScript" : "RunShellScript"
+    try {
+        # Catch failures from running two commands in close proximity and rerun
+        while ($true) {
+            try {
+                $result = Invoke-AzVMRunCommand -Name $VMName -ResourceGroupName $ResourceGroupName -ScriptPath $ScriptPath @params -ErrorAction Stop
+                $success = $?
+                break
+            } catch [Microsoft.Azure.Commands.Compute.Common.ComputeCloudException] {
+               if (-not ($_.Exception.Message -match "Run command extension execution is in progress")) { throw }
+            }
+        }
+    } catch {
+        Add-LogMessage -Level Fatal "Running '$ScriptPath' on remote VM '$VMName' failed." -Exception $_.Exception
     }
     $success = $success -and ($result.Status -eq "Succeeded")
     foreach ($outputStream in $result.Value) {
@@ -1222,12 +1179,12 @@ function Invoke-RemoteScript {
     # Check for success or failure
     if ($success) {
         Add-LogMessage -Level Success "Remote script execution succeeded"
+        if (-not $SuppressOutput) { Write-Host ($result.Value | Out-String) }
     } else {
-        Add-LogMessage -Level Info "Script output:`n$($result | Out-String)"
+        Add-LogMessage -Level Info "Script output:"
+        Write-Host ($result | Out-String)
         Add-LogMessage -Level Fatal "Remote script execution has failed. Please check the output above before re-running this script."
     }
-    # Wait 10s to allow the run command extension to register as completed
-    Start-Sleep 10
     return $result
 }
 Export-ModuleMember -Function Invoke-RemoteScript
edwardchalstrey1 commented 3 years ago

@martintoreilly I can't replicate a splatted hash influencing the number of quotation marks - will take a bit more investigative work to find out exactly where the quotation marks are being introduced

martintoreilly commented 3 years ago

I added some comments to your example code in the HackMD. Not 100% sure whether the keyvals variable is being created and/or passed to the print function as a hashtable.

martintoreilly commented 3 years ago

@edwardchalstrey1 I've tried the tweaks I suggested and get the same outcome as you.

Code

function print-params{
  param(
      [Parameter(Mandatory = $true)]
      $parameters
  )
    Write-Output $parameters
    Write-Output "---"
}

$keyvals = @{
 bob = "`"quoted string`""
 mary = "nonquoted string"
}

Write-Output "Hashtable passed as explicit parameter value"
print-params -parameters $keyvals

Write-Output "Hashtable set as field of parent hashtable on construction and passed implicitly by blatting parent hashtable"
$params1 = @{
    parameters = $keyvals
}
print-params @params1

Write-Output "Hashtable set as field of parent hashtable after construction and passed implicitly by blatting parent hashtable"
$params2 = @{}
$params2["parameters"] = $keyvals
print-params @params2

Output

>pwsh -version 
PowerShell 7.1.1

> pwsh test.ps1 
Hashtable passed as explicit parameter value

Name                           Value
----                           -----
bob                            "quoted string"
mary                           nonquoted string
---
Hashtable set as field of parent hashtable on construction and passed implicitly by blatting parent hashtable
bob                            "quoted string"
mary                           nonquoted string
---
Hashtable set as field of parent hashtable after construction and passed implicitly by blatting parent hashtable
bob                            "quoted string"
mary                           nonquoted string
---
>  
martintoreilly commented 3 years ago

@edwardchalstrey1 Next steps I'd look at are:

  1. Have there been any changes to Import_Artifacts.ps1, the file we are calling remotely?
  2. If not, extend the example code you've been using to make print-params be called remotely and call on a remote Windows VM via Invoke-AzVMRunCommand, for the same three scenarios above to see if we get any strings double double quoted during the transfer.
  3. If things still look fine (no double, double quotes), make the remote script more like Import_Artifacts.ps1 (e.g. using the string / path joining approaches it uses) and see if any of these are inserting additional double quotes..
edwardchalstrey1 commented 3 years ago

ok thx @martintoreilly - just note one thing I was trying yesterday that I didn't add to the hackmd - I tried writing some dummy powershell scripts that import from one another and use variations of splatted & not splatted hash tables, to try and simulate the problem but I couldn’t replicate it that way

edwardchalstrey1 commented 3 years ago

Notes from meeting with @ecemdalgic @tomaslaz @JimMadge

JimMadge commented 3 years ago
  • @ecemdalgic has a temp fix for the double quotation problem: just did a string replace within Import_Artifacts.ps1 on the parameters (something like $abc = $abc -replace '"','''') - at least this validates this can be done in any case where there double quote issue occurs

It feels like we know quotation marks in the uri are the problem, as removing them fixes the issue.

The parameters are explicitly wrapped in quotation marks here

https://github.com/alan-turing-institute/data-safe-haven/blob/41205ce10e4e2be78188b0e1e821fae9af7856b6/deployment/safe_haven_management_environment/setup/Setup_SHM_DC.ps1#L157-L163

On my local machine, in a pwsh session, Invoke-WebRequest ignores quotation marks,

Invoke-WebRequest https://www.bing.com/search?q=how+many+feet+in+a+mile

Invoke-WebRequest https://www.bin"g.com/s"earch?q=how+many+"feet+in+a+"mile

give the same result.

❯ pwsh --version
PowerShell 7.1.2

~This doesn't seem to be the same on the windows VMs.~ It seems that quotes in the URI may not be a problem.

tomaslaz commented 3 years ago

I think that this problem might be related to the new powershell Az module (it has been recently updated). I am trying to check whether downgrading it to 5.0.0 (or another version that is not so recent) would fix it, however I am running into multiple issues. I am trying by adding the following two lines to the top (Line 24) of Import_Artifacts.ps1 script.

Uninstall-Module -Name Az
Install-Module -Name Az -RequiredVersion 5.0.0 -Repository PSGallery -Force -AllowClobber

How can I check which version of the Az module the DC machine is running when a script is executed remotely? When I RDP to the dc vm it doesn't have the Az module installed at all.

ecemdalgic commented 3 years ago

@tomaslaz it probably has it. Follow the steps in the url: https://azurelessons.com/no-match-was-found-for-the-specified-search-criteria-and-module-name-az/

--Edit: Apologies, I skimmed through your post! I encountered some issues while uninstalling Az module from my local whereas it should be uninstalled/downgrade on the vm.

martintoreilly commented 3 years ago

Worth noting that we use the cross-platform Powershell Core for all our deployments but that there is also a separate Windows Powershell. I think since Powershell 7, there is a single cross platform model, but I guess it might be possible we are installing Windows Powershell on the DC VM?

ecemdalgic commented 3 years ago

Unfortunately even after I installed the 5.0.0 manually to the remove VM, the issue persisted.

I think that this problem might be related to the new powershell Az module (it has been recently updated). I am trying to check whether downgrading it to 5.0.0 (or another version that is not so recent) would fix it, however I am running into multiple issues. I am trying by adding the following two lines to the top (Line 24) of Import_Artifacts.ps1 script.

Uninstall-Module -Name Az
Install-Module -Name Az -RequiredVersion 5.0.0 -Repository PSGallery -Force -AllowClobber

How can I check which version of the Az module the DC machine is running when a script is executed remotely? When I RDP to the dc vm it doesn't have the Az module installed at all.

The workaround is to add the following statements to the scripts:

I tried it at the weekend and Setup_SHM_DC.ps1 successfully completed.

martintoreilly commented 3 years ago

From all the investigation to date, it seems that the fix is indeed to stop wrapping the values of parameters to be sent to remote scripts in double quotes. It would be really good to get to the bottom of why this is suddenly an issue, and especially why it worked for the first time Ecem deployed but not the second.

In the meantime, if we can confirm that values with spaces are passed correctly to the remote script, then we're probably safe to make the changes to stop wrapping values.

jemrobinson commented 3 years ago

How can I check which version of the Az module the DC machine is running when a script is executed remotely? When I RDP to the dc vm it doesn't have the Az module installed at all.

The DC doesn't have the Az module because we never run Azure infrastructure tasks from it.

ecemdalgic commented 3 years ago

Please note that user_details_template.csv and CreateUsers.ps1 files haven't been copied across to DC1 VM (just seen that). The rest was fine and I was able to complete configuration of the domain controllers.

jemrobinson commented 3 years ago

Is #934 related to this? This is also caused by an issue when passing paths-with-spaces to another command (in that case Invoke-Expression).

ecemdalgic commented 3 years ago

I completed SHM deployment by adding relevant code pieces (remote scripts called in Setup_SHM_Nexus).

Now proceeding with SRE deployment and it's also an issue with SRE deployment scripts.

image

jemrobinson commented 3 years ago

It looks like all places where our local scripts previously used syntax like:

$params = @{
    remoteDir              = "`"$remoteInstallationDir`""
}

should be updated to use

$params = @{
    remoteDir              = "$remoteInstallationDir"
}

Have you checked whether changing the blocks like this in the scripts that you run locally (ie. not the ones that run on the remote VMs) fixes the issue @ecemdalgic ?

ecemdalgic commented 3 years ago

Hi @jemrobinson! I tried this last week - unfortunately, when I update the parameters as you suggested, the remote script is not even triggered.

tomaslaz commented 3 years ago

I think that I was able to hack around the initial problem by changing:

https://github.com/alan-turing-institute/data-safe-haven/blob/4dc184a6224612fc637b79dbf97c07d886383106/deployment/safe_haven_management_environment/setup/Setup_SHM_DC.ps1#L157-L163

to

$params = @{
    remoteDir              = $remoteInstallationDir
    pipeSeparatedBlobNames = $($blobNames -join "|")
    storageAccountName     = $config.storage.artifacts.accountName
    storageContainerName   = $storageContainerName
    sasToken               = "`"$artifactSasToken`""
}

note that I had to keep double quotes for the sasToken param. The solution with all double quotation marks removed (as shown below) didn't work.

$params = @{
    remoteDir              = $remoteInstallationDir
    pipeSeparatedBlobNames = $($blobNames -join "|")
    storageAccountName     = $config.storage.artifacts.accountName
    storageContainerName   = $storageContainerName
    sasToken               = $artifactSasToken
}

After being able to execute https://github.com/alan-turing-institute/data-safe-haven/blob/4dc184a6224612fc637b79dbf97c07d886383106/deployment/safe_haven_management_environment/setup/Setup_SHM_DC.ps1#L164

The script not started to fail at: https://github.com/alan-turing-institute/data-safe-haven/blob/4dc184a6224612fc637b79dbf97c07d886383106/deployment/safe_haven_management_environment/setup/Setup_SHM_DC.ps1#L195

with the following error:

2021-02-26 11:43:35 [   INFO]: Configuring Active Directory for: DC1-SHM-TESTB...

Value[0]        : 
  Code          : ComponentStatus/StdOut/succeeded
  Level         : Info
  DisplayStatus : Provisioning succeeded
  Message       : aven Server Administrators' creation failed!
Creating AD Sync Service account testblocaladsync...
 [x] Account 'TESTB Local AD Sync Administrator' (testblocaladsync) creation failed!
Creating TESTB Database Servers Manager domain joining account testbdatabasesrvrs...
 [x] Account 'TESTB Database Servers Manager' (testbdatabasesrvrs) creation failed!
Creating TESTB Identity Servers Manager domain joining account testbidentitysrvrs...
 [x] Account 'TESTB Identity Servers Manager' (testbidentitysrvrs) creation failed!
Creating TESTB Linux Servers Manager domain joining account testblinuxsrvrs...
 [x] Account 'TESTB Linux Servers Manager' (testblinuxsrvrs) creation failed!
Creating TESTB RDS Gateway Manager domain joining account testbgatewaysrvrs...
 [x] Account 'TESTB RDS Gateway Manager' (testbgatewaysrvrs) creation failed!
Creating TESTB RDS Session Servers Manager domain joining account testbsessionsrvrs...
 [x] Account 'TESTB RDS Session Servers Manager' (testbsessionsrvrs) creation failed!
Creating TESTB Local AD Sync Administrator domain joining account testblocaladsync...
 [x] Account 'TESTB Local AD Sync Administrator' (testblocaladsync) creation failed!
Adding users to security groups...
Importing GPOs...
 [x] Importing '0AF343A0-248D-4CA5-B19E-5FA46DAE9F9C' to 'All servers - Local Administrators' failed!
 [x] Importing 'EE9EF278-1F3F-461C-9F7A-97F2B82C04B4' to 'All Servers - Windows Update' failed!
 [x] Importing '742211F9-1482-4D06-A8DE-BA66101933EB' to 'All Servers - Windows Services' failed!
 [x] Importing 'B0A14FC3-292E-4A23-B280-9CC172D92FD5' to 'Session Servers - Remote Desktop Control' failed!
Linking GPOs to OUs...
 [x] Linking GPO 'All servers - Local Administrators' to 'Secure Research Environment Database Servers' failed!
 [x] Linking GPO 'All servers - Local Administrators' to 'Safe Haven Identity Servers' failed!
 [x] Linking GPO 'All servers - Local Administrators' to 'Secure Research Environment RDS Session Servers' failed!
 [x] Linking GPO 'All servers - Local Administrators' to 'Secure Research Environment RDS Gateway Servers' failed!
 [x] Linking GPO 'All Servers - Windows Services' to 'Domain Controllers' failed!
 [x] Linking GPO 'All Servers - Windows Services' to 'Secure Research Environment Database Servers' failed!
 [x] Linking GPO 'All Servers - Windows Services' to 'Safe Haven Identity Servers' failed!
 [x] Linking GPO 'All Servers - Windows Services' to 'Secure Research Environment RDS Session Servers' failed!
 [x] Linking GPO 'All Servers - Windows Services' to 'Secure Research Environment RDS Gateway Servers' failed!
 [x] Linking GPO 'All Servers - Windows Update' to 'Domain Controllers' failed!
 [x] Linking GPO 'All Servers - Windows Update' to 'Secure Research Environment Database Servers' failed!
 [x] Linking GPO 'All Servers - Windows Update' to 'Safe Haven Identity Servers' failed!
 [x] Linking GPO 'All Servers - Windows Update' to 'Secure Research Environment RDS Session Servers' failed!
 [x] Linking GPO 'All Servers - Windows Update' to 'Secure Research Environment RDS Gateway Servers' failed!
 [x] Linking GPO 'Session Servers - Remote Desktop Control' to 'Secure Research Environment RDS Session Servers' failed!
Setting AAD sync permissions for AD Sync Service account (testblocaladsync)...
 [x] Failed to update ACL permissions for AD Sync Service account 'testblocaladsync'!
Delegating Active Directory registration permissions to service users...
 [x] Failed to delegate permissions on the 'Secure Research Environment Database Servers' container to '"TESTB"\testbdatabasesrvrs'!
 [x] Failed to delegate permissions on the 'Safe Haven Identity Servers' container to '"TESTB"\testbidentitysrvrs'!
 [x] Failed to delegate permissions on the 'Secure Research Environment Linux Servers' container to '"TESTB"\testblinuxsrvrs'!
 [x] Failed to delegate permissions on the 'Secure Research Environment RDS Gateway Servers' container to '"TESTB"\testbgatewaysrvrs'!
 [x] Failed to delegate permissions on the 'Secure Research Environment RDS Session Servers' container to '"TESTB"\testbsessionsrvrs'!
Value[1]        : 
  Code          : ComponentStatus/StdErr/succeeded
  Level         : Info
  DisplayStatus : Provisioning succeeded
  Message       : oryInfo          : ObjectNotFound: (:) [Get-Acl], ItemNotFoundException
    + FullyQualifiedErrorId : GetAcl_PathNotFound_Exception,Microsoft.PowerShell.Commands.GetAclCommand

New-Object : Exception calling ".ctor" with "6" argument(s): "Value cannot be null.
Parameter name: identity"
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script31.ps1:285 char:21
+ ... AccessRule((New-Object System.DirectoryServices.ActiveDirectoryAccess ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [New-Object], MethodInvocationException
    + FullyQualifiedErrorId : ConstructorInvokedThrowException,Microsoft.PowerShell.Commands.NewObjectCommand

New-Object : Exception calling ".ctor" with "6" argument(s): "Value cannot be null.
Parameter name: identity"
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script31.ps1:287 char:21
+ ... AccessRule((New-Object System.DirectoryServices.ActiveDirectoryAccess ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [New-Object], MethodInvocationException
    + FullyQualifiedErrorId : ConstructorInvokedThrowException,Microsoft.PowerShell.Commands.NewObjectCommand

New-Object : Exception calling ".ctor" with "6" argument(s): "Value cannot be null.
Parameter name: identity"
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script31.ps1:290 char:21
+ ... AccessRule((New-Object System.DirectoryServices.ActiveDirectoryAccess ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [New-Object], MethodInvocationException
    + FullyQualifiedErrorId : ConstructorInvokedThrowException,Microsoft.PowerShell.Commands.NewObjectCommand

New-Object : Exception calling ".ctor" with "6" argument(s): "Value cannot be null.
Parameter name: identity"
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script31.ps1:292 char:21
+ ... AccessRule((New-Object System.DirectoryServices.ActiveDirectoryAccess ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [New-Object], MethodInvocationException
    + FullyQualifiedErrorId : ConstructorInvokedThrowException,Microsoft.PowerShell.Commands.NewObjectCommand

New-Object : Exception calling ".ctor" with "6" argument(s): "Value cannot be null.
Parameter name: identity"
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script31.ps1:295 char:21
+ ... AccessRule((New-Object System.DirectoryServices.ActiveDirectoryAccess ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [New-Object], MethodInvocationException
    + FullyQualifiedErrorId : ConstructorInvokedThrowException,Microsoft.PowerShell.Commands.NewObjectCommand

New-Object : Exception calling ".ctor" with "6" argument(s): "Value cannot be null.
Parameter name: identity"
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script31.ps1:297 char:21
+ ... AccessRule((New-Object System.DirectoryServices.ActiveDirectoryAccess ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [New-Object], MethodInvocationException
    + FullyQualifiedErrorId : ConstructorInvokedThrowException,Microsoft.PowerShell.Commands.NewObjectCommand

Set-Acl : Cannot bind argument to parameter 'AclObject' because it is null.
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script31.ps1:300 char:20
+ Set-ACL -ACLObject $acl -Path "AD:\${domainOuBase}"
+                    ~~~~
    + CategoryInfo          : InvalidData: (:) [Set-Acl], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.SetAclComma 
   nd

Status          : Succeeded
Capacity        : 0
Count           : 0

2021-02-26 11:45:33 [FAILURE]: [x] Remote script execution has failed. Please check the output above before re-running this script.
Exception: /Users/tlazauskas/git/Turing/data-safe-haven/deployment/common/Logging.psm1:42:17
Line |
  42 |                  throw "$Message"
     |                  ~~~~~~~~~~~~~~~~
     | Remote script execution has failed. Please check the output above before re-running this script.

I was able to partially fix this problem by again remove the double quotation marks for https://github.com/alan-turing-institute/data-safe-haven/blob/4dc184a6224612fc637b79dbf97c07d886383106/deployment/safe_haven_management_environment/setup/Setup_SHM_DC.ps1#L185-L194

But then the next part `Import GPOs onto domain controllerz started to fail:

Value[0]        : 
  Code          : ComponentStatus/StdOut/succeeded
  Level         : Info
  DisplayStatus : Provisioning succeeded
  Message       : ng TESTB Database Servers Manager domain joining account testbdatabasesrvrs...
 [o] Account 'TESTB Database Servers Manager' (testbdatabasesrvrs) created successfully
Creating TESTB Identity Servers Manager domain joining account testbidentitysrvrs...
 [o] Account 'TESTB Identity Servers Manager' (testbidentitysrvrs) created successfully
Creating TESTB Linux Servers Manager domain joining account testblinuxsrvrs...
 [o] Account 'TESTB Linux Servers Manager' (testblinuxsrvrs) created successfully
Creating TESTB RDS Gateway Manager domain joining account testbgatewaysrvrs...
 [o] Account 'TESTB RDS Gateway Manager' (testbgatewaysrvrs) created successfully
Creating TESTB RDS Session Servers Manager domain joining account testbsessionsrvrs...
 [o] Account 'TESTB RDS Session Servers Manager' (testbsessionsrvrs) created successfully
Creating TESTB Local AD Sync Administrator domain joining account testblocaladsync...
 [o] Account 'TESTB Local AD Sync Administrator' (testblocaladsync) already exists
Adding users to security groups...
 [ ] Adding 'domaintestbadmin' user to group 'SG Safe Haven Server Administrators'
 [o] User 'domaintestbadmin' was added to 'SG Safe Haven Server Administrators'
Importing GPOs...
 [x] Importing '0AF343A0-248D-4CA5-B19E-5FA46DAE9F9C' to 'All servers - Local Administrators' failed!
 [x] Importing 'EE9EF278-1F3F-461C-9F7A-97F2B82C04B4' to 'All Servers - Windows Update' failed!
 [x] Importing '742211F9-1482-4D06-A8DE-BA66101933EB' to 'All Servers - Windows Services' failed!
 [x] Importing 'B0A14FC3-292E-4A23-B280-9CC172D92FD5' to 'Session Servers - Remote Desktop Control' failed!
Linking GPOs to OUs...
 [x] Linking GPO 'All servers - Local Administrators' to 'Secure Research Environment Database Servers' failed!
 [x] Linking GPO 'All servers - Local Administrators' to 'Safe Haven Identity Servers' failed!
 [x] Linking GPO 'All servers - Local Administrators' to 'Secure Research Environment RDS Session Servers' failed!
 [x] Linking GPO 'All servers - Local Administrators' to 'Secure Research Environment RDS Gateway Servers' failed!
 [x] Linking GPO 'All Servers - Windows Services' to 'Domain Controllers' failed!
 [x] Linking GPO 'All Servers - Windows Services' to 'Secure Research Environment Database Servers' failed!
 [x] Linking GPO 'All Servers - Windows Services' to 'Safe Haven Identity Servers' failed!
 [x] Linking GPO 'All Servers - Windows Services' to 'Secure Research Environment RDS Session Servers' failed!
 [x] Linking GPO 'All Servers - Windows Services' to 'Secure Research Environment RDS Gateway Servers' failed!
 [x] Linking GPO 'All Servers - Windows Update' to 'Domain Controllers' failed!
 [x] Linking GPO 'All Servers - Windows Update' to 'Secure Research Environment Database Servers' failed!
 [x] Linking GPO 'All Servers - Windows Update' to 'Safe Haven Identity Servers' failed!
 [x] Linking GPO 'All Servers - Windows Update' to 'Secure Research Environment RDS Session Servers' failed!
 [x] Linking GPO 'All Servers - Windows Update' to 'Secure Research Environment RDS Gateway Servers' failed!
 [x] Linking GPO 'Session Servers - Remote Desktop Control' to 'Secure Research Environment RDS Session Servers' failed!
Setting AAD sync permissions for AD Sync Service account (testblocaladsync)...
 [o] Successfully updated ACL permissions for AD Sync Service account 'testblocaladsync'
Delegating Active Directory registration permissions to service users...
 [o] Successfully delegated permissions on the 'Secure Research Environment Database Servers' container to 'TESTB\testbdatabasesrvrs'
 [o] Successfully delegated permissions on the 'Safe Haven Identity Servers' container to 'TESTB\testbidentitysrvrs'
 [o] Successfully delegated permissions on the 'Secure Research Environment Linux Servers' container to 'TESTB\testblinuxsrvrs'
 [o] Successfully delegated permissions on the 'Secure Research Environment RDS Gateway Servers' container to 'TESTB\testbgatewaysrvrs'
 [o] Successfully delegated permissions on the 'Secure Research Environment RDS Session Servers' container to 'TESTB\testbsessionsrvrs'
Value[1]        : 
  Code          : ComponentStatus/StdErr/succeeded
  Level         : Info
  DisplayStatus : Provisioning succeeded
  Message       : ink : Cannot validate argument on parameter 'Guid'. The argument is null or empty. Provide an argument that is 
not null or empty, and then try the command again.
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script33.ps1:257 char:34
+         $null = New-GPLink -Guid $gpo.Id -Target "OU=${ouName},${doma ...
+                                  ~~~~~~~
    + CategoryInfo          : InvalidData: (:) [New-GPLink], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.GroupPolicy.Commands.NewGPLinkCommand

Get-GPO : The "All Servers - Windows Update" GPO was not found in the testb.dsgroupdev.co.uk domain.
Parameter name: gpoDisplayName
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script33.ps1:249 char:12
+     $gpo = Get-GPO -Name "$gpoName"
+            ~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (Microsoft.GroupPolicy.GPDomain:GPDomain) [Get-GPO], ArgumentException
    + FullyQualifiedErrorId : GpoWithNameNotFound,Microsoft.GroupPolicy.Commands.GetGpoCommand

Get-GPOReport : Cannot validate argument on parameter 'Guid'. The argument is null or empty. Provide an argument that 
is not null or empty, and then try the command again.
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script33.ps1:251 char:46
+     [xml]$gpoReportXML = Get-GPOReport -Guid $gpo.Id -ReportType xml
+                                              ~~~~~~~
    + CategoryInfo          : InvalidData: (:) [Get-GPOReport], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.GroupPolicy.Commands.GetGpoReportCommand

New-GPLink : Cannot validate argument on parameter 'Guid'. The argument is null or empty. Provide an argument that is 
not null or empty, and then try the command again.
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script33.ps1:257 char:34
+         $null = New-GPLink -Guid $gpo.Id -Target "OU=${ouName},${doma ...
+                                  ~~~~~~~
    + CategoryInfo          : InvalidData: (:) [New-GPLink], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.GroupPolicy.Commands.NewGPLinkCommand

Get-GPO : The "Session Servers - Remote Desktop Control" GPO was not found in the testb.dsgroupdev.co.uk domain.
Parameter name: gpoDisplayName
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script33.ps1:249 char:12
+     $gpo = Get-GPO -Name "$gpoName"
+            ~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (Microsoft.GroupPolicy.GPDomain:GPDomain) [Get-GPO], ArgumentException
    + FullyQualifiedErrorId : GpoWithNameNotFound,Microsoft.GroupPolicy.Commands.GetGpoCommand

Get-GPOReport : Cannot validate argument on parameter 'Guid'. The argument is null or empty. Provide an argument that 
is not null or empty, and then try the command again.
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script33.ps1:251 char:46
+     [xml]$gpoReportXML = Get-GPOReport -Guid $gpo.Id -ReportType xml
+                                              ~~~~~~~
    + CategoryInfo          : InvalidData: (:) [Get-GPOReport], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.GroupPolicy.Commands.GetGpoReportCommand

New-GPLink : Cannot validate argument on parameter 'Guid'. The argument is null or empty. Provide an argument that is 
not null or empty, and then try the command again.
At C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.8\Downloads\script33.ps1:257 char:34
+         $null = New-GPLink -Guid $gpo.Id -Target "OU=${ouName},${doma ...
+                                  ~~~~~~~
    + CategoryInfo          : InvalidData: (:) [New-GPLink], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.GroupPolicy.Commands.NewGPLinkCommand

Status          : Succeeded
Capacity        : 0
Count           : 0

2021-02-26 11:56:40 [FAILURE]: [x] Remote script execution has failed. Please check the output above before re-running this script.
Exception: /Users/tlazauskas/git/Turing/data-safe-haven/deployment/common/Logging.psm1:42:17
Line |
  42 |                  throw "$Message"
     |                  ~~~~~~~~~~~~~~~~
     | Remote script execution has failed. Please check the output above before re-running this script.
ecemdalgic commented 3 years ago

@tomaslaz quick question - have you checked the VM on 10.0.0.4 whether it created C:/Installation directory and the remaining configuration?

jemrobinson commented 3 years ago

@tomaslaz perhaps it's also worth trying this:

$params = @{
    remoteDir              = $remoteInstallationDir
    pipeSeparatedBlobNames = $($blobNames -join "|")
    storageAccountName     = $config.storage.artifacts.accountName
    storageContainerName   = $storageContainerName
    sasToken               = "$artifactSasToken"
}

where we are explicitly making sure that the whole of $artifactSasToken is treated as one string (in case it contains whitespace) but we still pass it to the remote VM without wrapping it in quotes.

tomaslaz commented 3 years ago

I was looking into this issue a bit more and I am still puzzled about how this implementation was working in the past. I have recreated the workflow using a few very simply scripts (test.zip) and tried downgrading Az and Az.Compute gradually down to 1.0.0. I could not get the scripts to work.

In the end it all comes down to the Invoke-WebRequest command - if it receives a string with double quotes it throws an error. However, that has been the case until the latest changes in #957:

sasToken with double quotes https://github.com/alan-turing-institute/data-safe-haven/blob/76fa642788f677482dc4f1453e9c75771167309c/deployment/safe_haven_management_environment/setup/Setup_SHM_DC.ps1#L162

sasToken with double quotes used in Invoke-WebRequest https://github.com/alan-turing-institute/data-safe-haven/blob/76fa642788f677482dc4f1453e9c75771167309c/deployment/safe_haven_management_environment/remote/create_dc/scripts/Import_Artifacts.ps1#L47-L48

I even tried using Windows 2016 Server instead of 2019 and it still didn't work. The only thing that I couldn't test is downgrading PowerShell on the Windows machine as I didn't find a way to do it.

martintoreilly commented 3 years ago

@tomaslaz Just double checking that you are doing the downgrades on the DC that the ImportArtifacts.ps1 script runs on, rather than on the machine you are running the parent deployment scrips from?

tomaslaz commented 3 years ago

@martintoreilly that's a very good point - I am actually downgrading on my machine and not on the DC machine. It is because my machine calls Invoke-RemoteScript which in turn calls Invoke-AzVMRunCommand (it comes from Az.Compute module) and then on the DC machine Import_Artifacts.ps1 is executed. Import_Artifacts.ps1 doesn't depend on the Az or Az.Compute modules and basically only calls Invoke-WebRequest, which is a cmdlet from PowerShell itself. I would like to know how to downgrade PowerShell on the DC machine though.

martintoreilly commented 3 years ago

@martintoreilly that's a very good point - I am actually downgrading on my machine and not on the DC machine. It is because my machine calls Invoke-RemoteScript which in turn calls Invoke-AzVMRunCommand (it comes from Az.Compute module) and then on the DC machine Import_Artifacts.ps1 is executed. Import_Artifacts.ps1 doesn't depend on the Az or Az.Compute modules and basically only calls Invoke-WebRequest, which is a cmdlet from PowerShell itself. I would like to know how to downgrade PowerShell on the DC machine though.

Ah. Good point. Everything up to the call to Invoke-AzVMRunCommand is happening locally. Might be worth checking C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\<version>\Downloads and C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\<version>\Status on the DC to see if you can see the parameters logged anywhere (see this link). If so, and if we also have a DC we deployed successfully before, a comparison of these files might help see if the input parameters are escaped differently.

If we do still have access to a DC we deployed successfully before, it might be worth checking (i) if the versions of powershell and the runcommand extension are the same as on the new DC things are breaking on and (ii) if you get consistent behaviour remotely running a test script that just takes some parameter and prints them when you pass parameters with escaped double quotes.

I tried digging into the Azure Powershell source code but got very lost inside the Azure .Net SDK. I did note that they were rewriting the Azure .Net SDK in .Net Core and were following new SDK development guidelines in doing so, so this may potentially be something that's changed deep under the hood?