alan-turing-institute / data-safe-haven

https://data-safe-haven.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Removing a deployed Data Safe Haven #2263

Open helendduncan opened 3 weeks ago

helendduncan commented 3 weeks ago


:no_entry_sign: Describe the problem

The process fails when trying to tear down an SRE.

    pulumi:pulumi:Stack data-safe-haven-shm-prod5-sre-sbox123 running error: update failed                                          
    pulumi:pulumi:Stack data-safe-haven-shm-prod5-sre-sbox123 **failed** 1 error; 4 messages                           
Diagnostics:                                                                                                                                   
  pulumi:pulumi:Stack (data-safe-haven-shm-prod5-sre-sbox123):                                                                                    
    WARNING: All log messages before absl::InitializeLog() is called are written to STDERR                                                                 
    I0000 00:00:1729862729.504606 15028335 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers                    
    I0000 00:00:1729862729.538019 15028335 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers                    
    I0000 00:00:1729862746.760289 15028335 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers                    

    error: update failed                                                                                                                     

  azuread:index:Group (sre_entra_group_admin_group_name):                                                                                         
    error: 1 error occurred:                                                                                                                    
        * building client: unable to obtain access token: clientCredentialsToken: received HTTP status 400 with response:                                     
{"error":"unauthorized_client","error_description":"AADSTS700016: Application with identifier '57028b5a-8e29-4e17-b9c8-1805d6533019' was not found in the     
directory 'Production Safe Haven'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in
the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: 9187a085-8eee-4eed-9070-dae41e197200 Correlation ID:                 
e98c6035-ede2-4e27-9c13-b9caf08ce257 Timestamp: 2024-10-25 13:25:26Z","error_codes":[700016],"timestamp":"2024-10-25                                          
13:25:26Z","trace_id":"9187a085-8eee-4eed-9070-dae41e197200","correlation_id":"e98c6035-ede2-4e27-9c13-b9caf08ce257","error_uri":"https://login.microsoftonlin
e.com/error?code=700016"}                                                                                                                                     

  azuread:index:Group (sre_entra_group_privileged_user_group_name):                                                                               
    error: 1 error occurred:                                                                                                                    
        * building client: unable to obtain access token: clientCredentialsToken: received HTTP status 400 with response:                                     
{"error":"unauthorized_client","error_description":"AADSTS700016: Application with identifier '57028b5a-8e29-4e17-b9c8-1805d6533019' was not found in the     
directory 'Production Safe Haven'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in
the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: 9187a085-8eee-4eed-9070-dae41e197200 Correlation ID:                 
e98c6035-ede2-4e27-9c13-b9caf08ce257 Timestamp: 2024-10-25 13:25:26Z","error_codes":[700016],"timestamp":"2024-10-25                                          
13:25:26Z","trace_id":"9187a085-8eee-4eed-9070-dae41e197200","correlation_id":"e98c6035-ede2-4e27-9c13-b9caf08ce257","error_uri":"https://login.microsoftonlin
e.com/error?code=700016"}                                                                                                                                     

  azuread:index:Group (sre_entra_group_user_group_name):                                                                                          
    error: 1 error occurred:                                                                                                                    
        * building client: unable to obtain access token: clientCredentialsToken: received HTTP status 400 with response:                                     
{"error":"unauthorized_client","error_description":"AADSTS700016: Application with identifier '57028b5a-8e29-4e17-b9c8-1805d6533019' was not found in the     
directory 'Production Safe Haven'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in
the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: 9187a085-8eee-4eed-9070-dae41e197200 Correlation ID:                 
e98c6035-ede2-4e27-9c13-b9caf08ce257 Timestamp: 2024-10-25 13:25:26Z","error_codes":[700016],"timestamp":"2024-10-25                                          
13:25:26Z","trace_id":"9187a085-8eee-4eed-9070-dae41e197200","correlation_id":"e98c6035-ede2-4e27-9c13-b9caf08ce257","error_uri":"https://login.microsoftonlin
e.com/error?code=700016"}                                                                                                                                     

Outputs:                                                                                                                                       
    data          : {                                                                                                                                
        key_vault_name                     : "shmprod5sresbox12secrets"                                                                     
        password_user_database_admin_secret: "password-user-database-admin"                                                                 
    }                                                                                                                                             
    ldap          : {                                                                                                                             
        admin_group_name          : "Data Safe Haven SRE sbox123 Administrators"                                                            
        privileged_user_group_name: "Data Safe Haven SRE sbox123 Privileged Users"                                                          
        user_group_name           : "Data Safe Haven SRE sbox123 Users"                                                                     
    }                                                                                                                                             
    remote_desktop: {                                                                                                                             
        connection_db_name       : "guacamole"                                                                                              
        connection_db_server_name: "shm-prod5-sre-sbox123-db-server-guacamole"                                                              
        container_group_name     : "shm-prod5-sre-sbox123-container-group-remote-desktop"                                                   
        disable_copy             : false                                                                                                    
        disable_paste            : false                                                                                                    
        resource_group_name      : "shm-prod5-sre-sbox123-rg"                                                                               
    }                                                                                                                                             
    workspaces    : {                                                                                                                             
        vm_outputs: [                                                                                                                             
            [0]: {                                                                                                                                
                ip_address: "10.0.2.4"                                                                                                      
                name      : "shm-prod5-sre-sbox123-vm-workspace-01"                                                                         
                sku       : "Standard_D2s_v3"                                                                                               
            }                                                                                                                                     
        ]                                                                                                                                         
    }                                                                                                                                             

Resources:                                                                                                                                     
    243 unchanged                                                                                                                                             

Duration: 52s                                                                                                                                  

Pulumi error:  ~  azuread:index:Group sre_entra_group_user_group_name refreshing (0s) error: 1 error occurred:      
Pulumi error:  ~  azuread:index:Group sre_entra_group_user_group_name **refreshing failed** error: 1 error occurred:      
Pulumi error:  ~  azuread:index:Group sre_entra_group_privileged_user_group_name refreshing (0s) error: 1 error     
occurred:                                                                                                                                                     
Pulumi error:  ~  azuread:index:Group sre_entra_group_privileged_user_group_name **refreshing failed** error: 1 error     
occurred:                                                                                                                                                     
Pulumi error:  ~  azuread:index:Group sre_entra_group_admin_group_name refreshing (0s) error: 1 error occurred:     
Pulumi error:  ~  azuread:index:Group sre_entra_group_admin_group_name **refreshing failed** error: 1 error occurred:     
Pulumi error:     pulumi:pulumi:Stack data-safe-haven-shm-prod5-sre-sbox123 running error: update failed                            
Pulumi error:     error: update failed                                                                                                       
Pulumi error:     error: 1 error occurred:                                                                                                      
Pulumi error:     error: 1 error occurred:                                                                                                      
Pulumi error:     error: 1 error occurred:                                                                                                      
Pulumi error:  stderr:                                                                                                                                        
Pulumi refresh failed.                                                                                                                                        
Tearing down Pulumi infrastructure failed..                                                                                                                   
Could not teardown Secure Research Environment 'sbox123'.    
Full error message

```
$ dsh sre teardown sbox123
You are logged into the Azure CLI as:
    user: Helen Little (5...d94)
    tenant: turing.ac.uk (4...84f9)
Are these details correct? [y/n] (y): y
You are logged into the Microsoft Graph API as:
    user: entra.admin.helen.little@turingprodtre.onmicrosoft.com (c...6bab)
    tenant: turingprodtre.onmicrosoft.com (c...783)
Are these details correct? [y/n] (y): y
warning: A new version of Pulumi is available. To upgrade from version '3.136.1' to '3.137.0', run
   $ brew update && brew upgrade pulumi
or visit https://pulumi.com/docs/install/ for manual instructions and release notes.
Loaded stack shm-prod5-sre-sbox123.
Refreshing stack shm-prod5-sre-sbox123.
Refreshing (shm-prod5-sre-sbox123):
@ refreshing.....
 ~  pulumi:pulumi:Stack data-safe-haven-shm-prod5-sre-sbox123 refreshing (0s)
    pulumi:pulumi:Stack data-safe-haven-shm-prod5-sre-sbox123 running
 ~  dsh:sre:EntraComponent sre_entra refreshing (0s)
    dsh:sre:EntraComponent sre_entra
 ~  dsh:sre:DnsServerComponent sre_dns_server refreshing (0s)
    dsh:sre:DnsServerComponent sre_dns_server
 ~  dsh:sre:NetworkingComponent sre_networking refreshing (0s)
    azure-native:network:RecordSet sre_gitea_server_gitea_dns_record_set_public_record_set
 ~  azure-native:storage:Blob sre_desired_state_blob_desired_state_blob_pulumi_vars refreshing (0s)
    azure-native:storage:Blob sre_desired_state_blob_desired_state_blob_pulumi_vars
    pulumi:pulumi:Stack data-safe-haven-shm-prod5-sre-sbox123 running error: update failed
    pulumi:pulumi:Stack data-safe-haven-shm-prod5-sre-sbox123 **failed** 1 error; 5 messages
Diagnostics:
  azuread:index:Group (sre_entra_group_privileged_user_group_name):
    error: 1 error occurred:
        * building client: unable to obtain access token: clientCredentialsToken: received HTTP status 400 with response: {"error":"unauthorized_client","error_description":"AADSTS700016: Application with identifier '57028b5a-8e29-4e17-b9c8-1805d6533019' was not found in the directory 'Production Safe Haven'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: f7613b11-e6a6-4577-bffa-630f43137900 Correlation ID: fb2a45d8-303d-42b8-a21b-47c08cc016c6 Timestamp: 2024-10-25 13:01:20Z","error_codes":[700016],"timestamp":"2024-10-25 13:01:20Z","trace_id":"f7613b11-e6a6-4577-bffa-630f43137900","correlation_id":"fb2a45d8-303d-42b8-a21b-47c08cc016c6","error_uri": "https://login.microsoftonline.com/error?code=700016"}

  azuread:index:Group (sre_entra_group_user_group_name):
    error: 1 error occurred:
        * building client: unable to obtain access token: clientCredentialsToken: received HTTP status 400 with response: {"error":"unauthorized_client","error_description":"AADSTS700016: Application with identifier '57028b5a-8e29-4e17-b9c8-1805d6533019' was not found in the directory 'Production Safe Haven'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: f7613b11-e6a6-4577-bffa-630f43137900 Correlation ID: fb2a45d8-303d-42b8-a21b-47c08cc016c6 Timestamp: 2024-10-25 13:01:20Z","error_codes":[700016],"timestamp":"2024-10-25 13:01:20Z","trace_id":"f7613b11-e6a6-4577-bffa-630f43137900","correlation_id":"fb2a45d8-303d-42b8-a21b-47c08cc016c6","error_uri": "https://login.microsoftonline.com/error?code=700016"}

  azuread:index:Group (sre_entra_group_admin_group_name):
    error: 1 error occurred:
        * building client: unable to obtain access token: clientCredentialsToken: received HTTP status 400 with response: {"error":"unauthorized_client","error_description":"AADSTS700016: Application with identifier '57028b5a-8e29-4e17-b9c8-1805d6533019' was not found in the directory 'Production Safe Haven'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: f7613b11-e6a6-4577-bffa-630f43137900 Correlation ID: fb2a45d8-303d-42b8-a21b-47c08cc016c6 Timestamp: 2024-10-25 13:01:20Z","error_codes":[700016],"timestamp":"2024-10-25 13:01:20Z","trace_id":"f7613b11-e6a6-4577-bffa-630f43137900","correlation_id":"fb2a45d8-303d-42b8-a21b-47c08cc016c6","error_uri": "https://login.microsoftonline.com/error?code=700016"}

  pulumi:pulumi:Stack (data-safe-haven-shm-prod5-sre-sbox123):
    WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
    I0000 00:00:1729861283.221903 14984277 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers
    I0000 00:00:1729861283.255415 14984277 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers
    I0000 00:00:1729861298.587955 14984277 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers
    I0000 00:00:1729861300.713426 14984277 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers

    error: update failed

Outputs:
    data          : {
        key_vault_name                     : "shmprod5sresbox12secrets"
        password_user_database_admin_secret: "password-user-database-admin"
    }
    ldap          : {
        admin_group_name          : "Data Safe Haven SRE sbox123 Administrators"
        privileged_user_group_name: "Data Safe Haven SRE sbox123 Privileged Users"
        user_group_name           : "Data Safe Haven SRE sbox123 Users"
    }
    remote_desktop: {
        connection_db_name       : "guacamole"
        connection_db_server_name: "shm-prod5-sre-sbox123-db-server-guacamole"
        container_group_name     : "shm-prod5-sre-sbox123-container-group-remote-desktop"
        disable_copy             : false
        disable_paste            : false
        resource_group_name      : "shm-prod5-sre-sbox123-rg"
    }
    workspaces    : {
        vm_outputs: [
            [0]: {
                ip_address: "10.0.2.4"
                name      : "shm-prod5-sre-sbox123-vm-workspace-01"
                sku       : "Standard_D2s_v3"
            }
        ]
    }

Resources:
    243 unchanged

Duration: 48s

Pulumi error:  ~  azuread:index:Group sre_entra_group_user_group_name refreshing (0s) error: 1 error occurred:
Pulumi error:  ~  azuread:index:Group sre_entra_group_user_group_name **refreshing failed** error: 1 error occurred:
Pulumi error:  ~  azuread:index:Group sre_entra_group_privileged_user_group_name refreshing (0s) error: 1 error occurred:
Pulumi error:  ~  azuread:index:Group sre_entra_group_privileged_user_group_name **refreshing failed** error: 1 error occurred:
Pulumi error:  ~  azuread:index:Group sre_entra_group_admin_group_name refreshing (0s) error: 1 error occurred:
Pulumi error:  ~  azuread:index:Group sre_entra_group_admin_group_name **refreshing failed** error: 1 error occurred:
Pulumi error:     pulumi:pulumi:Stack data-safe-haven-shm-prod5-sre-sbox123 running error: update failed
Pulumi error:     error: 1 error occurred:
Pulumi error:     error: 1 error occurred:
Pulumi error:     error: 1 error occurred:
Pulumi error:     error: update failed
Pulumi error:  stderr:
Pulumi refresh failed.
Tearing down Pulumi infrastructure failed..
Could not teardown Secure Research Environment 'sbox123'.
```

:steam_locomotive: Workarounds or solutions

jemrobinson commented 3 weeks ago

Looking at:

building client: unable to obtain access token: clientCredentialsToken: received HTTP status 400 with response: {"error":"unauthorized_client","error_description":"AADSTS700016: Application with identifier '57028b5a-8e29-4e17-b9c8-1805d6533019' was not found in the directory 'Production Safe Haven'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant.

helendduncan commented 3 weeks ago

  • Does the Pulumi Service Principal application exist in your Entra tenant?

I think not. From the CLI, when I log in with my entra.admin account,

az ad sp list --query "[].{AppId:appId, DisplayName:displayName}" --output table

doesn't include any apps with "pulumi" in the name.

  • Does it have application ID (client ID) 57028b5a-8e29-4e17-b9c8-1805d6533019?

  • Are its API permissions granted at admin level? (look at "API permissions" and look for "Status" -> "Granted for ...")

  • Does it have a secret?

  • Is the secret correct? (a bit harder to diagnose this one)
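
The first two checks in the list above can be scripted rather than read off a table. The following is only a sketch: it assumes the `az ad sp list` command quoted earlier was re-run with `--output json` (so each entry has the `AppId` and `DisplayName` keys from that `--query`), and `find_service_principals` and the sample entries are hypothetical names and data for illustration, not part of `dsh`.

```python
import json


def find_service_principals(entries, app_id=None, name_fragment=None):
    """Return entries matching an exact AppId or a case-insensitive name fragment.

    `entries` is the parsed JSON list produced by the `az ad sp list` query
    used in this thread, re-run with `--output json` (an assumption).
    """
    matches = []
    for entry in entries:
        name = entry.get("DisplayName") or ""
        if app_id is not None and entry.get("AppId") == app_id:
            matches.append(entry)
        elif name_fragment is not None and name_fragment.lower() in name.lower():
            matches.append(entry)
    return matches


# Illustrative data only: one entry reuses the name and ID quoted later in
# this thread, the other is made up.
sample_output = json.loads(
    """
    [
      {"AppId": "4ad821de-dcfc-477d-8bc0-bf00b5cf947f",
       "DisplayName": "Data Safe Haven (prod5) Pulumi Service Principal"},
      {"AppId": "00000000-0000-0000-0000-000000000000",
       "DisplayName": "Some Other Application"}
    ]
    """
)

# The ID from the AADSTS700016 error message.
missing_id = "57028b5a-8e29-4e17-b9c8-1805d6533019"
by_id = find_service_principals(sample_output, app_id=missing_id)
by_name = find_service_principals(sample_output, name_fragment="pulumi")
```

Two caveats when interpreting an empty result: `az ad sp list` returns only the first 100 entries unless `--all` or a filter is given, and it lists service principals, whereas app registrations are listed by `az ad app list`.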

jemrobinson commented 3 weeks ago

Can you double-check in the Entra portal (entra.microsoft.com) with the same admin account? It would be under Applications > App Registrations.

helendduncan commented 3 weeks ago

Data Safe Haven (prod5) Pulumi Service Principal: 4ad821de-dcfc-477d-8bc0-bf00b5cf947f, 22/10/2024, Current

I think I might have deployed against the one which was deleted?

jemrobinson commented 3 weeks ago

Right - that might be true. Can you:

  1. Teardown this SRE
  2. Re-run dsh shm deploy
  3. Redeploy the SRE

helendduncan commented 3 weeks ago

Right - that might be true. Can you:

  1. Teardown this SRE

No - that's what I'm stuck on (I think)

  2. Re-run dsh shm deploy
  3. Redeploy the SRE

I can try steps 2 and 3 though?

helendduncan commented 3 weeks ago

Solution

  1. Go to the Subscription > Resources > shmprod5 > containers > pulumi.
  2. Find .pulumi/stacks/data-safe-haven/shm-prod5-sre-srename.json (where srename is the name of the SRE) and remove all blocks that reference or use "azuread".
  3. Go to the Entra admin centre and delete the corresponding groups: "Data Safe Haven SRE srename" Administrators, Privileged Users, and Users.
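
The manual state edit described above could be scripted once the stack file has been downloaded. This is a minimal sketch, not a supported `dsh` feature: it assumes the file follows the usual Pulumi checkpoint layout, with a resource list under `checkpoint.latest.resources` and `urn`, `type`, `provider` and `dependencies` fields on each entry. The function names `is_azuread` and `strip_azuread` are hypothetical.

```python
import copy


def is_azuread(resource: dict) -> bool:
    """True for azuread resources, the azuread provider, or anything using it."""
    rtype = resource.get("type", "")
    provider = resource.get("provider", "")
    return (
        rtype.startswith("azuread:")
        or rtype == "pulumi:providers:azuread"
        or "pulumi:providers:azuread::" in provider
    )


def strip_azuread(state: dict) -> tuple[dict, list[str]]:
    """Return a copy of the state without azuread resources, plus removed URNs.

    Assumes the checkpoint layout described in the lead-in; the input
    dictionary is left unmodified.
    """
    new_state = copy.deepcopy(state)
    deployment = new_state["checkpoint"]["latest"]
    resources = deployment.get("resources", [])
    removed = [r["urn"] for r in resources if is_azuread(r)]
    removed_set = set(removed)
    kept = [r for r in resources if not is_azuread(r)]
    # Drop dangling references to the removed resources.
    for resource in kept:
        if "dependencies" in resource:
            resource["dependencies"] = [
                urn for urn in resource["dependencies"] if urn not in removed_set
            ]
    deployment["resources"] = kept
    return new_state, removed
```

To use it, load the downloaded file with `json.load`, call `strip_azuread`, review the list of removed URNs, and write the result back with `json.dump`. Keep a backup copy of the original blob before uploading the edited one, since a malformed state file can leave the stack unusable.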

jemrobinson commented 3 weeks ago

Note this is because the Pulumi Service Principal was deleted and re-created between when this SRE was created and now. @JimMadge: is this something we should do better at warning about? Or, ideally, write something that can update the provider details?

JimMadge commented 3 weeks ago

If I've understood correctly, my feeling is this is an edge case where ad-hoc changes to the TRE have created a state we can't account for.

I'm not keen on trying to write code to account for things like this. It will add maintenance burden and be difficult to test/verify relative to how often it will be useful.

I would rather add to the docs or add a guard to stop this kind of thing happening (if the CLI caused this).