[bug] How to manage state after a timed out deployment

patpicos commented 4 years ago

Describe the bug I have been experimenting with deploying the CAF foundations and modifying some of the tfvars. I enabled the security center option and ran apply. The apply timed out. When I do a re-apply, it says the resource is there and needs to be imported.

The rover command does not expose the import command. Also, the resource path is nested so deeply in modules that it is difficult to determine how to import the resource into the state. Please advise.

Configuration (please complete the following information):

OS and version: Windows 10 2004
Version of the rover[e.g. 22]
Version of the landing zone[e.g. 11]

cleanup variables
[vscode@92a0aca62a3e caf]$ rover /tf/caf/landingzones/landingzone_caf_foundations import 

  /$$$$$$   /$$$$$$  /$$$$$$$$       /$$$$$$$                                        
 /$$__  $$ /$$__  $$| $$_____/      | $$__  $$                                       
| $$  \__/| $$  \ $$| $$            | $$  \ $$  /$$$$$$  /$$    /$$/$$$$$$   /$$$$$$ 
| $$      | $$$$$$$$| $$$$$         | $$$$$$$/ /$$__  $$|  $$  /$$/$$__  $$ /$$__  $$
| $$      | $$__  $$| $$__/         | $$__  $$| $$  \ $$ \  $$/$$/ $$$$$$$$| $$  \__/
| $$    $$| $$  | $$| $$            | $$  \ $$| $$  | $$  \  $$$/| $$_____/| $$      
|  $$$$$$/| $$  | $$| $$            | $$  | $$|  $$$$$$/   \  $/ |  $$$$$$$| $$      
 \______/ |__/  |__/|__/            |__/  |__/ \______/     \_/   \_______/|__/      

              version: aztfmod/rover:2007.0108

mode                          : 'rover'
tf_action                     : 'import'
tf_command                    : ''
landingzone                   : '/tf/caf/landingzones/landingzone_caf_foundations'
terraform command output file : '' 
level                         : 'level0'
environment                   : 'sandpit'
tfstate                       : 'landingzone_caf_foundations.tfstate'

Additional context

Terraform init return code 0
calling plan and apply
@calling plan
running terraform plan with 
 -TF_VAR_workspace: sandpit
 -state: /home/vscode/.terraform.cache/tfstates/sandpit/landingzone_caf_foundations.tfstate
 -plan:  /home/vscode/.terraform.cache/tfstates/sandpit/landingzone_caf_foundations.tfplan
/tf/caf/landingzones/landingzone_caf_foundations
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.terraform_remote_state.level0_launchpad: Refreshing state...
module.blueprint_foundations_accounting.azurecaf_naming_convention.rg_operations_name: Refreshing state... [id=aCis5FrLOyMcBlla]
module.blueprint_foundations_accounting.module.activity_logs.azurecaf_naming_convention.caf_name_evh: Refreshing state... [id=eRBziXs7vrFvQiaG]
module.blueprint_foundations_accounting.azurecaf_naming_convention.rg_coresec_name: Refreshing state... [id=jbHHD6wOLbfN7ljU]
module.blueprint_foundations_accounting.module.diagnostics_logging.azurecaf_naming_convention.caf_name_evh: Refreshing state... [id=wsGS3NOiVpQuPwlV]
module.blueprint_foundations_accounting.module.log_analytics.azurecaf_naming_convention.caf_name_la: Refreshing state... [id=y0vLmFJd2wFSha3U]
module.blueprint_foundations_accounting.module.diagnostics_logging.azurecaf_naming_convention.caf_name_st: Refreshing state... [id=oHU8tIppqq0RqVrw]
module.blueprint_foundations_accounting.module.activity_logs.azurecaf_naming_convention.caf_name_st: Refreshing state... [id=rBGaorhrTBWD51LD]
module.blueprint_foundations_governance.data.azurerm_client_config.current: Refreshing state...
module.blueprint_foundations_accounting.azurerm_resource_group.rg_coresec: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourceGroups/wkat-rg-hub-core-sec]
module.blueprint_foundations_security.module.security_center.azurerm_security_center_contact.contact[0]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/providers/Microsoft.Security/securityContacts/default1]
module.blueprint_foundations_security.module.security_center.azurerm_security_center_subscription_pricing.sc[0]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/providers/Microsoft.Security/pricings/default]
module.blueprint_foundations_accounting.data.azurerm_client_config.current: Refreshing state...
module.blueprint_foundations_security.data.azurerm_client_config.current: Refreshing state...
module.blueprint_foundations_accounting.azurerm_resource_group.rg_operations: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourceGroups/wkat-rg-hub-operations]
module.blueprint_foundations_governance.data.azurerm_subscription.current: Refreshing state...
module.blueprint_foundations_governance.module.management_groups.data.azurerm_client_config.current: Refreshing state...
module.blueprint_foundations_accounting.module.activity_logs.data.azurerm_subscription.current: Refreshing state...
module.blueprint_foundations_accounting.module.log_analytics.azurerm_log_analytics_workspace.log_analytics: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourcegroups/wkat-rg-hub-operations/providers/microsoft.operationalinsights/workspaces/wkat-la-caflalogs]
module.blueprint_foundations_accounting.module.diagnostics_logging.azurerm_storage_account.log: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourceGroups/wkat-rg-hub-operations/providers/Microsoft.Storage/storageAccounts/wkatstdiaglogs]
module.blueprint_foundations_accounting.module.activity_logs.azurerm_storage_account.log: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourceGroups/wkat-rg-hub-core-sec/providers/Microsoft.Storage/storageAccounts/wkatstactlogs]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.pol_managed_disks_assignment[0]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/providers/Microsoft.Authorization/policyAssignments/vm_no_managed_disks]
module.blueprint_foundations_accounting.module.log_analytics.azurerm_log_analytics_solution.la_solution["ContainerInsights"]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourcegroups/wkat-rg-hub-operations/providers/Microsoft.OperationsManagement/solutions/ContainerInsights(wkat-la-caflalogs)]
module.blueprint_foundations_accounting.module.log_analytics.azurerm_log_analytics_solution.la_solution["AgentHealthAssessment"]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourcegroups/wkat-rg-hub-operations/providers/Microsoft.OperationsManagement/solutions/AgentHealthAssessment(wkat-la-caflalogs)]
module.blueprint_foundations_accounting.module.log_analytics.azurerm_log_analytics_solution.la_solution["ADReplication"]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourcegroups/wkat-rg-hub-operations/providers/Microsoft.OperationsManagement/solutions/ADReplication(wkat-la-caflalogs)]
module.blueprint_foundations_accounting.module.log_analytics.azurerm_log_analytics_solution.la_solution["ADAssessment"]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourcegroups/wkat-rg-hub-operations/providers/Microsoft.OperationsManagement/solutions/ADAssessment(wkat-la-caflalogs)]
module.blueprint_foundations_accounting.module.log_analytics.azurerm_log_analytics_solution.la_solution["KeyVaultAnalytics"]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourcegroups/wkat-rg-hub-operations/providers/Microsoft.OperationsManagement/solutions/KeyVaultAnalytics(wkat-la-caflalogs)]
module.blueprint_foundations_accounting.module.log_analytics.azurerm_log_analytics_solution.la_solution["DnsAnalytics"]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourcegroups/wkat-rg-hub-operations/providers/Microsoft.OperationsManagement/solutions/DnsAnalytics(wkat-la-caflalogs)]
module.blueprint_foundations_accounting.module.log_analytics.azurerm_log_analytics_solution.la_solution["NetworkMonitoring"]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourcegroups/wkat-rg-hub-operations/providers/Microsoft.OperationsManagement/solutions/NetworkMonitoring(wkat-la-caflalogs)]
module.blueprint_foundations_security.module.sentinel.azurerm_log_analytics_solution.sentinel[0]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourcegroups/wkat-rg-hub-operations/providers/Microsoft.OperationsManagement/solutions/SecurityInsights(wkat-la-caflalogs)]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.vm_auto_monitor[0]: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/providers/Microsoft.Authorization/policyAssignments/vm_auto_monitor]
module.blueprint_foundations_accounting.module.activity_logs.azurerm_monitor_diagnostic_setting.audit: Refreshing state... [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257|actlogs]

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0] will be created
  + resource "azurerm_policy_assignment" "res_location" {
      + description          = "Policy Assignment with Terraform"
      + display_name         = "TF Restrict Deployment of Azure Resources in specific location"
      + enforcement_mode     = true
      + id                   = (known after apply)
      + name                 = "res_location"
      + parameters           = jsonencode(
            {
              + listOfAllowedLocations = {
                  + value = [
                      + "canadacentral",
                      + "canadaeast",
                    ]
                }
            }
        )
      + policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/e56962a6-4747-49cd-b67b-bf8b01975c4c"
      + scope                = "/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257"

      + identity {
          + principal_id = (known after apply)
          + tenant_id    = (known after apply)
          + type         = (known after apply)
        }
    }

  # module.blueprint_foundations_security.module.security_center.azurerm_security_center_workspace.sc[0] will be created
  + resource "azurerm_security_center_workspace" "sc" {
      + id           = (known after apply)
      + scope        = "/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257"
      + workspace_id = "/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/resourcegroups/wkat-rg-hub-operations/providers/microsoft.operationalinsights/workspaces/wkat-la-caflalogs"
    }

Plan: 2 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

This plan was saved to: /home/vscode/.terraform.cache/tfstates/sandpit/landingzone_caf_foundations.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "/home/vscode/.terraform.cache/tfstates/sandpit/landingzone_caf_foundations.tfplan"

Terraform plan return code: 0
@calling apply
running terraform apply
Acquiring state lock. This may take a few moments...
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Creating...
module.blueprint_foundations_security.module.security_center.azurerm_security_center_workspace.sc[0]: Creating...
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Still creating... [10s elapsed]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Still creating... [20s elapsed]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Still creating... [30s elapsed]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Still creating... [40s elapsed]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Still creating... [50s elapsed]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Still creating... [1m0s elapsed]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Still creating... [1m10s elapsed]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Still creating... [1m20s elapsed]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Still creating... [1m30s elapsed]
module.blueprint_foundations_governance.module.builtin_policies.azurerm_policy_assignment.res_location[0]: Creation complete after 1m31s [id=/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/providers/Microsoft.Authorization/policyAssignments/res_location]
Terraform apply return code: 0
Terraform returned errors:

Error: A resource with the ID "/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/providers/Microsoft.Security/workspaceSettings/default" already exists - to be managed via Terraform this resource needs to be imported into the State. Please see the resource documentation for "azurerm_security_center_workspace" for more information.

  on /home/vscode/.terraform.cache/modules/blueprint_foundations_security.security_center/terraform-azurerm-caf-security-center-1.0/module.tf line 15, in resource "azurerm_security_center_workspace" "sc":
  15: resource "azurerm_security_center_workspace" "sc" {

LaurentLesle commented 4 years ago

Importing existing resources is not covered yet. That may explain the issue you are reporting. I would re-qualify this as an improvement to add to the rover.

patpicos commented 4 years ago

Import existing resources are not covered yet. That may explain the issue you are reporting. I would re-qualify that as an improvement to add in the rover

Exposing the state commands would also be useful for listing the resources in the state, in order to generate the proper path for importing a resource.
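
A minimal sketch of what that import could look like with direct terraform access, assuming the resource address printed in the apply output and the ID from the "already exists" error are the two arguments that `terraform import` expects. `build_import_cmd` is a hypothetical helper, not a rover feature; it only prints the command so it can be reviewed before being run from the landing zone folder.

```shell
# Hypothetical helper (not part of rover): prints a terraform import
# command for a resource address and an Azure resource ID without running it.
build_import_cmd() {
  # $1 = resource address as shown in the plan/apply output
  # $2 = Azure resource ID as shown in the "already exists" error
  printf "terraform import '%s' '%s'\n" "$1" "$2"
}

# Address and ID copied from the log in this issue (the subscription ID is
# partly redacted there, so it must be replaced with the real one).
build_import_cmd \
  'module.blueprint_foundations_security.module.security_center.azurerm_security_center_workspace.sc[0]' \
  '/subscriptions/xxxxxxxxx-7c3c-446d-8015-9ae244c26257/providers/Microsoft.Security/workspaceSettings/default'
```

The single quotes around the address matter because of the `[0]` index, which the shell could otherwise try to expand as a glob pattern.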

bernardmaltais commented 4 years ago

It would be nice if rover actually exposed all terraform commands. I would certainly like to be able to import and export the statefile for backup/restore purposes using rover, as I used to with the terraform CLI. This has been an issue in many cases where I wanted to back up statefiles before upgrading from one release of launchpad to the next.

bernardmaltais commented 4 years ago

On many occasions, having access to the taint command would have saved us a full destroy and apply when something went wrong on Azure. It often happens that a DSC deployment on a VM fails, and with rover it is impossible to issue a terraform taint and then re-apply to rebuild the one failed resource out of a hundred.

bmaltais commented 4 years ago

@patpicos I ran into a strange feature of rover today. This might or might not help you. It did help me with my taint issue for resources. If you run the rover command as you usually would, for example:

rover -lz /tf/caf/landingzones/someblueprint -a apply

then go into the folder containing the code for the landingzone and run:

terraform state list

you will get the values from the statefile cached for that LZ. Interestingly enough, if you taint one of the resources, the change is actually written to the state in Azure Storage. This must be possible because of some local caching in the devcontainer that points terraform at the backend storage. Unexpected but useful. I was able to taint my resource that way, and on the subsequent rover apply it was re-created as I needed.

For example:

[vscode@78c5751e69cd code]$ terraform taint time_offset.tomorrow
Acquiring state lock. This may take a few moments...
Resource instance time_offset.tomorrow has been marked as tainted.
Releasing state lock. This may take a few moments...

See how it is acquiring a lock on the statefile in the cloud? I then confirmed that the taint was actually written to the correct Azure state file by looking at the statefile in the storage account, and found it there.

This appears to be thanks to the envvar: TF_DATA_DIR

So technically I could change TF_DATA_DIR to a different folder for each landing zone before executing the rover command, keeping an active local cache (and a locally cached statefile) per landing zone rather than a shared one.
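
A rough sketch of that idea, assuming rover honors a TF_DATA_DIR that is already set in the environment (not confirmed here). `lz_data_dir` and the default base directory are hypothetical names chosen for illustration, not rover's actual cache layout.

```shell
# Hypothetical helper: derive a per-landing-zone cache directory from the
# landing zone path. The default base directory is an assumption.
lz_data_dir() {
  # $1 = landing zone path
  # $2 = optional base directory for the per-LZ caches
  printf '%s/%s\n' "${2:-$HOME/.terraform.cache/lz}" "$(basename "$1")"
}

# Usage sketch (commented out, since it needs rover and a landing zone):
#   export TF_DATA_DIR="$(lz_data_dir /tf/caf/landingzones/landingzone_caf_foundations)"
#   rover -lz /tf/caf/landingzones/landingzone_caf_foundations -a apply
```

Keying the directory on the landing zone folder name keeps each cache separate, as long as no two landing zones share the same folder name.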

bernardmaltais commented 4 years ago

I have actually implemented the local cache feature using a custom script that sets TF_DATA_DIR at runtime. The nice side effect is that I can now go into any LZ and easily issue commands like terraform state pull > backupstate.file.

I have also implemented an automatic backup of the remote state file to the LZ cache before doing an apply. I have lost my statefile many times when using rover and running into timeouts, lost connectivity, etc., and not having a backup of the statefile has been a big issue. That is no longer a problem.

Here is the short bash code to do this:

# Take a backup of the statefile before applying, if the cache already exists

if [[ "${command}" == "apply" ]]; then
  if [[ -d "${TF_DATA_DIR}" ]]; then
    date=$(date +%Y%m%d%H%M%S)
    echo "Taking backup of state file"
    # Run in a subshell so the working directory is restored automatically,
    # and skip the pull if the code folder is missing
    (cd code && terraform state pull > "${TF_DATA_DIR}/terraform.state.${date}")
  else
    echo "cache does not yet exist, can't take backup."
  fi
fi
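
For the restore side, a minimal sketch that picks the newest of those timestamped backups. `latest_state_backup` is a hypothetical helper, and the restore command is only shown as a comment. `terraform state push` performs its own lineage and serial checks and may refuse an older state, so the file should be reviewed before anything is forced.

```shell
# Hypothetical helper: print the newest terraform.state.<timestamp> backup
# in a directory. The %Y%m%d%H%M%S suffix sorts lexicographically in
# chronological order, so the last entry after sorting is the most recent.
latest_state_backup() {
  # $1 = directory holding the terraform.state.<timestamp> files
  for f in "$1"/terraform.state.*; do
    # Skip the unexpanded glob pattern when no backup exists
    [ -e "$f" ] && printf '%s\n' "$f"
  done | sort | tail -n 1
}

# Restore sketch (run from the landing zone's code folder):
#   terraform state push "$(latest_state_backup "$TF_DATA_DIR")"
```

If the directory holds no backups, the function prints nothing, so the caller can check for an empty result before attempting a push.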

patpicos commented 4 years ago

Might be wise to update the launchpad to enable the blob versioning (in preview)

bmaltais commented 4 years ago

@patpicos Do you have a link for that preview? I found this: https://medium.com/@ripon.banik/terrform-state-and-versioning-in-azure-72cb92aa4f19

but I think you are referring to something else perhaps?

EDIT:

Found it: https://docs.microsoft.com/en-us/azure/storage/blobs/versioning-overview?tabs=powershell

bernardmaltais commented 4 years ago

I tried using the version feature of the storage account and it does indeed track statefile changes... but man it is chatty. This is the result of simply running an apply on a deployed plan:

It literally created 4 interim versions. It is nice but I am worried we will drown in versions given how terraform/rover appear to touch the statefile.

patpicos commented 4 years ago

Yikes. Storage is cheap so I would not be super worried. It would be nice if a lifecycle policy could be used to maintain X versions of a blob

gawainXX commented 2 years ago

I'm running into this problem using terraform via the Azure CLI. Is there a way to resume a deployment after the console has booted you due to inactivity?

Azure / caf-terraform-landingzones

[bug] How to manage state after a timed out deployment #60