Azure / sap-automation

This is the repository supporting the SAP deployment automation framework on Azure
MIT License

[BUG] - Workload Zone Deployment fails at initialization stage using shell scripts #498

Closed jckeme-rs closed 7 months ago

jckeme-rs commented 11 months ago

Describe the bug

During the workload zone deployment using shell scripts, Terraform provider initialization fails with the following set of errors:

╷
│ Error: parsing "": cannot parse an empty string
│
│   with data.azurerm_key_vault_secret.subscription_id,
│   on imports.tf line 24, in data "azurerm_key_vault_secret" "subscription_id":
│   24:   key_vault_id = local.spn_key_vault_arm_id
│
╵
╷
│ Error: parsing "": cannot parse an empty string
│
│   with data.azurerm_key_vault_secret.client_id[0],
│   on imports.tf line 30, in data "azurerm_key_vault_secret" "client_id":
│   30:   key_vault_id = local.spn_key_vault_arm_id
│
╵
╷
│ Error: parsing "": cannot parse an empty string
│
│   with data.azurerm_key_vault_secret.client_secret[0],
│   on imports.tf line 36, in data "azurerm_key_vault_secret" "client_secret":
│   36:   key_vault_id = local.spn_key_vault_arm_id
│
╵
╷
│ Error: parsing "": cannot parse an empty string
│
│   with data.azurerm_key_vault_secret.tenant_id[0],
│   on imports.tf line 42, in data "azurerm_key_vault_secret" "tenant_id":
│   42:   key_vault_id = local.spn_key_vault_arm_id
│
╵
╷
│ Error: Invalid index
│
│   on variables_local.tf line 45, in locals:
│   45:     client_id       = var.use_spn ? data.azurerm_key_vault_secret.cp_client_id[0].value : null,
│     ├────────────────
│     │ data.azurerm_key_vault_secret.cp_client_id is empty tuple
│
│ The given key does not identify an element in this collection value: the collection has no elements.
╵
╷
│ Error: Invalid index
│
│   on variables_local.tf line 46, in locals:
│   46:     client_secret   = var.use_spn ? data.azurerm_key_vault_secret.cp_client_secret[0].value : null,
│     ├────────────────
│     │ data.azurerm_key_vault_secret.cp_client_secret is empty tuple
│
│ The given key does not identify an element in this collection value: the collection has no elements.
╵
╷
│ Error: Invalid index
│
│   on variables_local.tf line 47, in locals:
│   47:     tenant_id       = var.use_spn ? data.azurerm_key_vault_secret.cp_tenant_id[0].value : null
│     ├────────────────
│     │ data.azurerm_key_vault_secret.cp_tenant_id is empty tuple
│
│ The given key does not identify an element in this collection value: the collection has no elements.

To reproduce

Steps to reproduce the behavior:

1 - Bootstrap the control plane using shell scripts, without private endpoints at first.
2 - SSH into the deployer VM, and copy the naming convention module and the deployer and library tfvars files into the workspace/deployer directory on the deployer VM.
3 - Modify the deployer and library tfvars to enable private endpoints. Leave public endpoints enabled, because Terraform handles the dependencies incorrectly: it attempts to disable access via public endpoints first while still interacting with tfstate via those public endpoints (this should be reported as a separate bug).
4 - After the private endpoints are created and the control plane is secure, copy the workload zone tfvars file onto the deployer VM.
5 - Run the following:

$SAP_AUTOMATION_REPO_PATH/deploy/scripts/install_workloadzone.sh --parameterfile "${parameterFile}" --state_subscription <state-subscription-uuid> --storageaccountname <tfstate-storage-name> --deployer_environment MGMT --spn_id "${ARM_CLIENT_ID}" --spn_secret "${ARM_CLIENT_SECRET}" --tenant_id "${ARM_TENANT_ID}" --deployer_tfstate_key tfstate

Here is the set of variables specified in the workload zone tfvars file (most of which are defaults from the sap-samples repository):

environment = "NONPROD"
location = "eastus2"
network_logical_name = "SAP01"
network_address_space = "<redacted>/23"
use_private_endpoint = true
use_service_endpoint = true
peer_with_control_plane_vnet = true
enable_firewall_for_keyvaults_and_storage = true
public_network_access_enabled = true
place_delete_lock_on_resources = false
admin_subnet_address_prefix = "<redacted>/26"
db_subnet_address_prefix = "<redacted>/26"
app_subnet_address_prefix = "<redacted>/26"
web_subnet_address_prefix = "<redacted>/26"
management_dns_resourcegroup_name = "<redacted>"
management_dns_subscription_id = "<redacted>"
use_custom_dns_a_registration = false
enable_purge_control_for_keyvaults = false
automation_username = "<redacted>"
install_volume_size = 1024
transport_volume_size = 128
storage_account_replication_type = "LRS"
NFS_provider = "AFS"
utility_vm_count = 0
utility_vm_useDHCP = true

Expected behavior

A successful deployment of the workload zone resources.

hdamecharla commented 11 months ago

@jckeme-rs Thank you for bringing the bug to our attention.

I am assuming you are following the Tutorial: SAP Deployment Automation Framework. From the command line you shared, a couple of parameters seem to be missing or incomplete. Specifically, the deployer key vault name is not provided, and deployer_tfstate_key should be the complete name of the blob within the tfstate storage account, as opposed to just the container name.
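As a rough illustration of the difference between a container name and a full blob name, here is a minimal pre-flight check that could be run before calling install_workloadzone.sh. It is only a sketch: the function names are hypothetical, and it assumes the deployer state blob name ends in ".terraform.tfstate", which should be verified against the blob actually listed in the tfstate storage account.

```shell
#!/usr/bin/env bash
# Sketch only: sanity-check the value intended for --deployer_tfstate_key.
# Assumption: the deployer state blob name ends in ".terraform.tfstate".
# Both function names are hypothetical helpers, not part of the framework.

is_tfstate_blob_name() {
  local key="$1"
  case "$key" in
    ?*.terraform.tfstate) return 0 ;;  # at least one character plus the suffix
    *) return 1 ;;                     # e.g. a bare container name such as "tfstate"
  esac
}

check_deployer_tfstate_key() {
  local key="$1"
  if is_tfstate_blob_name "$key"; then
    echo "ok: ${key}"
  else
    echo "suspicious: '${key}' does not look like a state blob name"
    return 1
  fi
}
```

Calling check_deployer_tfstate_key "tfstate" would flag the value used in the command above, whereas a full blob name with the assumed suffix would pass. The check only looks at the shape of the name; it does not confirm that the blob exists.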

Please let us know if this helps resolve your issues.

jckeme-rs commented 11 months ago

Hi @hdamecharla Thank you for the feedback. Adding these parameters still yielded the same result:

$SAP_AUTOMATION_REPO_PATH/deploy/scripts/install_workloadzone.sh \
--parameterfile "${parameterFile}" \
--state_subscription <state-subscription-uuid> \
--storageaccountname <tfstate-storage-name> \
--deployer_environment MGMT \
--spn_id "${ARM_CLIENT_ID}" \
--spn_secret "${ARM_CLIENT_SECRET}" \
--tenant_id "${ARM_TENANT_ID}" \
--deployer_tfstate_key tfstate \
--keyvault <deployer-kv-name>

I'm also certain that the Terraform initialization does begin correctly, as I can see a new state blob created for this workload zone. Notice how this state blob has been "Leased".

image

Did you take a look at the errors to track down a probable cause?

KimForss commented 8 months ago

Hi @jckeme-rs ,

In the control plane deployment, do you define an existing Private DNS zone or are you letting the control plane define it?

Defining just the dns_label variable in the SAP Library tfvars will create a local private DNS zone, whereas defining management_dns_subscription_id = <>, management_dns_resourcegroup_name = <>, and use_custom_dns_a_registration = false points you to an existing one.

When using private endpoints, Terraform will need to be able to resolve the DNS name vaultname.privatelink.vaultcore.azure.net to get the IP of the key vault.
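One way to check this from the deployer VM is to resolve the vault's host name and see whether the answer is a private address. The sketch below makes several assumptions: getent is available (as on most Linux images), the private endpoint address falls in an RFC 1918 range (a VNet can use other ranges), and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a host name resolves to a private IPv4 address.
# Assumptions: Linux with getent; private endpoint IPs are in RFC 1918 ranges.

is_private_ipv4() {
  local ip="$1" old_ifs o
  # Reject anything that is not made of digits and single interior dots.
  case "$ip" in
    *[!0-9.]*|*..*|.*|*.) return 1 ;;
  esac
  old_ifs=$IFS; IFS=.; set -- $ip; IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for o in "$@"; do
    [ "$o" -le 255 ] || return 1
  done
  if [ "$1" -eq 10 ]; then return 0; fi                                          # 10.0.0.0/8
  if [ "$1" -eq 172 ] && [ "$2" -ge 16 ] && [ "$2" -le 31 ]; then return 0; fi   # 172.16.0.0/12
  if [ "$1" -eq 192 ] && [ "$2" -eq 168 ]; then return 0; fi                     # 192.168.0.0/16
  return 1
}

check_private_resolution() {
  local host="$1" ip
  # Take the first IPv4 address returned by the system resolver.
  ip="$(getent ahostsv4 "$host" | awk '{print $1; exit}')"
  if [ -z "$ip" ]; then
    echo "unresolved: ${host}"
    return 2
  fi
  if is_private_ipv4 "$ip"; then
    echo "private: ${host} -> ${ip}"
  else
    echo "public: ${host} -> ${ip}"
    return 1
  fi
}
```

For example, check_private_resolution "<vault-name>.vault.azure.net" run on the deployer VM would print "private: ..." if the name resolves through the private DNS zone to an RFC 1918 address, and "public: ..." otherwise.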

jckeme-rs commented 8 months ago

Hi @KimForss ,

Yes, we do specify management_dns_resourcegroup_name and management_dns_subscription_id.

We also set use_custom_dns_a_registration = true, which ensures that DNS records are created for the private endpoint IPs within the correct private DNS zones that exist in the specified resource group.

From within the deployer VMs, DNS resolves correctly to the private IPs assigned to the keyvaults and storage account (excluding the boot diagnostics storage account which I believe is intended).

KimForss commented 8 months ago

Hi, can you try with use_custom_dns_a_registration = false

jckeme-rs commented 8 months ago

Setting use_custom_dns_a_registration = false at the workload zone deployment stage still led to the same errors reported by terraform.

Re-Deployment was initiated as follows:

$SAP_AUTOMATION_REPO_PATH/deploy/scripts/install_workloadzone.sh \
--parameterfile "${parameterFile}" \
--deployer_environment "${deployer_environment}" \
--subscription "${ARM_SUBSCRIPTION_ID}" \
--spn_id "${ARM_CLIENT_ID}" \
--spn_secret "${ARM_CLIENT_SECRET}" \
--tenant_id "${ARM_TENANT_ID}" \
--deployer_tfstate_key tfstate \
--keyvault <redacted> \
--state_subscription <redacted> \
--storageaccountname <redacted> \
--force

jckeme-rs commented 8 months ago

@KimForss I've just looked through some of your other documentation in detail and noticed that the "deployer_tfstate_key" does need to reference the deployer state file as it is in the keyvault, same as @hdamecharla outlined in his comment. We had missed this parameter while referencing the "Deploy the workload zone" guide: https://learn.microsoft.com/en-us/azure/sap/automation/deploy-workload-zone?tabs=linux

This did get the deployment to go past the previous error.

Separately, setting use_custom_dns_a_registration = true still returns one error:

│ Error: Invalid index
│
│   on ../../terraform-units/modules/sap_landscape/key_vault_sap_landscape.tf line 423, in resource "azurerm_private_endpoint" "kv_user":
│  423:                                         private_dns_zone_ids = [data.azurerm_private_dns_zone.keyvault[0].id]
│     ├────────────────
│     │ data.azurerm_private_dns_zone.keyvault is empty tuple
│
│ The given key does not identify an element in this collection value: the collection has no elements.

Is this a known issue or perhaps some additional configuration parameter which might need to be specified?

jckeme-rs commented 7 months ago

Closing this issue as the initial error which was reported was due to incorrect parameters being passed to the deployment scripts.

The error related to setting use_custom_dns_a_registration = true still persists. We will report this via a different issue.