Barts-Life-Science / AzureTRE

An accelerator to help organizations build Trusted Research Environments on Azure.
https://microsoft.github.io/AzureTRE
MIT License

Shared service 'firewall' regularly fails to deploy from CI/CD pipeline #28

Open TonyWildish-BH opened 7 months ago

TonyWildish-BH commented 7 months ago

Describe the bug

I'm deploying the TRE through the "Deploy Azure TRE" CI/CD action, triggered manually. I've created a new tre_id, and this is the first deployment that uses it. The deployment fails at the shared firewall step with the following error:

```
Operation state: deploying (action=install) - refreshing...
Operation state: deploying (action=install) - refreshing...
Operation state: deploying (action=install) - refreshing...
Failed to deploy shared service:
id                                    status             action   resourcePath                                          message
ac4f7428-b5ca-4a8b-9ec1-49cbc2ca5af7  deployment_failed  install  /shared-services/9744228a-6f33-4d35-8d0f-879ec8d968da  9744228a-6f33-4d35-8d0f-879ec8d968da: Error message: Unable to find image '.azurecr.io/tre-shared-service-firewall@sha256:68246af02e0059431aaa429ff316fddd48598872152298c2c14dd84a03168cbc' locally
╷
│ Error: waiting Firewall Policy Rule Collection Group "rcg-core" (Resource Group "rg-" / Policy: "fw-policy-"): Code="FirewallPolicyUpdateFailed" Message="Put on Firewall Policy fw-policy- Failed with 1 faulted referenced firewalls"
│
│   with azurerm_firewall_policy_rule_collection_group.core,
│   on rules.tf line 1, in resource "azurerm_firewall_policy_rule_collection_group" "core":
│    1: resource "azurerm_firewall_policy_rule_collection_group" "core"
│
╵
error running command /cnab/app/terraform /usr/bin/terraform apply -auto-approve -input=false -var api_driven_network_rule_collections_b64=W10= -var api_driven_rule_collections_b64=W10= -var microsoft_graph_fqdn=graph.microsoft.com -var sku_tier=Standard -var tre_id= -var tre_resource_id=9744228a-6f33-4d35-8d0f-879ec8d968da: exit status 1
Error: error running command /cnab/app/terraform /usr/bin/terraform apply -auto-approve -input=false -var api_driven_network_rule_collections_b64=W10= -var api_driven_rule_collections_b64=W10= -var microsoft_graph_fqdn=graph.microsoft.com -var sku_tier=Standard -var tre_id= -var tre_resource_id=9744228a-6f33-4d35-8d0f-879ec8d968da: exit status 1
1 error occurred:
    * mixin execution failed: package command failed /cnab/app/cnab/app/mixins/terraform/runtimes/terraform-runtime install
╷
│ Error: waiting Firewall Policy Rule Collection Group "rcg-core" (Resource Group "rg-" / Policy: "fw-policy-"): Code="FirewallPolicyUpdateFailed" Message="Put on Firewall Policy fw-policy- Failed with 1 faulted referenced firewalls"
│
│   with azurerm_firewall_policy_rule_collection_group.core,
│   on rules.tf line 1, in resource "azurerm_firewall_policy_rule_collection_group" "core":
│    1: resource "azurerm_firewall_policy_rule_collection_group" "core"
│
╵
error running command /cnab/app/terraform /usr/bin/terraform apply -auto-approve -input=false -var api_driven_network_rule_collections_b64=W10= -var api_driven_rule_collections_b64=W10= -var microsoft_graph_fqdn=graph.microsoft.com -var sku_tier=Standard -var tre_id= -var tre_resource_id=9744228a-6f33-4d35-8d0f-879ec8d968da: exit status 1
Error: error running command /cnab/app/terraform /usr/bin/terraform apply -auto-approve -input=false -var api_driven_network_rule_collections_b64=W10= -var api_driven_rule_collections_b64=W10= -var microsoft_graph_fqdn=graph.microsoft.com -var sku_tier=Standard -var tre_id= -var tre_resource_id=9744228a-6f33-4d35-8d0f-879ec8d968da: exit status 1
1 error occurred:
    * mixin execution failed: package command failed /cnab/app/cnab/app/mixins/terraform/runtimes/terraform-runtime install
╷
│ Error: waiting Firewall Policy Rule Collection Group "rcg-core" (Resource Group "rg-" / Policy: "fw-policy-"): Code="FirewallPolicyUpdateFailed" Message="Put on Firewall Policy fw-policy- Failed with 1 faulted referenced firewalls"
│
│   with azurerm_firewall_policy_rule_collection_group.core,
│   on rules.tf line 1, in resource "azurerm_firewall_policy_rule_collection_group" "core":
│    1: resource "azurerm_firewall_policy_rule_collection_group" "core"
│
╵
error running command /cnab/app/terraform /usr/bin/terraform apply -auto-approve -input=false -var api_driven_network_rule_collections_b64=W10= -var api_driven_rule_collections_b64=W10= -var microsoft_graph_fqdn=graph.microsoft.com -var sku_tier=Standard -var tre_id= -var tre_resource_id=9744228a-6f33-4d35-8d0f-879ec8d968da: exit status 1
Error: error running command /cnab/app/terraform /usr/bin/terraform apply -auto-approve -input=false -var api_driven_network_rule_collections_b64=W10= -var api_driven_rule_collections_b64=W10= -var microsoft_graph_fqdn=graph.microsoft.com -var sku_tier=Standard -var tre_id= -var tre_resource_id=9744228a-6f33-4d35-8d0f-879ec8d968da: exit status 1
1 error occurred:
    * container exit code: 1, message: ; Command executed: porter install "9744228a-6f33-4d35-8d0f-879ec8d968da" --reference .azurecr.io/tre-shared-service-firewall:v1.1.6 --param arm_environment="public" --param arm_use_msi="true" --param id="9744228a-6f33-4d35-8d0f-879ec8d968da" --param microsoft_graph_fqdn="graph.microsoft.com" --param tfstate_container_name="tfstate" --param tfstate_resource_group_name="" --param tfstate_storage_account_name="" --param tre_id="" --force --credential-set arm_auth --credential-set aad_auth
make: *** [Makefile:298: deploy-shared-service] Error 1
Error: Process completed with exit code 2.
```

This has now happened twice, each time with a fresh install of a new TRE. At least one other deployment did get past this phase, so the failure is intermittent rather than guaranteed.

Note also that the initial error about not being able to find the image is a red herring; the real error is the one beginning "Error: waiting Firewall Policy Rule Collection Group".
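Because the "Unable to find image" line is misleading and the real failure is repeated several times in the wrapped Porter/Terraform output, it can help to pull out only the distinct Azure error codes when comparing logs from repeated runs. A minimal sketch, assuming the pipeline log has been saved as plain text; the `azure_errors` helper is hypothetical and not part of AzureTRE.

```python
import re

# Matches Azure error details of the form Code="..." Message="...",
# as printed in the Terraform error box above.
AZURE_ERROR = re.compile(r'Code="(?P<code>[^"]+)"\s+Message="(?P<message>[^"]*)"')


def azure_errors(log_text: str) -> list[tuple[str, str]]:
    """Return unique (code, message) pairs in order of first appearance.

    Porter and Terraform wrap and repeat the same failure several times,
    so duplicates are dropped. Lines such as "Unable to find image ... locally"
    carry no Code=/Message= pair and are therefore ignored.
    """
    seen: dict[tuple[str, str], None] = {}
    for match in AZURE_ERROR.finditer(log_text):
        seen.setdefault((match["code"], match["message"]), None)
    return list(seen)
```

Run against the log above, this should reduce the three repeated blocks to the single FirewallPolicyUpdateFailed entry, which makes it easier to tell whether a later failure is the same fault or a different one.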

Steps to reproduce

  1. Create a new config.yaml with a new tre_id, management resource group, storage account, etc.
  2. Configure the secrets in the CI/CD environment (using ./gh-secrets.sh from the setup-github-environment branch).
  3. Trigger the Deploy Azure TRE action manually from the GitHub interface.

This might be related to the service principal's permissions, but I'm not sure. That does ring a bell, though.

Azure TRE release version (e.g. v0.14.0 or main): Head, as of 28/03/24

Deployed Azure TRE components - click the (i) in the UI: n/a

TonyWildish-BH commented 7 months ago

See deployment number 32 for the full details.