Azure / azure-dev

A developer CLI that reduces the time it takes for you to get started on Azure. The Azure Developer CLI (azd) provides a set of developer-friendly commands that map to key stages in your workflow - code, build, deploy, monitor, repeat.
https://aka.ms/azd
MIT License

[Issue] Easy to get into a pickle with secretOrRandomPassword when key vault isn't in a good state #1473

Open pamelafox opened 1 year ago

pamelafox commented 1 year ago

Output from azd version:

azd version 0.5.0-beta.4-pr.2128509 (commit 8b3f34d1a4706a777bbcbe2960de4a75656d94f3)

Output from az version:

{
  "azure-cli": "2.40.0",
  "azure-cli-core": "2.40.0",
  "azure-cli-telemetry": "1.0.8",
  "extensions": {
    "containerapp": "0.3.11",
    "containerapp-compose": "0.2.2"
  }
}

Describe the bug

After an unsuccessful azd up (with a Key Vault-related failure), a subsequent azd up results in this error:

ERROR: planning deployment: planning infrastructure provisioning: creating parameters file: substituting command output inside parameter file: reading secret 'postgresPassword' from vault 'flasksurveys-xvmc-vault': getting key vault secret: GET https://flasksurveys-xvmc-vault.vault.azure.net/secrets/postgresPassword/
--------------------------------------------------------------------------------
RESPONSE 403: 403 Forbidden
ERROR CODE: Forbidden
--------------------------------------------------------------------------------
{
  "error": {
    "code": "Forbidden",
    "message": "The user, group or application 'appid=04b07795-8ddb-461a-bbee-02f9e1bf7b46;oid=05c421f6-d446-4f2a-8ee0-1060b47b1543;numgroups=649;iss=https://sts.windows.net/72f988bf-86f1-41af-91ab-2d7cd011db47/' does not have secrets get permission on key vault 'flasksurveys-xvmc-vault;location=eastus'. For help resolving this issue, please see https://go.microsoft.com/fwlink/?linkid=2125287",
    "innererror": {
      "code": "AccessDenied"
    }
  }
}
--------------------------------------------------------------------------------

To Reproduce

You need to have a repo which has a key vault that doesn't finish provisioning correctly. I don't have the exact code committed, but for me, that happened when I attempted to create two secrets at once:

module keyVault './core/security/keyvault.bicep' = {
  name: 'keyvault'
  scope: resourceGroup
  params: {
    name: '${take(prefix, 17)}-vault'
    location: location
    tags: tags
    principalId: principalId
  }
}

module keyVaultSecret1 './core/security/keyvault-secret.bicep' = {
  name: 'keyvault-secret1'
  scope: resourceGroup
  params: {
    keyVaultName: keyVault.outputs.name
    name: 'DBPASS'
    secretValue: postgresPassword
  }
}

module keyVaultSecret2 './core/security/keyvault-secret.bicep' = {
  name: 'keyvault-secret2'
  scope: resourceGroup
  params: {
    keyVaultName: keyVault.outputs.name
    name: 'FLASK_SECRET'
    secretValue: flaskSecret
  }
  dependsOn: [
    keyVault
    keyVaultSecret1
  ]
}

Both of those were generated with secretOrRandomPassword:

      "postgresPassword": {
        "value": "$(secretOrRandomPassword ${AZURE_KEY_VAULT_NAME} postgresPassword)"
      },
      "flaskSecret": {
        "value": "$(secretOrRandomPassword ${AZURE_KEY_VAULT_NAME} flaskSecret)"
      }

Expected behavior

Good question! It could output a warning that there was an issue accessing the key vault and regenerate the secrets. Or it could suggest starting over and clearing all the env variables (that's what I do manually right now). Starting over is probably safer?

I don't know how often people will run into this, so I'm mostly logging as an FYI and for discoverability.

Environment

Mac M1 Ventura, Terminal app

pamelafox commented 1 year ago

Related pickle: If I delete my resource group and then try again without changing the .env at all, then I get this error about the key vault not existing at all:

ERROR: planning deployment: 
planning infrastructure provisioning: 
creating parameters file: 
substituting command output inside parameter file:
reading secret 'postgresPassword' from vault 'flasksurveys-xvmc-vault':
getting key vault secret: Get "https://flasksurveys-xvmc-vault.vault.azure.net/secrets/postgresPassword/?api-version=7.3":
dial tcp: lookup flasksurveys-xvmc-vault.vault.azure.net: no such host
pamelafox commented 1 year ago

Pickle #3: The key vault exists, but without the correct permissions for my principal ID. (To work around that, I manually added access controls in the UI. I'm honestly not sure how I managed to get into that particular pickle.)

rajeshkamal5050 commented 1 year ago

@jongio can you help take a look?

pamelafox commented 1 year ago

Hm, I'm getting into a pickle on subsequent deploys even when the key vault is in a good state.

Here's the error on azd up:

ERROR: planning deployment: planning infrastructure provisioning: creating parameters file: substituting command output inside parameter file: reading secret 'postgresAdminPassword' from vault 'djangoquizexample-vault': getting key vault secret: GET https://djangoquizexample-vault.vault.azure.net/secrets/postgresAdminPassword/
--------------------------------------------------------------------------------
RESPONSE 403: 403 Forbidden
ERROR CODE: Forbidden
--------------------------------------------------------------------------------
{
  "error": {
    "code": "Forbidden",
    "message": "The user, group or application 'appid=04b07795-8ddb-461a-bbee-02f9e1bf7b46;oid=05c421f6-d446-4f2a-8ee0-1060b47b1543;numgroups=659;iss=https://sts.windows.net/72f988bf-86f1-41af-91ab-2d7cd011db47/' does not have secrets get permission on key vault 'djangoquizexample-vault;location=eastus'. For help resolving this issue, please see https://go.microsoft.com/fwlink/?linkid=2125287",
    "innererror": {
      "code": "AccessDenied"
    }
  }
}
--------------------------------------------------------------------------------

Here are the access policies shown in the portal:

Screenshot 2023-04-20 at 10 28 40 AM

The relevant Bicep:

module keyVault './core/security/keyvault.bicep' = {
  name: 'keyvault'
  scope: resourceGroup
  params: {
    name: '${take(replace(prefix, '-', ''), 17)}-vault'
    location: location
    tags: tags
    principalId: principalId
  }
}

module webKeyVaultAccess 'core/security/keyvault-access.bicep' = {
  name: 'web-keyvault-access'
  scope: resourceGroup
  params: {
    keyVaultName: keyVault.outputs.name
    principalId: web.outputs.identityPrincipalId
  }
}

So that Bicep should give access to the principal ID that's stored in the azd environment, as well as to the app. The latter is working, since the app works, but I wonder if I'm supposed to add a third principal ID to get secretOrRandomPassword to work. @jongio, can you advise if something looks off here?

pamelafox commented 1 year ago

Workaround for above is to manually give my personal account permission on the key vault:

Screenshot 2023-04-20 at 10 34 14 AM

vhvb1989 commented 1 year ago


@pamelafox, my theory is that a CI build is resetting the access policies so that they contain only the app and the service principal used for the pipeline. I need to double-check this, but I used to see this behavior after running:

azd init -t someTemplate
azd pipeline config

Since I was running azd up only from the CI build and never locally, my user was never added to the Key Vault access policy list. As a result, I couldn't run the apps locally, because my user was denied access to the Key Vault secrets.

From azd version 0.8.0 onward, we can no longer run azd pipeline config before running azd provision, so the test I need to run now is:

azd init -t someTemplate
azd provision        // This should add my user (or the user logged in to azd) to the access policy
azd pipeline config  // After the CI build runs, the service principal should be added (appended) to the list of access policies. The existing list should not be replaced.
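The expected behavior (append) and the behavior the CI run appears to show (replace) can be modeled in a few lines of Go. AccessPolicy, replacePolicies and appendPolicies are hypothetical names, and the object IDs in main are placeholders. For context, the Key Vault ARM API has a Microsoft.KeyVault/vaults/accessPolicies child resource whose name selects add, replace or remove. Whether the template's vault module uses it, or instead declares its policies inline on the vault, is an assumption worth checking; this thread does not confirm either.

```go
package main

import (
	"fmt"
	"sort"
)

// AccessPolicy is a simplified, hypothetical view of one Key Vault access policy.
type AccessPolicy struct {
	ObjectID          string
	SecretPermissions []string
}

// replacePolicies models a deployment that sets the whole list:
// only the entries it declares survive.
func replacePolicies(existing, declared []AccessPolicy) []AccessPolicy {
	return append([]AccessPolicy(nil), declared...)
}

// appendPolicies models an "add" operation: declared entries are merged into
// the existing list by object ID, and no one is removed.
func appendPolicies(existing, declared []AccessPolicy) []AccessPolicy {
	byID := make(map[string]AccessPolicy)
	for _, p := range existing {
		byID[p.ObjectID] = p
	}
	for _, p := range declared {
		if cur, ok := byID[p.ObjectID]; ok {
			p.SecretPermissions = union(cur.SecretPermissions, p.SecretPermissions)
		}
		byID[p.ObjectID] = p
	}
	out := make([]AccessPolicy, 0, len(byID))
	for _, p := range byID {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ObjectID < out[j].ObjectID })
	return out
}

// union returns the sorted, de-duplicated union of two permission lists.
func union(a, b []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, list := range [][]string{a, b} {
		for _, s := range list {
			if !seen[s] {
				seen[s] = true
				out = append(out, s)
			}
		}
	}
	sort.Strings(out)
	return out
}

// ids lists the object IDs of a policy slice, for printing.
func ids(ps []AccessPolicy) []string {
	out := make([]string, len(ps))
	for i, p := range ps {
		out[i] = p.ObjectID
	}
	return out
}

func main() {
	// Placeholder object IDs: the list after a local azd provision vs. what a CI run declares.
	afterLocal := []AccessPolicy{{ObjectID: "local-user", SecretPermissions: []string{"get", "list"}}}
	ciDeclares := []AccessPolicy{{ObjectID: "ci-service-principal", SecretPermissions: []string{"get", "list"}}}

	fmt.Println("replace:", ids(replacePolicies(afterLocal, ciDeclares)))
	fmt.Println("append: ", ids(appendPolicies(afterLocal, ciDeclares)))
}
```

With replace semantics only the CI principal remains. That matches the Key Vault state reported later in this thread after the CI run.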

pamelafox commented 1 year ago

Oh, thank you, that is super interesting. I will test out the workflow from scratch and report back.

pamelafox commented 1 year ago

Alas, I'm seeing the same issue with your suggested workflow!

I did:

azd init -t https://github.com/pamelafox/django-quiz-app.git 
azd provision
azd pipeline config

All good at this point; the pipeline worked.

Then, I ran azd deploy, and that also worked.

However, when I tried to run azd up, I got the key vault access error.

When I look at the Key Vault, really freaky things happen. For a few seconds, I see a different set of principal IDs. Maybe the IDs just get turned into more human-friendly names? Screenshots:

First few seconds:

Screenshot 2023-04-20 at 12 57 38 PM

Then it's replaced with:

Screenshot 2023-04-20 at 12 58 18 PM

Is it possible that pipeline config is kicking out my personal principal ID? I should have checked Key Vault before running pipeline config. Let me try the experiment again...

pamelafox commented 1 year ago

Experiment #2:

azd init -t https://github.com/pamelafox/django-quiz-app.git 
azd provision

Current status of Key Vault - looks good!

Screenshot 2023-04-20 at 1 21 48 PM

After azd pipeline config, Key Vault looks the same.

I then waited for the azd pipeline to complete. Unfortunately, both pipeline runs failed (due to the 2x issue), seemingly because they collided with each other.

Key Vault is now in the state where it only has a GitHub CI principal:

Screenshot 2023-04-20 at 1 27 53 PM

So, the GitHub CI does seem to be messing with the Key Vault access policies, from what I can tell.

vhvb1989 commented 1 year ago

> After azd pipeline config, Key Vault looks the same.

Yes, azd pipeline config does not change anything. It is the call to azd provision from the CI pipeline that changes the access policies.

That is because azd pipeline config creates a service principal and sets it as a secret on the CI build as the PRINCIPAL_ID. When azd provision runs in CI, it uses this new PRINCIPAL_ID and replaces the existing one in the access policies.

We need to separate the principal_id used in the CI build from the one used locally. Right now, our pipelines break the ability to run applications locally after the first run (when Key Vault is involved) :(

kjaymiller commented 1 year ago

I'm running into a similar issue: after I run azd down --purge, I'm unable to reference the parameter in main.parameters.json. Here is the error:

$: azure_flask_cosmos_mongodb_aca azd up       

Packaging services (azd package)

  (✓) Done: Packaging service web
  - Image Hash: sha256:df5ac7ba31d71357de8b8c976cac46574cb9639e70ab45fe18d889745c3adb01
  - Image Tag: azure-flask-cosmos-mongodb-aca/web-cc-flask-cosmos-mongo:azd-deploy-1693501560

Provisioning Azure resources (azd provision)
Provisioning Azure resources can take some time

ERROR: deployment failed: failing invoking action 'provision', error deploying infrastructure: creating parameters file: substituting command output inside parameter file: reading secret 'SECRETKEY' from vault 'ccflaskcosmosmong-vault': getting key vault secret: Get "https://ccflaskcosmosmong-vault.vault.azure.net/secrets/SECRETKEY/?api-version=7.4": dial tcp: lookup ccflaskcosmosmong-vault.vault.azure.net: no such host