ansible-collections / azure

Development area for Azure Collections
https://galaxy.ansible.com/azure/azcollection
GNU General Public License v3.0

Key Vault lookup plugin fails occasionally when resolving secrets #1043

Open flootsk opened 1 year ago

flootsk commented 1 year ago
SUMMARY

Azure Key Vault lookup plugin fails occasionally when evaluating inventory variables containing secret lookups

ISSUE TYPE
COMPONENT NAME

azure_keyvault_secret

ANSIBLE VERSION
ansible [core 2.12.10]
COLLECTION VERSION
azure.azcollection 1.14.0
CONFIGURATION

empty

OS / ENVIRONMENT

Ansible manager host:

In all cases, the VM or ACI has a network interface attached to a VNet. All these resources are hosted in Azure.

STEPS TO REPRODUCE

Because the issue is intermittent, the sample playbook performs many secret lookups to increase the chance of triggering it.

Given the following inventory:

key_vault:
  endpoint: "https://somekeyvault.vault.azure.net"
  subscription_id: "{{ lookup('env', 'ARM_SUBSCRIPTION_ID') }}"
  client_id: "{{ lookup('env', 'ARM_CLIENT_ID') }}"
  secret: "{{ lookup('env', 'AZURE_SECRET') }}"
  tenant_id: "{{ lookup('env', 'AZURE_TENANT') }}"

akv_secret_value: >-
  {{ lookup(
    'azure.azcollection.azure_keyvault_secret',
    'somesecret',
    vault_url=key_vault.endpoint,
    client_id=key_vault.client_id,
    secret=key_vault.secret,
    tenant_id=key_vault.tenant_id)
  }}

Here is a sample playbook that performs the same secret lookup many times:

- hosts: all
  gather_facts: false
  tasks:
    - name: Test
      ansible.builtin.debug:
        msg: "{{ akv_secret_value }}"
      delegate_to: localhost
      with_sequence: start=0 count=25

EXPECTED RESULTS

All secret lookups work and retrieve the expected Azure Key Vault secret value.

ACTUAL RESULTS

Intermittently, about 2 or 3 times a week for a playbook that runs once a day, one of the following error messages appears:

fatal: [somehost]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'azure.azcollection.azure_keyvault_secret'. Error was a <class 'azure.keyvault.models.key_vault_error.KeyVaultErrorException'>, original message: (Unauthorized) AKV10022: Invalid audience. Expected https://vault.azure.net, found: https://management.core.windows.net/.. (Unauthorized) AKV10022: Invalid audience. Expected https://vault.azure.net, found: https://management.core.windows.net/."}

or

fatal: [somehost]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'azure.azcollection.azure_keyvault_secret'. Error was a <class 'requests.exceptions.ConnectionError'>, original message: ('Connection aborted.', TimeoutError(110, 'Connection timed out')). ('Connection aborted.', TimeoutError(110, 'Connection timed out'))"}

or

fatal: [somehost]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'azure.azcollection.azure_keyvault_secret'. Error was a <class 'requests.exceptions.ReadTimeout'>, original message: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Read timed out. (read timeout=None). HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Read timed out. (read timeout=None)"}

Please note that an Azure support ticket was opened before this GitHub issue was created. Microsoft did not see any service disruption on the Azure Key Vault side at the time of the failures, nor any limit reached on the Key Vault request count, so to them this looks like a client-side issue.
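
If the timeouts are indeed transient and client-side, one possible mitigation is to retry the failing call with exponential backoff. The sketch below is an assumption about how such a wrapper could look, not something the plugin does today: `retry_with_backoff` is a hypothetical helper, and `flaky_lookup` merely simulates a call that times out twice before succeeding. The default `retry_on=(OSError,)` covers the built-in `TimeoutError` and `ConnectionError`; the `requests` exceptions shown above also derive from `OSError`. It would not retry the "Invalid audience" error, which is a different exception type.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a retryable error, wait and try again.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Simulated lookup: fails twice with a timeout, then returns a value.
_calls = {"count": 0}


def flaky_lookup():
    _calls["count"] += 1
    if _calls["count"] < 3:
        raise TimeoutError(110, "Connection timed out")
    return "secret-value"


# Inject a recording "sleep" so the demo does not actually wait.
recorded_delays = []
result = retry_with_backoff(flaky_lookup, attempts=3, base_delay=0.5,
                            sleep=recorded_delays.append)
print(result, recorded_delays)
```

Passing `sleep` as a parameter keeps the helper testable without real delays; in practice the default `time.sleep` would be used.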

Could the hardcoded URLs in azure/plugins/lookup/azure_keyvault_secret.py be at fault?

Fred-sun commented 4 months ago

@flootsk What do you mean by hardcoded URLs? I have tried to reproduce this locally several times, but I have not encountered the problem you reported. Can you explain it in more detail? Thank you!