Rocky Linux 8-based Azure DevOps agent in the form of a Microsoft Azure Container Instance
RHEL 8 VM
Rocky Linux 8 VM
Ubuntu 22 VM
In all cases, the VM or ACI has a network interface attached to a VNet. All these resources are hosted in Azure.
STEPS TO REPRODUCE
Because the issue occurs only occasionally, the sample playbook performs many secret lookups to increase the chance of triggering it.
Here is a sample playbook that performs the same secret lookup many times:
- hosts: all
  gather_facts: false
  tasks:
    - name: Test
      ansible.builtin.debug:
        msg: "{{ akv_secret_value }}"
      delegate_to: localhost
      with_sequence: start=0 count=25
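For anyone trying to reproduce this outside Ansible, the same "repeat the lookup many times" idea can be sketched in plain Python. This is only an illustration: `tally_failures` and the `lookup` callable are hypothetical names, not part of azure.azcollection, and the callable stands in for whatever code actually fetches the secret.

```python
from collections import Counter
from typing import Callable


def tally_failures(lookup: Callable[[], object], count: int = 25) -> Counter:
    """Call `lookup` `count` times and count outcomes by exception type.

    `lookup` is a hypothetical zero-argument callable standing in for a
    single secret retrieval; it is an assumption for illustration only.
    """
    outcomes: Counter = Counter()
    for _ in range(count):
        try:
            lookup()
            outcomes["ok"] += 1
        except Exception as exc:
            # Group failures by exception class name, e.g. "ReadTimeout".
            outcomes[type(exc).__name__] += 1
    return outcomes
```

Running it with a larger `count` and logging the resulting counter would show which of the intermittent error types occurs and roughly how often.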
EXPECTED RESULTS
All secret lookups work and retrieve the expected Azure Key Vault secret value.
ACTUAL RESULTS
From time to time, about 2 or 3 times a week for a playbook that runs once every day, one of the following error messages occurs:
fatal: [somehost]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'azure.azcollection.azure_keyvault_secret'. Error was a <class 'azure.keyvault.models.key_vault_error.KeyVaultErrorException'>, original message: (Unauthorized) AKV10022: Invalid audience. Expected https://vault.azure.net, found: https://management.core.windows.net/.. (Unauthorized) AKV10022: Invalid audience. Expected https://vault.azure.net, found: https://management.core.windows.net/."}
or
fatal: [somehost]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'azure.azcollection.azure_keyvault_secret'. Error was a <class 'requests.exceptions.ConnectionError'>, original message: ('Connection aborted.', TimeoutError(110, 'Connection timed out')). ('Connection aborted.', TimeoutError(110, 'Connection timed out'))"}
or
fatal: [somehost]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'azure.azcollection.azure_keyvault_secret'. Error was a <class 'requests.exceptions.ReadTimeout'>, original message: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Read timed out. (read timeout=None). HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Read timed out. (read timeout=None)"}
Please note that an Azure Support Ticket was opened before creating this GitHub issue, but Microsoft does not see any service disruption on the Azure Key Vault side at the time of the failures, nor any limit reached in the Azure Key Vault query count.
So this looks like a client-side issue to them.
@flootsk What do you mean by hardcoding only URLs? I have repeatedly tried to reproduce this locally, but I have not encountered the problem you reported. Can you explain it in detail? Thank you!
SUMMARY
Azure Key Vault lookup plugin fails occasionally when evaluating inventory variables containing secret lookups
ISSUE TYPE
COMPONENT NAME
azure_keyvault_secret
ANSIBLE VERSION
COLLECTION VERSION
CONFIGURATION
empty
OS / ENVIRONMENT
Ansible manager host:
Given the following inventory:
Could the hardcoded URLs in azure/plugins/lookup/azure_keyvault_secret.py be at fault?