Open harneetvirk opened 2 years ago
@jiasli for awareness
The log seems to be trimmed. Could you share the full log please?
How long does it usually take to call
az account get-access-token --output json --resource https://management.azure.com
?
@jiasli : I have shared the link of the logs to you on Teams.
We are running into this everyday since we've upgraded to 2.30.0. Like the submitter pointed out, this is wrecking our CI/CD pipeline jobs.
Environment : RHEL 7/8.
$ time az account get-access-token --output json --resource https://management.azure.com/
{
...
}
real 0m13.078s
For temporary relief, is there a way to increase the timeout for getting the token?
@yatamlite I got around this by adding an override to the installed package code on the Docker image I use it in.
First confirmed that the file only has 1 instance of the timeout value in the code (annoying its not set as a variable in v2.29.2)
cat /usr/local/lib/python3.7/dist-packages/azure/identity/_credentials/azure_cli.py | grep -c 10
Used sed
to replace it with my desired timeout of 30secs
RUN sed -i 's/10/30' /usr/local/lib/python3.7/dist-packages/azure/identity/_credentials/azure_cli.py
@jiasli We're running into the issue in our runner and file ticket. Is there ETA to fix this bug? {'azure-ai-ml': '1.4.0', 'mldesigner': '0.1.0b11', 'az': '{\n \"azure-cli\": \"2.45.0\",\n \"azure-cli-core\": \"2.45.0\",\n \"azure-cli-telemetry\": \"1.0.8\",\n \"extensions\": {\n \"databricks\": \"0.9.0\",\n \"ml\": \"2.14.0\"\n }\n}\n', 'python': '3.8.13'}"
subprocess.TimeoutExpired: Command '['/bin/sh', '-c', 'az account get-access-token --output json --resource https://management.azure.com']' timed out after 10 seconds
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tests/scenarios/sdk-cli-v2/runners/sdk/utils.py", line 45, in log_wrapper
res = fun(*args, kwargs)
File "/tests/scenarios/sdk-cli-v2/runners/sdk/test_sdk_runner.py", line 157, in test_command_component
self.pipeline_submit(pipeline)
File "/tests/scenarios/sdk-cli-v2/runners/sdk/test_sdk_runner.py", line 84, in pipeline_submit
pipeline_job = self.client.jobs.create_or_update(
File "/opt/conda/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
return func(*args, *kwargs)
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_job_operations.py", line 562, in create_or_update
raise ex
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_job_operations.py", line 498, in create_or_update
self._resolve_arm_id_or_upload_dependencies(job)
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_job_operations.py", line 842, in _resolve_arm_id_or_upload_dependencies
self._resolve_arm_id_or_azureml_id(job, self._orchestrators.get_asset_arm_id)
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_job_operations.py", line 1064, in _resolve_arm_id_or_azureml_id
job = self._resolve_arm_id_for_pipeline_job(job, resolver)
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_job_operations.py", line 1179, in _resolve_arm_id_for_pipeline_job
self._component_operations._resolve_dependencies_for_pipeline_component_jobs(
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_component_operations.py", line 719, in _resolve_dependencies_for_pipeline_component_jobs
layers = self._divide_nodes_to_resolve_into_layers(
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_component_operations.py", line 674, in _divide_nodes_to_resolve_into_layers
extra_operation(job_instance, key)
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_component_operations.py", line 480, in _try_resolve_environment_for_component
component.environment = resolver(
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 218, in get_asset_arm_id
result = self._get_environment_arm_id(asset, register_asset=register_asset)
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_operation_orchestrator.py", line 287, in _get_environment_arm_id
env_response = self._environments.create_or_update(environment)
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_environment_operations.py", line 144, in create_or_update
raise ex
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/operations/_environment_operations.py", line 128, in create_or_update
else self._version_operations.create_or_update(
File "/opt/conda/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
return func(args, kwargs)
File "/opt/conda/lib/python3.8/site-packages/azure/ai/ml/_restclient/v2022_05_01/operations/_environment_versions_operations.py", line 510, in create_or_update
pipeline_response = self._client._pipeline.run(request, stream=False, kwargs)
File "/opt/conda/lib/python3.8/site-packages/azure/core/pipeline/_base.py", line 205, in run
return first_node.send(pipeline_request) # type: ignore
File "/opt/conda/lib/python3.8/site-packages/azure/core/pipeline/_base.py", line 69, in send
response = self.next.send(request)
File "/opt/conda/lib/python3.8/site-packages/azure/core/pipeline/_base.py", line 69, in send
response = self.next.send(request)
File "/opt/conda/lib/python3.8/site-packages/azure/core/pipeline/_base.py", line 69, in send
response = self.next.send(request)
[Previous line repeated 2 more times]
File "/opt/conda/lib/python3.8/site-packages/azure/mgmt/core/policies/_base.py", line 47, in send
response = self.next.send(request)
File "/opt/conda/lib/python3.8/site-packages/azure/core/pipeline/policies/_redirect.py", line 160, in send
response = self.next.send(request)
File "/opt/conda/lib/python3.8/site-packages/azure/core/pipeline/policies/_retry.py", line 474, in send
response = self.next.send(request)
File "/opt/conda/lib/python3.8/site-packages/azure/core/pipeline/policies/_authentication.py", line 115, in send
self.on_request(request)
File "/opt/conda/lib/python3.8/site-packages/azure/core/pipeline/policies/_authentication.py", line 92, in on_request
self._token = self._credential.get_token(self._scopes)
File "/opt/conda/lib/python3.8/site-packages/azure/identity/_internal/decorators.py", line 32, in wrapper
token = fn(args, kwargs)
File "/opt/conda/lib/python3.8/site-packages/azure/identity/_credentials/azure_cli.py", line 80, in get_token
output = _run_command(command)
File "/opt/conda/lib/python3.8/site-packages/azure/identity/_credentials/azure_cli.py", line 169, in _run_command
six.raise_from(error, ex)
File "
I am also affected by this. Excuse the Python-specific stuff, I realise we're in the CLI repo!
Consider this example snippet:
import os
from azure.identity.aio import DefaultAzureCredential
from azure.storage.blob.aio import BlobServiceClient
class AzureBlobStorage:
def __init__(
self,
credential: DefaultAzureCredential | None = None,
) -> None:
if credential is None:
credential = DefaultAzureCredential()
self.credential = credential
async def do_things(self) -> None:
async with self.credential as credential:
async with BlobServiceClient(
account_url=os.environ["AZURE_STORAGE_ACCOUNT_URL"],
credential=credential,
) as bsc:
async for container in bsc.list_containers():
print(container)
It's a dummy example for the concept of some class with associated credentials (DefaultAzureCredentials
is instantiated only once!), doing some operation. Doesn't really matter. I followed roughly this guide.
But it turns out, instantiating DefaultAzureCredentials
doesn't really... do anything, if I see right? It mainly sets up the whole credential chain with all required data, but doesn't call any operation just yet.
As soon as BlobServiceClient
requires a token, get_token
is called. The class remembers which credential is/was "successful":
Running this locally, however, the az
CLI is used, as is intended, but getting a token from it seems to involve spawning a whole new Python interpreter per get_token
invocation? (didn't dig into this too much) It absolutely eats away CPU for what is fundamentally a simple operation, you'd think (if anything, I/O-bound, not CPU heavy):
And remembering the credential is helpful, but it doesn't cache the token value. Just the token provider (credential).
I am calling do_things
highly concurrently. Nothing in the docs indicate that's a bad idea, but I am starting to think otherwise... I haven't researched this properly, but I can only suspect some of those Python interpreters are put on a diet by the overburdened CPU, timing them out, leading to the issue people already mentioned in this thread. The eventual stack trace is very similar, anyway, ending in:
| File "/home/alex/project/.venv/lib/python3.11/site-packages/azure/identity/aio/_credentials/azure_cli.py", line 148, in _run_command
| raise error from ex
| azure.identity._exceptions.CredentialUnavailableError: Failed to execute '/bin/sh'
So, failing to spawn a shell process. It's not an OOM issue, as can be seen from the screenshot, there's plenty left.
All this is just conjecture!
EDIT: I would be happy to cache the token value, but the design around get_token
doesn't make that seem like an intended thing for developers to do.
Even I faced similar issue, in long running python script.
error: AzureCliCredential.get_token failed: Failed to invoke the Azure CLI Failed to authenticate, VM test.example.com, error Failed to invoke the Azure CLI
Describe the bug AzureCliCredential fails intermittently on devops pipelines with CredentialUnavailableError because of timed out after 10 seconds, and resulting in the build failures.
'['cmd', '/c', 'az account get-access-token --output json --resource https://management.azure.com']' timed out after 10 seconds
To Reproduce Intermittent CredentialUnavailableError in DevOps pipelines.
Expected behavior Some random Az CLI tasks should not fail with an intermittent CredentialUnavailableError
Environment summary [command]C:\Windows\system32\cmd.exe /D /S /C ""C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin\az.cmd" --version"
azure-cli 2.30.0 Please let us know how we are doing: https://aka.ms/azureclihats
and let us know if you're interested in trying out our newest features: https://aka.ms/CLIUXstudy core 2.30.0 telemetry 1.0.6
Extensions: azure-devops 0.22.0
Additional context Logs from build agent: 2021-11-11T07:56:03.1173758Z 2021-11-11T07:56:03.1174605Z self = <azure.identity._credentials.chained.ChainedTokenCredential object at 0x0000023C52710BA8> 2021-11-11T07:56:03.1175442Z scopes = ('https://management.azure.com/.default',), kwargs = {} 2021-11-11T07:56:03.1176625Z history = [(<azure.identity._credentials.azure_cli.AzureCliCredential object at 0x0000023C5275C550>, 'Failed to invoke the Azure...veBrowserCredential object at 0x0000023C52732828>, 'Timed out after waiting 300 seconds for the user to authenticate')] 2021-11-11T07:56:03.1177864Z credential = <azure.identity._credentials.browser.InteractiveBrowserCredential object at 0x0000023C52732828> 2021-11-11T07:56:03.1179086Z attempts = '\nAttempted credentials:\n\tAzureCliCredential: Failed to invoke the Azure CLI\n\tInteractiveBrowserCredential: Timed out after waiting 300 seconds for the user to authenticate' 2021-11-11T07:56:03.1180830Z message = 'ChainedTokenCredential failed to retrieve a token from the included credentials.\nAttempted credentials:\n\tAzureCliC...er to the troubleshooting guidelines here at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.'