Azure / azure-cli

Azure Command-Line Interface
MIT License
4.03k stars 3k forks source link

CLI crashes when using terraform azure provider #12947

Open Bowbaq opened 4 years ago

Bowbaq commented 4 years ago

Describe the bug

Command Name terraform apply, terraform plan or terraform refresh using AZ CLI credential provider

Errors:

Error: Error building account: Error getting authenticated object ID: Error parsing json result from the Azure CLI: Error retrieving running Azure CLI: --- Logging error ---
Traceback (most recent call last):
  File "/usr/local/Cellar/azure-cli/2.2.0_1/libexec/lib/python3.8/site-packages/azure/cli/__main__.py", line 65, in <module>
    raise ex
  File "/usr/local/Cellar/azure-cli/2.2.0_1/libexec/lib/python3.8/site-packages/azure/cli/__main__.py", line 51, in <module>
    sys.exit(exit_code)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/logging/handlers.py", line 70, in emit
    self.doRollover()
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/logging/handlers.py", line 171, in doRollover
    self.rotate(self.baseFilename, dfn)
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/logging/handlers.py", line 111, in rotate
    os.rename(source, dest)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/bowbaq/.azure/telemetry/cache' -> '/Users/bowbaq/.azure/telemetry/cache.1'
Call stack:
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/Cellar/azure-cli/2.2.0_1/libexec/lib/python3.8/site-packages/azure/cli/__main__.py", line 68, in <module>
    telemetry.conclude()
  File "/usr/local/Cellar/azure-cli/2.2.0_1/libexec/lib/python3.8/site-packages/azure/cli/core/telemetry.py", line 209, in _wrapper
    return func(*args, **kwargs)
  File "/usr/local/Cellar/azure-cli/2.2.0_1/libexec/lib/python3.8/site-packages/azure/cli/core/decorators.py", line 74, in _wrapped_func
    return func(*args, **kwargs)
  File "/usr/local/Cellar/azure-cli/2.2.0_1/libexec/lib/python3.8/site-packages/azure/cli/core/telemetry.py", line 245, in conclude
    save(get_config_dir(), _session.generate_payload())
  File "/usr/local/Cellar/azure-cli/2.2.0_1/libexec/lib/python3.8/site-packages/azure/cli/telemetry/__init__.py", line 64, in save
    if save_payload(config_dir, payload) and should_upload(config_dir):
  File "/usr/local/Cellar/azure-cli/2.2.0_1/libexec/lib/python3.8/site-packages/azure/cli/telemetry/util.py", line 53, in save_payload
    cache_saver.info(payload)
Message: '{"c4395b75-49cc-422c-bc95-c7d51aef5d46":[{"name":"azurecli/command","properties":{"Reserved.DataModel.EntityType":"UserTask","Reserved.DataModel.Action.Type":"Atomic","Reserved.DataModel.Action.Result":"Success","Reserved.ChannelUsed":"AI","Reserved.EventId":"d90073b3-2a82-4bc8-92de-4996496c8a2e","Reserved.SequenceNumber":1,"Reserved.SessionId":"07e3d26b-106b-4b3c-a9fe-ecae8ba38ed3","Reserved.TimeSinceSessionStart":0,"Reserved.DataModel.Source":"DataModelAPI","Reserved.DataModel.EntitySchemaVersion":4,"Reserved.DataModel.Severity":0,"Reserved.DataModel.ProductName":"azurecli","Reserved.DataModel.FeatureName":"command","Reserved.DataModel.EntityName":"ad-signedinuser-show","Reserved.DataModel.CorrelationId":"e8ee6998-0277-469d-8f0a-a857ed8b1eea","Context.Default.VS.Core.ExeName":"azurecli","Context.Default.VS.Core.ExeVersion":"2.2.0@none","Context.Default.VS.Core.MacAddressHash":"14afe0fddf3ff3bafc847a517fcf1a79b2a13f5846c2b512096591088b268842","Context.Default.VS.Core.Machine.Id":"14afe0fd-df3f-f3ba-fc84-7a517fcf1a79","Context.Default.VS.Core.OS.Type":"darwin","Context.Default.VS.Core.OS.Version":"darwin kernel version 17.7.0: thu jun 21 22:53:14 pdt 2018; root:xnu-4570.71.2~1/release_x86_64","Context.Default.VS.Core.User.Id":"af48c512-7533-11ea-85aa-6a00035843e0","Context.Default.VS.Core.User.IsMicrosoftInternal":"False","Context.Default.VS.Core.User.IsOptedIn":"True","Context.Default.VS.Core.TelemetryApi.ProductVersion":"azurecli@2.2.0","Context.Default.AzureCLI.Source":"az","Context.Default.AzureCLI.ClientRequestId":"ed9de024-7adb-11ea-8273-6a00035843e0","Context.Default.AzureCLI.CoreVersion":"2.2.0","Context.Default.AzureCLI.TelemetryVersion":"2.0","Context.Default.AzureCLI.InstallationId":"af48c512-7533-11ea-85aa-6a00035843e0","Context.Default.AzureCLI.ShellType":"/bin/zsh","Context.Default.AzureCLI.UserAzureId":"f456f1053cb1b2009099bbdce8868da8d71a64f6e44b7392918ab97afa5f8a66","Context.Default.AzureCLI.UserAzureSubscriptionId":"ead18a99-b39b-4779-8d18-a8a69825a7c6","Context.Default.AzureCLI.DefaultOutputType":"unknown","Context.Default.AzureCLI.EnvironmentVariables":"[]","Context.Default.AzureCLI.Locale":"en_US,UTF-8","Context.Default.AzureCLI.StartTime":"2020-04-10 03:32:39.088671","Context.Default.AzureCLI.EndTime":"2020-04-10 03:32:39.926151","Context.Default.AzureCLI.OutputType":"json","Context.Default.AzureCLI.RawCommand":"ad signed-in-user show","Context.Default.AzureCLI.Params":"-o","Context.Default.AzureCLI.PythonVersion":"3.8.2","Context.Default.AzureCLI.Mode":"default"}}]}'
Arguments: ()

  on main.tf line 8, in provider "azurerm":
   8: provider "azurerm" {

To Reproduce:

I unfortunately don't have good reproduction steps. This seemingly happens randomly when using terraform apply, terraform plan or terraform refresh. The error usually goes away when re-running the same command, but not always, it can take a few retries.

We have the following configuration for the azurerm provider:

provider "azurerm" {
  version = "~> 1.44"

  subscription_id = var.azure_subscription_id
}

Expected Behavior

I would expect az not to crash during normal operations

Environment Summary

macOS-10.13.6-x86_64-i386-64bit
Python 3.8.2

azure-cli 2.2.0 *
yonzhan commented 4 years ago

add to S168

jiasli commented 4 years ago

According to the error reported by Azure CLI:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/bowbaq/.azure/telemetry/cache' -> '/Users/bowbaq/.azure/telemetry/cache.1'

I suspect this is caused by a racing case during parallel execution when .azure/telemetry/cache is not locked. Please check if #1347, #9427 can solve the issue.

We are planning to use portalocker to mitigate such scenarios. +@fengzhou-msft @haroldrandom

Bowbaq commented 4 years ago

I've certainly run into #9427 several times.

I'm not really sure how file locking is going to help here. It seems to be it's more of a design issue. Separate processes shouldn't really be logging to the same file, they should probably each have their own (name after the PID or something like that).

As for the access tokens, it appears to be essentially the same problem of several processes clobbering the file by updating it concurrently. This can be mitigated by storing each access token in its own file (maybe named after the expiration timestamp?). Newly started processes can read all existing tokens in the folder. Any new token goes into its own file. Expired tokens can be garbage collected periodically.

brandonh-msft commented 4 years ago

seems like a good argument for a telemetry opt out variable like the .net and Functions CLI both have...

jiasli commented 4 years ago

Azure CLI does have an option to disable telemetry. Please see CLI configuration values and environment variables and use collect_telemetry to disable telemetry.

For concurrent build, please see https://github.com/Azure/azure-cli/blob/dev/doc/use_cli_effectively.md#concurrent-builds

chriswue commented 3 years ago

FWIW - I just ran into the telemetry cache issue when trying to run az cli with gnu parallel to do some concurrent work. Disabling telemetry seems to have resolved it. Any updates on this issue?

jiasli commented 3 years ago

We currently doesn't have enough bandwidth implementing parallel/concurrent execution. Please see Concurrent builds if you want to run Azure CLI instances concurrently.