Azure / azure-cli

Azure Command-Line Interface
MIT License
3.96k stars 2.94k forks source link

Loading the mysql command module adding 5+ seconds to almost all commands. #29703

Closed nberkley-gp closed 1 month ago

nberkley-gp commented 1 month ago

Describe the bug

Many commands (az --version, az account get-access-token, etc.) suddenly seem to just pause for ~5 seconds before doing anything.

Related command

az --help az --version az account get-access-token

Errors

No errors.

Issue script & Debug output

az --help --debug cli.knack.cli: Command arguments: ['--help', '--debug'] cli.knack.cli: init debug log: Enable color in terminal. cli.knack.cli: Event: Cli.PreExecute [] cli.knack.cli: Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at 0x1026a72e0>, <function OutputProducer.on_global_arguments at 0x102a08680>, <function CLIQuery.on_global_arguments at 0x102a2a160>] cli.knack.cli: Event: CommandInvoker.OnPreCommandTableCreate [] cli.azure.cli.core: No module found from index for '['--help', '--debug']' cli.azure.cli.core: Loading all modules and extensions cli.azure.cli.core: Discovered command modules: ['acr', 'acs', 'advisor', 'ams', 'apim', 'appconfig', 'appservice', 'aro', 'backup', 'batch', 'batchai', 'billing', 'botservice', 'cdn', 'cloud', 'cognitiveservices', 'compute_recommender', 'config', 'configure', 'consumption', 'container', 'containerapp', 'cosmosdb', 'databoxedge', 'dla', 'dls', 'dms', 'eventgrid', 'eventhubs', 'extension', 'feedback', 'find', 'hdinsight', 'identity', 'interactive', 'iot', 'keyvault', 'kusto', 'lab', 'managedservices', 'maps', 'marketplaceordering', 'monitor', 'mysql', 'netappfiles', 'network', 'policyinsights', 'privatedns', 'profile', 'rdbms', 'redis', 'relay', 'resource', 'role', 'search', 'security', 'servicebus', 'serviceconnector', 'servicefabric', 'signalr', 'sql', 'sqlvm', 'storage', 'synapse', 'util', 'vm'] cli.azure.cli.core: Loading command modules: cli.azure.cli.core: Name Load Time Groups Commands cli.azure.cli.core: acr 0.054 36 149 cli.azure.cli.core: acs 0.007 14 76 cli.azure.cli.core: advisor 0.000 3 6 cli.azure.cli.core: ams 0.002 22 100 cli.azure.cli.core: apim 0.002 14 69 cli.azure.cli.core: appconfig 0.001 9 47 cli.azure.cli.core: appservice 0.016 79 270 cli.azure.cli.core: aro 0.003 1 10 cli.azure.cli.core: backup 0.008 16 60 cli.azure.cli.core: batch 0.009 34 102 cli.azure.cli.core: batchai 0.001 10 30 cli.azure.cli.core: billing 0.004 20 53 cli.azure.cli.core: botservice 0.001 12 42 cli.azure.cli.core: cdn 0.036 8 50 cli.azure.cli.core: cloud 0.000 1 7 cli.azure.cli.core: cognitiveservices 0.001 10 33 cli.azure.cli.core: compute_recommender 0.001 1 1 cli.azure.cli.core: config 0.000 2 7 cli.azure.cli.core: configure 0.000 2 5 cli.azure.cli.core: consumption 0.005 8 9 cli.azure.cli.core: container 0.002 1 11 cli.azure.cli.core: containerapp 0.030 37 123 cli.azure.cli.core: cosmosdb 0.005 58 199 cli.azure.cli.core: databoxedge 0.002 5 28 cli.azure.cli.core: dla 0.001 23 62 cli.azure.cli.core: dls 0.001 7 41 cli.azure.cli.core: dms 0.001 3 22 cli.azure.cli.core: eventgrid 0.002 25 96 cli.azure.cli.core: eventhubs 0.003 13 19 cli.azure.cli.core: extension 0.000 1 7 cli.azure.cli.core: feedback 0.000 1 2 cli.azure.cli.core: find 0.000 1 1 cli.azure.cli.core: hdinsight 0.002 8 39 cli.azure.cli.core: identity 0.001 2 11 cli.azure.cli.core: interactive 0.000 1 1 cli.azure.cli.core: iot 0.044 19 82 cli.azure.cli.core: keyvault 0.003 20 113 cli.azure.cli.core: kusto 0.001 3 14 cli.azure.cli.core: lab 0.001 11 34 cli.azure.cli.core: managedservices 0.000 3 8 cli.azure.cli.core: maps 0.000 5 13 cli.azure.cli.core: marketplaceordering 0.001 1 2 cli.azure.cli.core: monitor 0.150 17 61 cli.azure.cli.core: mysql 5.062 15 53 cli.azure.cli.core: netappfiles 0.014 8 17 cli.azure.cli.core: network 0.040 103 338 cli.azure.cli.core: policyinsights 0.004 9 17 cli.azure.cli.core: privatedns 0.005 14 60 cli.azure.cli.core: profile 0.000 2 8 cli.azure.cli.core: rdbms 0.006 49 202 cli.azure.cli.core: redis 0.001 7 38 cli.azure.cli.core: relay 0.038 7 8 cli.azure.cli.core: resource 0.004 51 231 cli.azure.cli.core: role 0.001 17 61 cli.azure.cli.core: search 0.002 7 19 cli.azure.cli.core: security 0.003 48 98 cli.azure.cli.core: servicebus 0.002 12 14 cli.azure.cli.core: serviceconnector 0.023 20 309 cli.azure.cli.core: servicefabric 0.004 27 80 cli.azure.cli.core: signalr 0.001 9 34 cli.azure.cli.core: sql 0.005 56 215 cli.azure.cli.core: sqlvm 0.008 4 20 cli.azure.cli.core: storage 0.015 59 273 cli.azure.cli.core: synapse 0.004 54 246 cli.azure.cli.core: util 0.000 3 7 cli.azure.cli.core: vm 0.016 58 233 cli.azure.cli.core: Total (66) 5.659 1206 4696 cli.azure.cli.core: Loading extensions: cli.azure.cli.core: Name Load Time Groups Commands Directory cli.knack.cli: Event: CommandLoader.OnLoadCommandTable [] cli.azure.cli.core: aks-preview 0.003 21 95 /Users/nathanberkley/.azure/cliextensions/aks-preview cli.azure.cli.core: Total (1) 0.003 21 95
cli.azure.cli.core: Loaded 1201 groups, 4726 commands. cli.azure.cli.core: Updated command index in 0.003 seconds. cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x103dfe520>] cli.azure.cli.core.azlogging: metadata file logging enabled - writing logs to '/Users/nathanberkley/.azure/commands/2024-08-09.17-14-22.unknown_command.27408.log'. az_command_data_logger: command args: --help --debug cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument..add_subscription_parameter at 0x103e2bf60>] cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad [] cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument..add_ids_arguments at 0x103e4f560>, <function register_cache_arguments..add_cache_arguments at 0x103e4f6a0>] cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded [] cli.knack.cli: Event: CommandInvoker.OnPreParseArgs [<function _documentdb_deprecate at 0x1071c2a20>]

Group az

Subgroups: account : Manage Azure subscription information. acr : Manage private registries with Azure Container Registries. ad : Manage Microsoft Entra ID (formerly known as Azure Active Directory, Azure AD, AAD) entities needed for Azure role-based access control (Azure RBAC) through Microsoft Graph API. advisor : Manage Azure Advisor. afd : Manage Azure Front Door Standard/Premium. aks : Manage Azure Kubernetes Services. ams : Manage Azure Media Services resources. apim : Manage Azure API Management services. appconfig : Manage App Configurations. appservice : Manage App Service plans. aro : Manage Azure Red Hat OpenShift clusters. backup : Manage Azure Backups. batch : Manage Azure Batch. bicep : Bicep CLI command group. billing : Manage Azure Billing. bot : Manage Microsoft Azure Bot Service. cache : Commands to manage CLI objects cached using the --defer argument. capacity : Manage capacity. cdn : Manage Azure Content Delivery Networks (CDNs). cloud : Manage registered Azure clouds. cognitiveservices : Manage Azure Cognitive Services accounts. compute-recommender : Manage sku/zone/region recommender info for compute resources. config [Experimental] : Manage Azure CLI configuration. connection : Commands to manage Service Connector local connections which allow local environment to connect Azure Resource. If you want to manage connection for compute service, please run 'az webapp/containerapp/spring connection'. consumption [Preview] : Manage consumption of Azure resources. container : Manage Azure Container Instances. containerapp : Manage Azure Container Apps. cosmosdb : Manage Azure Cosmos DB database accounts. databoxedge [Preview] : Manage device with databoxedge. deployment : Manage Azure Resource Manager template deployment at subscription scope. deployment-scripts : Manage deployment scripts at subscription or resource group scope. disk : Manage Azure Managed Disks. disk-access : Manage disk access resources. disk-encryption-set : Disk Encryption Set resource. dla [Preview] : Manage Data Lake Analytics accounts, jobs, and catalogs. dls [Preview] : Manage Data Lake Store accounts and filesystems. dms : Manage Azure Data Migration Service (classic) instances. eventgrid : Manage Azure Event Grid topics, domains, domain topics, system topics partner topics, event subscriptions, system topic event subscriptions and partner topic event subscriptions. eventhubs : Eventhubs. extension : Manage and update CLI extensions. feature : Manage resource provider features. functionapp : Manage function apps. To install the Azure Functions Core tools see https://github.com/Azure/azure-functions-core-tools. group : Manage resource groups and template deployments. hdinsight : Manage HDInsight resources. identity : Managed Identities. image : Manage custom virtual machine images. iot : Manage Internet of Things (IoT) assets. keyvault : Manage KeyVault keys, secrets, and certificates. kusto : Manage Azure Kusto resources. lab [Preview] : Manage Azure DevTest Labs. lock : Manage Azure locks. logicapp : Manage logic apps. managed-cassandra : Azure Managed Cassandra. managedapp : Manage template solutions provided and maintained by Independent Software Vendors (ISVs). managedservices : Manage the registration assignments and definitions in Azure. maps : Manage Azure Maps. mariadb : Manage Azure Database for MariaDB servers. monitor : Manage the Azure Monitor Service. mysql : Manage Azure Database for MySQL servers. netappfiles : Manage Azure NetApp Files (ANF) Resources. network : Manage Azure Network resources. policy : Manage resource policies. postgres : Manage Azure Database for PostgreSQL servers. ppg : Manage Proximity Placement Groups. private-link : Private-link association CLI command group. provider : Manage resource providers. redis : Manage dedicated Redis caches for your Azure applications. relay : Manage Azure Relay Service namespaces, WCF relays, hybrid connections, and rules. resource : Manage Azure resources. resourcemanagement : Resourcemanagement CLI command group. restore-point : Manage restore point with res. role : Manage Azure role-based access control (Azure RBAC). search : Manage Azure Search services, admin keys and query keys. security : Manage your security posture with Microsoft Defender for Cloud. servicebus : Servicebus. sf : Manage and administer Azure Service Fabric clusters. sig : Manage shared image gallery. signalr : Manage Azure SignalR Service. snapshot : Manage point-in-time copies of managed disks, native blobs, or other snapshots. sql : Manage Azure SQL Databases and Data Warehouses. sshkey : Manage ssh public key with vm. stack : A deployment stack is a native Azure resource type that enables you to perform operations on a resource collection as an atomic unit. staticwebapp : Manage static apps. storage : Manage Azure Cloud Storage resources. synapse : Manage and operate Synapse Workspace, Spark Pool, SQL Pool. tag : Tag Management on a resource. term [Experimental] : Manage marketplace agreement with marketplaceordering. ts : Manage template specs at subscription or resource group scope. vm : Manage Linux or Windows virtual machines. vmss : Manage groupings of virtual machines in an Azure Virtual Machine Scale Set (VMSS). webapp : Manage web apps.

Commands: configure : Manage Azure CLI configuration. This command is interactive. feedback : Send feedback to the Azure CLI Team. find : I'm an AI robot, my advice is based on our Azure documentation as well as the usage patterns of Azure CLI and Azure ARM users. Using me improves Azure products and documentation. interactive [Preview] : Start interactive mode. Installs the Interactive extension if not installed already. login : Log in to Azure. logout : Log out to remove access to Azure subscriptions. rest : Invoke a custom request. survey : Take Azure CLI survey. upgrade [Preview] : Upgrade Azure CLI and extensions. version : Show the versions of Azure CLI modules and extensions in JSON format by default or format configured by --output.

To search AI knowledge base for examples, use: az find "az "

cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x103dfe7a0>] az_command_data_logger: exit code: 0 cli.main: Command ran in 5.797 seconds (init: 0.073, invoke: 5.724) telemetry.main: Begin splitting cli events and extra events, total events: 1 telemetry.client: Accumulated 0 events. Flush the clients. telemetry.main: Finish splitting cli events and extra events, cli events: 1 telemetry.save: Save telemetry record of length 3676 in cache telemetry.main: Begin creating telemetry upload process. telemetry.process: Creating upload process: "/opt/homebrew/Cellar/azure-cli/2.63.0/libexec/bin/python /opt/homebrew/Cellar/azure-cli/2.63.0/libexec/lib/python3.11/site-packages/azure/cli/telemetry/init.py /Users/nathanberkley/.azure" telemetry.process: Return from creating process telemetry.main: Finish creating telemetry upload process.

Expected behavior

Sub-second return for simple commands.

Environment Summary

azure-cli 2.63.0

core 2.63.0 telemetry 1.1.0

Extensions: aks-preview 7.0.0b3

Dependencies: msal 1.30.0 azure-mgmt-resource 23.1.1

Python location '/opt/homebrew/Cellar/azure-cli/2.63.0/libexec/bin/python' Extensions directory '/Users/nathanberkley/.azure/cliextensions'

Python (Darwin) 3.11.9 (main, Apr 2 2024, 08:25:04) [Clang 15.0.0 (clang-1500.3.9.4)]

Legal docs and information: aka.ms/AzureCliLegal

Your CLI is up-to-date.

Additional context

No response

yonzhan commented 1 month ago

Thank you for opening this issue, we will look into it.

microsoft-github-policy-service[bot] commented 1 month ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @AzureAppServiceCLI, @antcp.

nberkley-gp commented 1 month ago

After spending a bit more time looking into this it's not just mysql. I'd used --help as the debug case on the assumption that it'd be the simplest case, but different commands hang for a very similar timeframe in different places. For instance the az account get-access-token command that originally sent me down this rabbit hole looks like this:

cli.knack.cli: Command arguments: ['account', 'get-access-token', '-o', 'json', '--resource', 'XXXX', '--debug']
cli.knack.cli: __init__ debug log:
Enable color in terminal.
cli.knack.cli: Event: Cli.PreExecute []
cli.knack.cli: Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at 0x1051032e0>, <function OutputProducer.on_global_arguments at 0x105464680>, <function CLIQuery.on_global_arguments at 0x105486160>]
cli.knack.cli: Event: CommandInvoker.OnPreCommandTableCreate []
cli.azure.cli.core: Modules found from index for 'account': ['azure.cli.command_modules.profile', 'azure.cli.command_modules.resource']
cli.azure.cli.core: Loading command modules:
cli.azure.cli.core: Name                  Load Time    Groups  Commands
cli.azure.cli.core: profile                   0.001         2         8
cli.azure.cli.core: resource                  0.054        51       231
cli.azure.cli.core: Total (2)                 0.055        53       239
cli.azure.cli.core: Loaded 52 groups, 239 commands.
cli.azure.cli.core: Found a match in the command table.
cli.azure.cli.core: Raw command  : account get-access-token
cli.azure.cli.core: Command table: account get-access-token
cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x1066f2520>]
cli.azure.cli.core.azlogging: metadata file logging enabled - writing logs to '/Users/nathanberkley/.azure/commands/2024-08-09.17-57-26.account_get-access-token.28948.log'.
az_command_data_logger: command args: account get-access-token -o {} --resource {} --debug
cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument.<locals>.add_subscription_parameter at 0x10671ff60>]
cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad []
cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument.<locals>.add_ids_arguments at 0x106743560>, <function register_cache_arguments.<locals>.add_cache_arguments at 0x1067436a0>]
cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded []
cli.knack.cli: Event: CommandInvoker.OnPreParseArgs []
cli.knack.cli: Event: CommandInvoker.OnPostParseArgs [<function OutputProducer.handle_output_argument at 0x105464720>, <function CLIQuery.handle_query_parameter at 0x105486200>, <function register_ids_argument.<locals>.parse_ids_arguments at 0x106743600>]

# Long pause happens here

cli.azure.cli.core.auth.persistence: build_persistence: location='/Users/nathanberkley/.azure/msal_token_cache.json', encrypt=False
cli.azure.cli.core.auth.binary_cache: load: /Users/nathanberkley/.azure/msal_http_cache.bin
urllib3.util.retry: Converted retries value: 1 -> Retry(total=1, connect=None, read=None, redirect=None, status=None)
msal.authority: Initializing with Entra authority: https://login.microsoftonline.com/XXXX
msal.authority: openid_config("https://login.microsoftonline.com/XXXX/v2.0/.well-known/openid-configuration") = {'token_endpoint': 'https://login.microsoftonline.com/XXXX/oauth2/v2.0/token', 'token_endpoint_auth_methods_supported': ['client_secret_post', 'private_key_jwt', 'client_secret_basic'], 'jwks_uri': 'https://login.microsoftonline.com/XXXX/discovery/v2.0/keys', 'response_modes_supported': ['query', 'fragment', 'form_post'], 'subject_types_supported': ['pairwise'], 'id_token_signing_alg_values_supported': ['RS256'], 'response_types_supported': ['code', 'id_token', 'code id_token', 'id_token token'], 'scopes_supported': ['openid', 'profile', 'email', 'offline_access'], 'issuer': 'https://login.microsoftonline.com/XXXX/v2.0', 'request_uri_parameter_supported': False, 'userinfo_endpoint': 'https://graph.microsoft.com/oidc/userinfo', 'authorization_endpoint': 'https://login.microsoftonline.com/XXXX/oauth2/v2.0/authorize', 'device_authorization_endpoint': 'https://login.microsoftonline.com/XXXX/oauth2/v2.0/devicecode', 'http_logout_supported': True, 'frontchannel_logout_supported': True, 'end_session_endpoint': 'https://login.microsoftonline.com/XXXX/oauth2/v2.0/logout', 'claims_supported': ['sub', 'iss', 'cloud_instance_name', 'cloud_instance_host_name', 'cloud_graph_host_name', 'msgraph_host', 'aud', 'exp', 'iat', 'auth_time', 'acr', 'nonce', 'preferred_username', 'name', 'tid', 'ver', 'at_hash', 'c_hash', 'email'], 'kerberos_endpoint': 'https://login.microsoftonline.com/XXXX/kerberos', 'tenant_region_scope': 'NA', 'cloud_instance_name': 'microsoftonline.com', 'cloud_graph_host_name': 'graph.windows.net', 'msgraph_host': 'graph.microsoft.com', 'rbac_url': 'https://pas.windows.net'}
msal.application: Broker enabled? None
cli.azure.cli.core.auth.msal_authentication: UserCredential.get_token: scopes=('XXXX/.default',), claims=None, kwargs={}
msal.application: Cache hit an AT
msal.telemetry: Generate or reuse correlation_id: 9b849f6c-5a4d-42f4-a1e3-47ca20856a79
cli.knack.cli: Event: CommandInvoker.OnTransformResult [<function _resource_group_transform at 0x106740360>, <function _x509_from_base64_to_hex_transform at 0x106740400>]
cli.knack.cli: Event: CommandInvoker.OnFilterResult []
{
  "accessToken": "XXXX",
  "expiresOn": "2024-08-09 18:49:03.000000",
  "expires_on": 1723243743,
  "subscription": "XXXX",
  "tenant": "XXXX",
  "tokenType": "Bearer"
}
cli.knack.cli: Event: Cli.SuccessfulExecute []
cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x1066f27a0>]
az_command_data_logger: exit code: 0
cli.__main__: Command ran in 5.140 seconds (init: 0.065, invoke: 5.075)
jiasli commented 1 month ago

I am not sure if this is specific to macOS, but I am not unable repro on Ubuntu and Windows.

In your first debug log (https://github.com/Azure/azure-cli/issues/29703#issue-2458681549), mysql takes 5s to load:

$ az --help --debug 2>&1 | ts '%H:%M:%.S'
...
cli.azure.cli.core: mysql 5.062 15 53

However, on my machine, mysql only takes 0.065s to load:

$ az --help --debug 2>&1 | ts '%H:%M:%.S'
...
11:55:17.817834 DEBUG: cli.azure.cli.core: mysql                     0.065        15        53
11:55:17.839212 DEBUG: cli.azure.cli.core: netappfiles               0.021         8        17

In your second debug log (https://github.com/Azure/azure-cli/issues/29703#issuecomment-2278808124), you can see the command index takes effect, thus only profile and resource modules are loaded.

In my test, no pause is observed:

$ az account get-access-token --debug 2>&1 | ts '%H:%M:%.S'
...
11:58:30.721081 DEBUG: cli.knack.cli: Event: CommandInvoker.OnPostParseArgs [<function OutputProducer.handle_output_argument at 0x7fbed684e200>, <function CLIQuery.handle_query_parameter at 0x7fbed688bce0>, <function register_ids_argument.<locals>.parse_ids_arguments at 0x7fbed5b572e0>]
11:58:30.743791 DEBUG: cli.azure.cli.core.auth.persistence: build_persistence: location='/home/user2/.azure/msal_token_cache.json', encrypt=False

On Windows, even though it is a little bit slow, mysql only takes 1.442s to load.

> az --debug
...
cli.azure.cli.core: mysql                     1.442        15        53

az account get-access-token takes 2s to complete:

> Measure-Command {az account get-access-token --debug}
...
TotalSeconds      : 2.0845171

I would recommend testing it multiple times and see if you can have a steady repro. Also, please check if you have a high CPU usage.

Let's keep monitoring and see if other people are seeing the same issue.

nberkley-gp commented 1 month ago

I would recommend testing it multiple times and see if you can have a steady repro. Also, please check if you have a high CPU usage.

So I went and ran az --help in a loop and collected timings. The timing is extremely consistent. Across 20 runs they took a mean of 5.47 seconds with a maximum deviation of 50ms in either direction. CPU was over 90% idle over the entire test with no run taking more than 240ms of CPU time.

nberkley-gp commented 1 month ago

Ok, so after some more digging around I managed to get a trace pointing the finger at socket.getfqdn(). It seems that somewhere in the OS upgrade I did around the same time something changed with either /etc/hosts or the name returned by socket.gethostname(), with the end result being that the call to socket.gethostbyaddr(name) was hitting the DNS timeout (thus the tight timing distribution).

I'm still not sure why this was showing up as a delay/timeout when other codebases (nslookup, dig) would get an immediate NXDOMAIN when attempting the same query, or what exactly changed to make this suddenly show up now. That said it's easily solved with an /etc/hosts entry so I'm going to set it aside for now.

I'll go ahead and close this since it seems to be a mac/python issue rather than an azure-cli issue. Sorry for the false alarm.