Azure / azure-cli

Azure Command-Line Interface
MIT License
3.97k stars 2.95k forks source link

azure ml extension slow in WSL2 when .azure directory is not on WSL2 filesystem / azure cli on WSL2 installed .azure as symlink #27486

Open jlester-msft opened 11 months ago

jlester-msft commented 11 months ago

Describe the bug

This is a combination of two problems:

  1. In some past version of the Azure CLI on WSL2, the ~/.azure directory was installed as a symlink to the user's Windows profile directory. That is, ~/.azure -> /mnt/c/Users/USERNAME/.azure. Besides causing the WSL2 and Windows environment to share a .azure directory this also causes slow disk access to the .azure directory due to WSL2.

  2. When the ~/.azure directory is not on the WSL2 filesystem the Azure ML extension takes 60s - 80s on nearly every az cli call, this makes the azure CLI pretty unusable. The Azure ML extension seems to be spending most, >95%, of its time in frozen importlib._bootstrap methods loading the ml extension:

cli.azure.cli.core: Loading extensions:
cli.azure.cli.core: Name                  Load Time    Groups  Commands  Directory
cli.azure.cli.core: account                   0.420         4        13  /home/tegra/.azure/cliextensions/account
cli.azure.cli.core: init                      0.177         1         1  /home/tegra/.azure/cliextensions/init
azext_mlv2.manual.vendored_curated_sdk.azure.ai.ml._azure_environments: Using the default cloud configuration: 'AzureCloud'.
azext_mlv2.manual.vendored_curated_sdk.azure.ai.ml._azure_environments: Using the default cloud configuration: 'AzureCloud'.
azext_mlv2.manual.vendored_curated_sdk.azure.ai.ml._azure_environments: Using the default cloud configuration: 'AzureCloud'.
cli.azure.cli.core: ml                       88.436        21       145  /home/tegra/.azure/cliextensions/ml
cli.knack.cli: Event: CommandLoader.OnLoadCommandTable []
cli.azure.cli.core: storage-preview           1.006        16        52  /home/tegra/.azure/cliextensions/storage-preview
cli.azure.cli.core: Total (4)                90.039        42       211

Suggested resolutions:

  1. If having the ~/.azure directory on a non-WSL filesystem is not a good idea, the Azure CLI should warn users during updates or at some other point. The reinstallation process is simple, just remove the symlink, run the azure cli again to login, and re-install any extensions you need. Some previous version of the Azure CLI seems to have created this symlink as this behavior has been observed on older machines but not recently rebuilt machines. So some number of azure cli users might unknowingly have their profile on the 'wrong' filesystem.

  2. The Azure ML extension should figure out why initialization is taking so long/not a one-time event. If this is not avoidable then there should probably be a warning about slow performance when the ~/.azure is not on a WSL2 filesystem as a one minute+ runtime for every azure ml command is unusable.

Related command

az ml

Errors

See attached trace of a slow az ml --verbose --debug run and a normal az ml --verbose --debug run.

These .prof files were created by running /opt/az/bin/python3 -m cProfile -o profile_normal_az_cli_ml.prof -m azure.cli ml --debug --verbose you can visualize the trace by installing snakeviz pip3 install snakeviz and then running (in a new terminal) snakeviz profile_file_name.prof

Slow profile:

image profile_slow_az_cli_ml.zip

Normal profile

image profile_normal_az_cli_ml.zip

Issue script & Debug output

Example of slow Azure ML extension startup (~/.azure off WSL2 filesystem)

Example output when ~/.azure is a symlink to /mnt/c/Users/USERNAME/.azure. You can replicate this setup on a new Windows machine by:

  1. Spinning up a VM with Hyper-V enabled in Azure
  2. Installing WSL2 + Ubuntu (20.04.6 LTS in this case)
  3. Install Azure CLI: curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
  4. Install Azure ML Extension: az extension add --name ml
  5. Move ~/.azure directory and create symlink: mv ~/.azure/ /mnt/c/Users/USERNAME/ && ln -s /mnt/c/Users/USERNAME/.azure ~/.azure
$ ls -al ~/.azure
lrwxrwxrwx 1 tegra tegra 27 Sep 27 20:26 /home/tegra/.azure -> /mnt/c/Users/jlester/.azure

$ az ml --debug --verbose
cli.knack.cli: Command arguments: ['ml', '--debug', '--verbose']
cli.knack.cli: __init__ debug log:
Enable color in terminal.
cli.knack.cli: Event: Cli.PreExecute []
cli.knack.cli: Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at 0x7f7f8a6bd360>, <function OutputProducer.on_global_arguments at 0x7f7f8a614280>, <function CLIQuery.on_global_arguments at 0x7f7f8a431480>]
cli.knack.cli: Event: CommandInvoker.OnPreCommandTableCreate []
cli.azure.cli.core: Modules found from index for 'ml': ['azext_mlv2']
cli.azure.cli.core: Loading command modules:
cli.azure.cli.core: Name                  Load Time    Groups  Commands
cli.azure.cli.core: Total (0)                 0.000         0         0
cli.azure.cli.core: These extensions are not installed and will be skipped: ['azext_ai_examples', 'azext_next']
cli.azure.cli.core: Loading extensions:
cli.azure.cli.core: Name                  Load Time    Groups  Commands  Directory
azext_mlv2.manual.vendored_curated_sdk.azure.ai.ml._azure_environments: Using the default cloud configuration: 'AzureCloud'.
azext_mlv2.manual.vendored_curated_sdk.azure.ai.ml._azure_environments: Using the default cloud configuration: 'AzureCloud'.
azext_mlv2.manual.vendored_curated_sdk.azure.ai.ml._azure_environments: Using the default cloud configuration: 'AzureCloud'.
cli.azure.cli.core: ml                       84.901        21       145  /home/tegra/.azure/cliextensions/ml
cli.azure.cli.core: Total (1)                84.901        21       145
cli.azure.cli.core: Loaded 21 groups, 145 commands.
cli.azure.cli.core: Found a match in the command group table for 'ml'.
cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x7f7f897faf80>]
cli.azure.cli.core.azlogging: metadata file logging enabled - writing logs to '/home/tegra/.azure/commands/2023-09-27.20-24-08.ml.14077.log'.
az_command_data_logger: command args: ml --debug --verbose
cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument.<locals>.add_subscription_parameter at 0x7f7f8981bc70>]
cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad []
cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument.<locals>.add_ids_arguments at 0x7f7f89861990>, <function register_cache_arguments.<locals>.add_cache_arguments at 0x7f7f89861ab0>]
cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded []
cli.knack.cli: Event: CommandInvoker.OnPreParseArgs []
urllib3.connectionpool: Starting new HTTPS connection (1): app.aladdin.microsoft.com:443
urllib3.connectionpool: https://app.aladdin.microsoft.com:443 "GET /api/v1.0/suggestions?query=%7B%22command%22%3A+%22ml%22%2C+%22parameters%22%3A+%22%22%7D&clientType=AzureCli&context=%7B%22versionNumber%22%3A+%222.53.0%22%2C+%22errorType%22%3A+%22MissingRequiredSubcommand%22%2C+%22correlationId%22%3A+%22ecfad59b-3bcc-401a-8782-2b1eb88bc8e5%22%2C+%22eventId%22%3A+%22fd587933-0526-48f1-8730-df043b0e34e0%22%7D HTTP/1.1" 200 None
cli.azure.cli.core.azclierror: NoneType: None

cli.azure.cli.core.azclierror: the following arguments are required: _subcommand
az_command_data_logger: the following arguments are required: _subcommand

Examples from AI knowledge base:
https://aka.ms/cli_ref
Read more about the command in reference docs
cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x7f7f897fb1c0>]
az_command_data_logger: exit code: 2
cli.__main__: Command ran in 85.304 seconds (init: 0.205, invoke: 85.100)
telemetry.main: Begin splitting cli events and extra events, total events: 1
telemetry.client: Accumulated 0 events. Flush the clients.
telemetry.main: Finish splitting cli events and extra events, cli events: 1
telemetry.save: Save telemetry record of length 3432 in cache
telemetry.main: Begin creating telemetry upload process.
telemetry.process: Creating upload process: "/usr/bin/../../opt/az/bin/python3 /opt/az/lib/python3.10/site-packages/azure/cli/telemetry/__init__.py /home/tegra/.azure"
telemetry.process: Return from creating process
telemetry.main: Finish creating telemetry upload process.

Example of normal Azure ML extension startup (~/.azure on WSL2 filesystem)

You can revert to a testable state by removing the ~/.azure symlink and reinstalling the Azure ml extension:

  1. rm ~/.azure
  2. az extension add --name ml
$ ls -ald ~/.azure
drwxr-xr-x 6 tegra tegra 4096 Sep 27 20:43 /home/tegra/.azure

$ az ml --verbose --debug
cli.knack.cli: Command arguments: ['ml', '--verbose', '--debug']
cli.knack.cli: __init__ debug log:
Enable color in terminal.
cli.knack.cli: Event: Cli.PreExecute []
cli.knack.cli: Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at 0x7f9504d89360>, <function OutputProducer.on_global_arguments at 0x7f9504ce0280>, <function CLIQuery.on_global_arguments at 0x7f9504b01480>]
cli.knack.cli: Event: CommandInvoker.OnPreCommandTableCreate []
cli.azure.cli.core: Command index version or cloud profile is invalid or doesn't match the current command.
cli.azure.cli.core: Command index has been invalidated.
cli.azure.cli.core: No module found from index for '['ml', '--verbose', '--debug']'
cli.azure.cli.core: Loading all modules and extensions
cli.azure.cli.core: Discovered command modules: ['acr', 'acs', 'advisor', 'ams', 'apim', 'appconfig', 'appservice', 'aro', 'backup', 'batch', 'batchai', 'billing', 'botservice', 'cdn', 'cloud', 'cognitiveservices', 'config', 'configure', 'consumption', 'container', 'containerapp', 'cosmosdb', 'databoxedge', 'dla', 'dls', 'dms', 'eventgrid', 'eventhubs', 'extension', 'feedback', 'find', 'hdinsight', 'identity', 'interactive', 'iot', 'keyvault', 'kusto', 'lab', 'managedservices', 'maps', 'marketplaceordering', 'monitor', 'mysql', 'netappfiles', 'network', 'policyinsights', 'privatedns', 'profile', 'rdbms', 'redis', 'relay', 'resource', 'role', 'search', 'security', 'servicebus', 'serviceconnector', 'servicefabric', 'signalr', 'sql', 'sqlvm', 'storage', 'synapse', 'util', 'vm']
cli.azure.cli.core: Loading command modules:
cli.azure.cli.core: Name                  Load Time    Groups  Commands
cli.azure.cli.core: acr                       0.083        34       144
cli.azure.cli.core: acs                       0.012         7        54
cli.azure.cli.core: advisor                   0.001         3         6
cli.azure.cli.core: ams                       0.004        22       100
cli.azure.cli.core: apim                      0.003        14        68
cli.azure.cli.core: appconfig                 0.002         9        47
cli.azure.cli.core: appservice                0.046        73       260
cli.azure.cli.core: aro                       0.007         1        10
cli.azure.cli.core: backup                    0.005        16        59
cli.azure.cli.core: batch                     0.020        34       102
cli.azure.cli.core: batchai                   0.003        10        30
cli.azure.cli.core: billing                   0.004        19        52
cli.azure.cli.core: botservice                0.002        12        42
cli.azure.cli.core: cdn                       0.005        39       133
cli.azure.cli.core: cloud                     0.001         1         7
cli.azure.cli.core: cognitiveservices         0.003        10        33
cli.azure.cli.core: config                    0.001         2         7
cli.azure.cli.core: configure                 0.001         2         5
cli.azure.cli.core: consumption               0.009         8         9
cli.azure.cli.core: container                 0.007         1        11
cli.azure.cli.core: containerapp              0.060        36       115
cli.azure.cli.core: cosmosdb                  0.010        58       192
cli.azure.cli.core: databoxedge               0.003         5        27
cli.azure.cli.core: dla                       0.003        23        62
cli.azure.cli.core: dls                       0.002         7        41
cli.azure.cli.core: dms                       0.001         3        22
cli.azure.cli.core: eventgrid                 0.004        25        96
cli.azure.cli.core: eventhubs                 0.009        12        20
cli.azure.cli.core: extension                 0.001         1         7
cli.azure.cli.core: feedback                  0.001         1         2
cli.azure.cli.core: find                      0.001         1         1
cli.azure.cli.core: hdinsight                 0.004         8        39
cli.azure.cli.core: identity                  0.001         2        11
cli.azure.cli.core: interactive               0.000         1         1
cli.azure.cli.core: iot                       0.064        19        82
cli.azure.cli.core: keyvault                  0.007        22       133
cli.azure.cli.core: kusto                     0.001         3        14
cli.azure.cli.core: lab                       0.002        11        34
cli.azure.cli.core: managedservices           0.001         3         8
cli.azure.cli.core: maps                      0.001         5        13
cli.azure.cli.core: marketplaceordering       0.001         1         2
cli.azure.cli.core: monitor                   0.247        21        66
cli.azure.cli.core: mysql                     0.112        14        49
cli.azure.cli.core: netappfiles               0.006        17        96
cli.azure.cli.core: network                   0.075       103       336
cli.azure.cli.core: policyinsights            0.006         9        17
cli.azure.cli.core: privatedns                0.011        14        60
cli.azure.cli.core: profile                   0.001         2         8
cli.azure.cli.core: rdbms                     0.012        44       185
cli.azure.cli.core: redis                     0.001         5        27
cli.azure.cli.core: relay                     0.012         7         8
cli.azure.cli.core: resource                  0.009        51       227
cli.azure.cli.core: role                      0.002        17        61
cli.azure.cli.core: search                    0.001         7        22
cli.azure.cli.core: security                  0.005        48       104
cli.azure.cli.core: servicebus                0.016        12        17
cli.azure.cli.core: serviceconnector          0.028        12       182
cli.azure.cli.core: servicefabric             0.013        27        76
cli.azure.cli.core: signalr                   0.002         8        30
cli.azure.cli.core: sql                       0.012        56       215
cli.azure.cli.core: sqlvm                     0.011         4        20
cli.azure.cli.core: storage                   0.099        58       272
cli.azure.cli.core: synapse                   0.010        54       246
cli.azure.cli.core: util                      0.001         3         7
cli.azure.cli.core: vm                        0.035        57       230
cli.azure.cli.core: Total (65)                1.127      1214      4662
cli.azure.cli.core: Loading extensions:
cli.azure.cli.core: Name                  Load Time    Groups  Commands  Directory
azext_mlv2.manual.vendored_curated_sdk.azure.ai.ml._azure_environments: Using the default cloud configuration: 'AzureCloud'.
azext_mlv2.manual.vendored_curated_sdk.azure.ai.ml._azure_environments: Using the default cloud configuration: 'AzureCloud'.
azext_mlv2.manual.vendored_curated_sdk.azure.ai.ml._azure_environments: Using the default cloud configuration: 'AzureCloud'.
cli.azure.cli.core: ml                        1.335        21       145  /home/tegra/.azure/cliextensions/ml
cli.azure.cli.core: Total (1)                 1.335        21       145
cli.azure.cli.core: Loaded 1223 groups, 4807 commands.
cli.azure.cli.core: Updated command index in 0.004 seconds.
cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x7f9503ed2dd0>]
cli.azure.cli.core.azlogging: metadata file logging enabled - writing logs to '/home/tegra/.azure/commands/2023-09-27.20-44-43.ml.14585.log'.
az_command_data_logger: command args: ml --verbose --debug
cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument.<locals>.add_subscription_parameter at 0x7f9503eef910>]
cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad []
cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument.<locals>.add_ids_arguments at 0x7f9503f2d750>, <function register_cache_arguments.<locals>.add_cache_arguments at 0x7f9503f2d870>]
cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded []
cli.knack.cli: Event: CommandInvoker.OnPreParseArgs [<function _documentdb_deprecate at 0x7f95028f60e0>]
urllib3.connectionpool: Starting new HTTPS connection (1): app.aladdin.microsoft.com:443
urllib3.connectionpool: https://app.aladdin.microsoft.com:443 "GET /api/v1.0/suggestions?query=%7B%22command%22%3A+%22ml%22%2C+%22parameters%22%3A+%22%22%7D&clientType=AzureCli&context=%7B%22versionNumber%22%3A+%222.53.0%22%2C+%22errorType%22%3A+%22MissingRequiredSubcommand%22%2C+%22correlationId%22%3A+%229d551f3d-7a68-4243-aaf9-17dae916d18f%22%2C+%22eventId%22%3A+%22e8693d15-e84f-468d-bee2-55f763055260%22%7D HTTP/1.1" 200 None
cli.azure.cli.core.azclierror: NoneType: None

cli.azure.cli.core.azclierror: the following arguments are required: _subcommand
az_command_data_logger: the following arguments are required: _subcommand

Examples from AI knowledge base:
https://aka.ms/cli_ref
Read more about the command in reference docs
cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x7f9503ed3010>]
az_command_data_logger: exit code: 2
cli.__main__: Command ran in 2.721 seconds (init: 0.154, invoke: 2.567)
telemetry.main: Begin splitting cli events and extra events, total events: 1
telemetry.client: Accumulated 0 events. Flush the clients.
telemetry.main: Finish splitting cli events and extra events, cli events: 1
telemetry.save: Save telemetry record of length 3431 in cache
telemetry.main: Begin creating telemetry upload process.
telemetry.process: Creating upload process: "/usr/bin/../../opt/az/bin/python3 /opt/az/lib/python3.10/site-packages/azure/cli/telemetry/__init__.py /home/tegra/.azure"
telemetry.process: Return from creating process
telemetry.main: Finish creating telemetry upload process.

Expected behavior

Azure ml extension should not take a minute on every call.

Azure CLI should not place the ~/.azure directory on a non-WSL2 partition. While this might not be current behavior this seems to have been the behavior at some point in the past (older machines, >2 yrs) seem to have this directory as a symlink. Newer machines do not have this symlink setup on WSL2.

Environment Summary

az --version
azure-cli                         2.53.0

core                              2.53.0
telemetry                          1.1.0

Extensions:
ml                                2.20.0

Dependencies:
msal                            1.24.0b2
azure-mgmt-resource             23.1.0b2

Python location '/opt/az/bin/python3'
Extensions directory '/home/tegra/.azure/cliextensions'

Python (Linux) 3.10.10 (main, Sep 20 2023, 06:08:04) [GCC 9.4.0]

Legal docs and information: aka.ms/AzureCliLegal

Your CLI is up-to-date.

WSL2 running Ubuntu 20.04.6 LTS.

>wsl --version
WSL version: 1.2.5.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.3448

Additional context

No response

yonzhan commented 11 months ago

Thank you for opening this issue, we will look into it.

microsoft-github-policy-service[bot] commented 11 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.