Azure / azure-cli

Azure Command-Line Interface
MIT License

"az ml workspace create" fails to create Application Insights and Container Registry #28980

Closed: Rubikalubi closed this issue 3 months ago

Rubikalubi commented 4 months ago

Describe the bug

When trying to create a new ML workspace, our deployment pipeline on Azure DevOps fails because the workspace create command only creates the Storage Account and Key Vault. It then tries to create the workspace, which fails because no Container Registry or Application Insights resource exists. We always update the CLI and ml extension to the latest version in our pipeline before executing any commands. I ran the same code with ml extension 2.24.0 two months ago and it worked without an issue. Running on my own machine with ml extension 2.22.0 also behaves as expected.

Related command

az ml workspace create

Errors

The deployment request ml* was accepted. ARM deployment URI for reference: URL REMOVED
Creating Storage Account: (ml*** ) ... Done (21s)
Creating Key Vault: (ml***** ) Done (18s)
ERROR:
Code: ValidationError
Message: Missing dependent resources in workspace json
Target: workspace
Exception Details:      (Invalid) Missing dependent resources in workspace json
        Code: Invalid
        Message: Missing dependent resources in workspace json
        Target: workspace

[error]Script failed with exit code: 1

/usr/bin/az account clear

This is the deployment on Azure Portal.

[Screenshot of the deployment; image not available]

Issue script & Debug output

az ml workspace create --resource-group "${{ parameters.resourceGroup }}" --file $(Build.SourcesDirectory)/workspace/workspace.json

Contents of workspace.json

{
  "$schema": "https://azuremlschemas.azureedge.net/latest/workspace.schema.json",
  "name": "ml",
  "display_name": "ml",
  "description": "created workspace ml in resource groupe ResGrp on 2024-05-16T09:44:05",
  "tags": {
    "createdOn": "2024-05-16T09:44:05",
    "createdBy": ""
  },
  "location": "westeurope",
  "resource_group": "ResGrp",
  "hbi_workspace": false,
}

I did not run the script with --debug because I would have to delete the resources manually afterwards. If desired, please let me know.

Expected behavior

The command creates all resources required for an ML workspace (storage account, key vault, container registry, Application Insights) and then creates the workspace.

Environment Summary

/usr/bin/az --version
azure-cli                         2.60.0

core                              2.60.0
telemetry                          1.1.0

Extensions:
azure-devops                       1.0.0
ml                                2.26.0

Dependencies:
msal                              1.28.0
azure-mgmt-resource             23.1.0b2

Additional context

No response

yonzhan commented 4 months ago

Thank you for opening this issue, we will look into it.

xec-abailey commented 4 months ago

There is something fishy here guys. I am experiencing this issue as well, just as @Rubikalubi is. Previously, any resource that wasn't specified would be created automatically.

However, what we ended up trying was to modify our creation script to first create an Application Insights resource (we already have a container registry, storage account, and key vault, so they are not included in the creation script below) and then associate it using the following script:

acr_arm_id=$(az acr show --name ***** --query id -o tsv) 
insights_arm_id=$(az monitor app-insights component show --app ***** -g ***** --query "id" -o tsv)
az ml workspace create --resource-group **** --name ****** --storage-account ****  --key-vault ****** --container-registry "$acr_arm_id" -a "$insights_arm_id"

Considering the docs, I would expect this to work. However, we are greeted with this:

Code: ValidationError
Message: AppInsights ID is not in right format
Target: properties
Exception Details:      (Invalid) AppInsights ID is not in right format
        Code: Invalid
        Message: AppInsights ID is not in right format
        Target: properties

Looking at the AppInsights ID that is returned from the command directly we see it is compliant, or at least appears to be so given the above docs:

/subscriptions/*subscription*/resourceGroups/*resource-group*/providers/microsoft.insights/components/*name*

The name in this case is compliant with this doc so that shouldn't be an issue.
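A quick local check can at least rule out a malformed ID before the command runs. The sketch below is only an illustration: `is_app_insights_id` is a hypothetical helper, and its pattern simply mirrors the ID shape quoted above. It is an assumption, not the validation the service itself performs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns success when the argument looks like an
# Application Insights component ID. The pattern is an assumption based on
# the ID shape shown in this comment, not the service's own validation.
is_app_insights_id() {
  local id
  # Lowercase the input so segment and provider casing does not matter.
  id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "$id" | grep -Eq \
    '^/subscriptions/[^/]+/resourcegroups/[^/]+/providers/microsoft\.insights/components/[^/]+$'
}
```

With the variables from the script above, `is_app_insights_id "$insights_arm_id" || echo "unexpected ID shape" >&2` would flag an ID that does not match. If the check passes and the CLI still rejects the ID, the problem is more likely in how the value is passed along than in the ID itself.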

Version info

azure-cli                         2.61.0

core                              2.61.0
telemetry                          1.1.0

Extensions:
application-insights               1.2.1
k8s-extension                      1.6.1
ml                                2.26.0

Dependencies:
msal                              1.28.0
azure-mgmt-resource               23.1.1

PierceLovesee commented 4 months ago

+1 to @xec-abailey 's write up; I am experiencing a very similar issue.

kimzed commented 4 months ago

I am facing the same issue. I tried to create a managed online feature store from the SDK by following the tutorial. I tried many different things and checked the template and the various deployments. The issue seems to be recent: a colleague could previously run the script without problems, but now we can no longer create the feature store.

PierceLovesee commented 4 months ago

This breaking change / bug was definitely introduced in the version release of the Azure ML Extension when upgraded from 2.25 to 2.26. Downgrading and locking to ml v2.25 resolved the undesirable behavior in our application.
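For a build agent or local machine where extensions can be managed freely, the downgrade could be scripted roughly as follows. This is a minimal sketch: `pin_ml_extension` is a hypothetical helper name, and the exact patch version to pass (for example 2.25.0) is an assumption, since only "2.25" is stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reinstall the ml extension at one exact version.
pin_ml_extension() {
  local version="$1"
  # Remove whichever version is installed; ignore the error if none is.
  az extension remove --name ml || true
  # Install the requested version instead of the latest one.
  az extension add --name ml --version "$version"
}

# Example call (patch version is an assumption):
#   pin_ml_extension 2.25.0
```

Pinning in the pipeline instead of always updating to the latest release trades automatic fixes for protection against regressions like this one.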

janmolemans commented 4 months ago

We have the same issue. I noticed that the ARM template generated by the az CLI command uses the containerRegistry variable for applicationInsights:

'applicationInsights': '[if(not(equals(parameters('applicationInsightsOption'), 'none')), variables('containerRegistry'), json('null'))]',
'containerRegistry': '[if(not(equals(parameters('containerRegistryOption'), 'none')), variables('containerRegistry'), json('null'))]',
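A mismatch like this can be spotted mechanically. The sketch below is a heuristic, not part of the CLI: `find_mismatched_variables` is a hypothetical helper that assumes one property per line and assumes each property is meant to reference a variable with the same name as the property itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read template lines on stdin and report properties
# whose value references a variable with a different name.
# Assumptions: one property per line, property name == intended variable name.
find_mismatched_variables() {
  # Extract "<property> <variable>" pairs from lines such as
  #   'key': '[... variables('name') ...]'
  sed -nE "s/.*'([A-Za-z]+)': .*variables\('([A-Za-z]+)'\).*/\1 \2/p" |
    while read -r key var; do
      if [ "$key" != "$var" ]; then
        echo "$key references variables('$var')"
      fi
    done
}
```

Piping the two lines quoted above through the function would report only the applicationInsights property, since the containerRegistry property references its own variable.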

achauhan-scc commented 4 months ago

Thanks for reporting the issue and providing the details. I am raising a PR to mitigate the issue.

mbizo1 commented 4 months ago

I am experiencing the same issue as of today.

meet47 commented 4 months ago

Hoping to get it resolved soon.

ShawnLiu119 commented 4 months ago

Facing the same issue here.

endre-kosa commented 4 months ago

PierceLovesee wrote: "This breaking change / bug was definitely introduced in the version release of the Azure ML Extension when upgraded from 2.25 to 2.26. Downgrading and locking to ml v2.25 resolved the undesirable behavior in our application."

When I try this solution I get the following messages: 'Default enabled including preview versions for extension installation now. Disabled in future release. Use '--allow-preview true' to enable it specifically if needed. Use '--allow-preview false' to install stable version only.' or 'Extension 'ml' 2.26.0 is already installed.'

madiepev commented 3 months ago

Thank you for reporting this issue. It does indeed seem to be related to Cloud Shell and not to the setup.sh script. We're investigating the issue to explore any fixes or workarounds and will update when we have something. For now, manual creation of resources seems to be the only fix, as the ML extension can't be updated or downgraded in Cloud Shell.

achauhan-scc commented 3 months ago

ml extension 2.26.1 is released with the fix.
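Where the extension can be updated, a script might first check whether the installed version already includes the fix. This is a rough sketch: `version_ge` and `ensure_ml_fix` are hypothetical helper names, the comparison relies on `sort -V` being available, and it assumes the ml extension is already installed.

```shell
#!/usr/bin/env bash
# True when version $1 is at least version $2 (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: update the ml extension only when it predates 2.26.1.
ensure_ml_fix() {
  local installed
  # Read the installed version of the ml extension.
  installed=$(az extension show --name ml --query version --output tsv)
  if version_ge "$installed" "2.26.1"; then
    echo "ml $installed already includes the fix"
  else
    az extension update --name ml
  fi
}
```

Calling `ensure_ml_fix` at the start of a pipeline step keeps the update conditional, so an agent that already has 2.26.1 or later is left untouched.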

Underdoge commented 3 months ago

Hello @achauhan-scc,

I'm getting the following error when running "az extension update --name ml" in Cloud Shell:

Default enabled including preview versions for extension installation now. Disabled in future release. Use '--allow-preview true' to enable it specifically if needed. Use '--allow-preview false' to install stable version only. 
Cannot update system extension ml, please wait until Cloud Shell updates it in the next release.

And when trying "az extension update --name ml --allow-preview true" I get:

Cannot update system extension ml, please wait until Cloud Shell updates it in the next release.

I'm stuck on 2.26.0:

az --version
azure-cli                         2.61.0

core                              2.61.0
telemetry                          1.1.0

Extensions:
ai-examples                        0.2.5
ml                                2.26.0
ssh                                2.0.3

achauhan-scc commented 3 months ago

Can you please try the steps from https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-cli?view=azureml-api-2&tabs=public

Underdoge commented 3 months ago

Thanks @achauhan-scc. It looks like the extension simply cannot be removed or updated in Cloud Shell, so I guess it's just a matter of waiting until the Cloud Shell environment is updated:

az extension remove -n ml
Cannot remove system extension ml in Cloud Shell.

az extension add -n ml

Default enabled including preview versions for extension installation now. Disabled in future release. Use '--allow-preview true' to enable it specifically if needed. Use '--allow-preview false' to install stable version only. 
Extension 'ml' 2.26.0 is already installed.

mwaqashraf commented 3 months ago

Awaiting the fix.

./setup.sh in the DP-100 labs doesn't work, which affects all students preparing for certification. The az ml extension can't be updated, removed, upgraded, or downgraded either.

Appreciate expedited support.

andreoniriccardo commented 3 months ago

I have the exact same issue, and I am preparing for the DP-100 exam. Please provide prompt support, thank you.

mwaqashraf commented 3 months ago

Thanks for the update.

On Wed, Jun 5, 2024 at 2:31 PM madiepev @.***> wrote:

UPDATE

The issue in the CLI ML extension seems to be fixed in the newest version. As you may have noticed, however, you can't update the extension yourself in Cloud Shell (you can when working locally). The extension will be updated in the next scheduled update of Cloud Shell. We expect this to happen at the end of this week, so hopefully the setup scripts will work again next week. Until then, the quickest workaround is to create the necessary resources manually through the portal instead of using Cloud Shell.



rag-lab commented 3 months ago

mwaqashraf wrote: "Awaiting the fix. ./setup.sh in the DP-100 labs doesn't work, which affects all students preparing for certification. The az ml extension can't be updated, removed, upgraded, or downgraded either. Appreciate expedited support."

I believe it's worth pointing out that the link https://aka.ms/mslearn-dp100 is broken too. This has been reported multiple times on Coursera without any feedback.

christianstroh commented 3 months ago

Do the DP-100 exercises work again? Has anyone tried this?

madiepev commented 3 months ago

Works again for me!