Azure / azure-cli

Azure Command-Line Interface

az ml batch-endpoint invoke wants an AccountKey or SasToken for registered Datastores for output #25846

Open neilmca-inc opened 1 year ago

neilmca-inc commented 1 year ago

From here https://learn.microsoft.com/en-us/cli/azure/ml/batch-endpoint?view=azure-cli-latest#az-ml-batch-endpoint-invoke

This command works, as the output goes to the workspace's default registered Azure Machine Learning datastore...

az ml batch-endpoint invoke --name mybatchendpoint \
                            --input https://azuremlexampledata.blob.core.windows.net/data/mnist/sample \
                            --input-type uri_folder \
                            --output-path azureml://datastores/workspaceblobstore/paths/mybatchendpoint \
                            --set output_file_name=predictions.csv \
                            --query name -o tsv \
                            --resource-group myrg \
                            --workspace-name myamlworkspace

When I want my output to go to another Storage Account location, I pre-register it in AML Studio as follows (under Data > Datastores):

Datastore name: data_mystorageaccount
Datastore type: Azure Blob Storage
Subscription ID: {redactedmyAzureSub}
Storage account: mystorageaccount
Blob container: output
Save credentials with the datastore for data access: enabled
Authentication type: Account key
Account key: {the account key from the storage account mystorageaccount}

Then clicked Create.
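
(For reference, roughly the same registration can be done with the ML CLI v2 instead of the Studio UI. This is only a sketch that reuses the names above with a placeholder account key, not the exact steps I ran:)

# Sketch: register the blob container as a workspace datastore, saving the
# account key with it. <storage-account-key> is a placeholder.
cat > blob_datastore.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: data_mystorageaccount
type: azure_blob
account_name: mystorageaccount
container_name: output
credentials:
  account_key: <storage-account-key>
EOF

az ml datastore create --file blob_datastore.yml \
                       --resource-group myrg \
                       --workspace-name myamlworkspace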

In the Studio I can browse to that output location (see the image below); it already shows a file called ServiceTags_Public_20230306.json in the right-hand pane.

[screenshot: browsing the data_mystorageaccount datastore in AML Studio]

This proves that the connection to the Storage Account via the account key is successful.

When I run the following command...

az ml batch-endpoint invoke --name mybatchendpoint \
                            --input https://azuremlexampledata.blob.core.windows.net/data/mnist/sample \
                            --input-type uri_folder \
                            --output-path azureml://datastores/data_mystorageaccount/paths/mybatchendpoint \
                            --set output_file_name=predictions.csv \
                            --query name -o tsv \
                            --resource-group myrg \
                            --workspace-name myamlworkspace

...it fails with the error Missing AccountKey or SasToken:

ERROR: {
  "error": {
    "code": "UserError",
    "severity": null,
    "message": "Missing AccountKey or SasToken",
    "messageFormat": null,
    "messageParameters": null,
    "referenceCode": null,
    "detailsUri": null,
    "target": null,
    "details": [],
    "innerError": null,
    "debugInfo": null,
    "additionalInfo": null
  },
  "correlation": {
    "operation": "{redacted}",
    "request": "{redacted}"
  },
  "environment": "{redacted}",
  "location": "{redacted}",
  "time": "2023-03-16T13:41:12.1918203+00:00",
  "componentName": "managementfrontend"
}

The Datastores screen in AML Studio states the following...

Datastores securely connect to a storage service on Azure by storing connection information. With datastores, you no longer need to provide credential information in your scripts to access your data

...so I'm not sure why this invoke command fails. Why does it need an AccountKey or SasToken passed to it as part of the az ml batch-endpoint invoke command?

Even if I did need to pass an AccountKey or SasToken as part of this command, is that supported here? There are no examples listed in the documentation for output types.
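
(One sanity check, in case it is useful: the registered datastore can be inspected with the CLI to confirm it was saved with account-key credentials; the key value itself is not returned. This is just a sketch reusing the names above:)

# Sketch: show the registered datastore; its output should indicate that it was
# registered with account-key authentication.
az ml datastore show --name data_mystorageaccount \
                     --resource-group myrg \
                     --workspace-name myamlworkspace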



yonzhan commented 1 year ago

route to CXP team

kartikeyavs commented 8 months ago

Are there any findings on this particular issue? I am seeing this issue in our Azure tenant.

santiagxf commented 8 months ago

Thanks for reporting this issue to us. We are happy to help. Can you please run the same command but with the flag --debug? That would be:

az ml batch-endpoint invoke --name mybatchendpoint \
                            --input https://azuremlexampledata.blob.core.windows.net/data/mnist/sample \
                            --input-type uri_folder \
                            --output-path azureml://datastores/workspaceblobstore/paths/mybatchendpoint \
                            --set output_file_name=predictions.csv \
                            --resource-group myrg \
                            --workspace-name myamlworkspace  \
                            --debug

Please remove any PII before sharing it. Also, it would help to know whether this workspace is under a private VNet.
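
(In case it helps, --debug logging is written to stderr, so it can be redirected to a file and scrubbed before attaching it here. A sketch, assuming a bash shell and the same command as above:)

# Sketch: the same invoke command as above, with the debug log captured to a file
# (az writes --debug output to stderr, not stdout).
az ml batch-endpoint invoke --name mybatchendpoint \
                            --input https://azuremlexampledata.blob.core.windows.net/data/mnist/sample \
                            --input-type uri_folder \
                            --output-path azureml://datastores/workspaceblobstore/paths/mybatchendpoint \
                            --set output_file_name=predictions.csv \
                            --resource-group myrg \
                            --workspace-name myamlworkspace \
                            --debug 2> invoke_debug.log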

jsfsn commented 8 months ago

We are having the same issue.

borgewi commented 8 months ago

We have the same issue @santiagxf. It stopped working around October 5th and seems to be an issue with Azure CLI.

yacuzo commented 8 months ago

Same issue, both from an API calling the AML endpoint via REST and from the Azure CLI. It seems like the issue is in AML itself. Output (and input) storage is in another VNet with a service endpoint.

Scrubbed Log: aml output auth error.txt

santiagxf commented 8 months ago

We are investigating this issue. I will provide an update within the day.

borgewi commented 8 months ago

@santiagxf Any update here?

santiagxf commented 8 months ago

This issue is under investigation and we are trying to identify the root cause from the logs provided. I will update the thread as soon as we have an update. As a workaround, can you try a newly created endpoint? I can't reproduce the issue on my side, so it may be related to existing endpoints.

yacuzo commented 8 months ago

Issue seems to be solved now. We didn't make any changes, so someone on your end has solved it. Both old and new endpoints work.

neilmca-inc commented 8 months ago

@yacuzo To be certain: do you mean that output to the default registered AML Studio "datastore" is working, or that any other imported "datastore" storage account is in fact working too? My original issue was about the latter, so I'm interested to know whether this is fixed before anyone starts to close this off as resolved, as I never experienced issues with the default "datastore". Thanks

borgewi commented 8 months ago

I would like to know the cause of this in case it happens again. This is crucial to our solution and causes big problems when it fails.

yacuzo commented 8 months ago

@neilmca-inc We have a separate linked datastore in a peered VNet, reached via a service endpoint and not available publicly. We also have a support ticket on this, and MS has responded that a hotfix has been deployed. I'd consider this closed.

neilmca-inc commented 8 months ago

I'll do some testing on this - probably by the end of the week - to see how I get on

santiagxf commented 8 months ago

We have identified an issue with the service and a hotfix has been applied. The rollout is in progress: the WestEurope, NorthEurope, and SouthCentralUS regions are already patched, and we are continuing to roll out to more regions. We apologize for the inconvenience. I will keep this issue open until the rollout is complete.

ms-kashyap commented 6 months ago

I found this GitHub issue back in October, when we first started facing the problem ourselves. From 10/26 to 12/1 we were able to successfully invoke our batch endpoint (without making any changes on our side), but now the error has started appearing again. Is there any chance the outage is still occurring in westus2?

The full error message:

Content: {
  "error": {
    "code": "UserError",
    "severity": null,
    "message": "Missing AccountKey or SasToken",
    "messageFormat": null,
    "messageParameters": null,
    "referenceCode": null,
    "detailsUri": null,
    "target": null,
    "details": [],
    "innerError": null,
    "debugInfo": null,
    "additionalInfo": null
  },
  "correlation": {
    "operation": "76ef1f8efce8236907289f004d0dcc16",
    "request": "1d98980b3884c1ae"
  },
  "environment": "westus2",
  "location": "westus2",
  "time": "2023-12-12T12:20:31.7688103+00:00",
  "componentName": "managementfrontend",
  "statusCode": 400
}

santiagxf commented 6 months ago

Thank you @ms-kashyap for reporting the issue. I'm checking this with our engineering team and will provide a resolution.

nagendratekuri commented 6 months ago

(Quoting @ms-kashyap's comment above, including the same "Missing AccountKey or SasToken" error from westus2.)

We were also getting the same issue. The problem is with the permissions on your custom datastore: our batch endpoint doesn't have permission to write into our custom datastore (this had been working until last week). However, it is able to write into the default datastore, workspaceblobstore. As a temporary fix, we changed our output path to the default workspaceblobstore datastore and it worked.

I hope this helps in finding the permanent fix.
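
(For clarity, the workaround invoke looks roughly like the following, written with the endpoint and workspace names from the original report rather than our own; only the --output-path differs from the failing command:)

# Sketch of the workaround: point --output-path at the default workspaceblobstore
# datastore instead of the custom one.
az ml batch-endpoint invoke --name mybatchendpoint \
                            --input https://azuremlexampledata.blob.core.windows.net/data/mnist/sample \
                            --input-type uri_folder \
                            --output-path azureml://datastores/workspaceblobstore/paths/mybatchendpoint \
                            --set output_file_name=predictions.csv \
                            --resource-group myrg \
                            --workspace-name myamlworkspace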

ms-kashyap commented 6 months ago

(Quoting my earlier comment and @nagendratekuri's suggestion to point the output path at the default workspaceblobstore datastore.)

That fix totally worked! Thank you so much for the pointer.

I am wondering if the team can explain why the issue re-occurred seemingly at random, as it resulted in me spending a few days trying out a host of things because I thought it was user error! 😅

I'm seeing in the docs that any registered datastore should work: [screenshot of the documentation]

asgardian1196 commented 6 months ago

We have the same issue @santiagxf. We have been getting the response below for the uksouth region since 12-14-2023:

error: Missing AccountKey or SasToken

Content: {
  "error": {
    "code": "UserError",
    "severity": null,
    "message": "Missing AccountKey or SasToken",
    "messageFormat": null,
    "messageParameters": null,
    "referenceCode": null,
    "detailsUri": null,
    "target": null,
    "details": [],
    "innerError": null,
    "debugInfo": null,
    "additionalInfo": null
  },
  "correlation": {
    "operation": "bc7b28ff667a26cd6b236e2d58239c54",
    "request": "68cea493082c06e6"
  },
  "environment": "uksouth",
  "location": "uksouth",
  "time": "2023-12-15T09:43:59.7361588+00:00",
  "componentName": "managementfrontend",
  "statusCode": 400
}

santiagxf commented 6 months ago

Update: A permanent fix is being deployed in the service to resolve the issue. We are performing a progressive rollout across the regions. I will provide an update once the rollout is complete.

santiagxf commented 6 months ago

@ms-kashyap we have rolled out the fix in westus2. Please let us know if this solves the issue. We are continuing the rollout in the other regions too.

jameseedi commented 5 months ago

Hi, we have the same issue in uksouth. Was wondering when the patch would be rolled out there?

santiagxf commented 5 months ago

Hi @jameseedi! We have rolled out the patch to all regions. Can you please try again with a new deployment and see if the error persists?