Open neilmca-inc opened 1 year ago
route to CXP team
Are there any findings on this particular issue? I am seeing this issue in our Azure tenant.
Thanks for reporting this issue to us. We are happy to help. Can you please run the same command but with the flag --debug? That would be:
az ml batch-endpoint invoke --name mybatchendpoint \
--input https://azuremlexampledata.blob.core.windows.net/data/mnist/sample \
--input-type uri_folder \
--output-path azureml://datastores/workspaceblobstore/paths/mybatchendpoint \
--set output_file_name=predictions.csv \
--resource-group myrg \
--workspace-name myamlworkspace \
--debug
Please remove any PII before sharing it. Also, it would help to know whether this workspace is under a private VNET.
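For anyone capturing the --debug output to share here, a minimal sketch of a scrubbing step is below. The function name scrub_log and the three patterns (GUIDs, bearer tokens, SAS signatures) are assumptions chosen for illustration; they are not an exhaustive PII filter, so review the file by hand before attaching it.

```shell
# Redact common identifiers from Azure CLI debug output read on stdin.
# scrub_log is a hypothetical helper; the patterns are illustrative only.
scrub_log() {
  sed -E \
    -e 's/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}/<guid>/g' \
    -e 's|(Bearer )[A-Za-z0-9._~+/=-]+|\1<token>|g' \
    -e 's/(sig=)[^&" ]+/\1<sig>/g'
}

# Usage (not run here): merge stderr into stdout so all output is captured,
# then write the scrubbed copy to a file.
#   az ml batch-endpoint invoke --name mybatchendpoint ... --debug 2>&1 \
#     | scrub_log > invoke-debug.txt
```

GUID redaction covers subscription and tenant IDs; the 32-character correlation IDs in the error payload are left intact on purpose, since support needs them to trace the request.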
We are having the same issue.
We have the same issue @santiagxf. It stopped working around October 5th and seems to be an issue with Azure CLI.
Same issue, both on an API calling the AML endpoint using REST and in Azure CLI. It seems like the issue is on AML itself. Output (and input) storage are in another VNET with a service endpoint.
Scrubbed Log: aml output auth error.txt
We are investigating this issue. I will provide an update within the day.
@santiagxf Any update here?
This issue is under investigation and we are trying to identify the root cause with the logs provided. I will update the thread as soon as we have an update. As a workaround, can you try a newly created endpoint? I can't reproduce the issue on my side, so it may be related to existing ones.
Issue seems to be solved now. We didn't make any changes, so someone on your end has solved it. Both old and new endpoints work.
@yacuzo To be certain: do you mean that writing to the default registered AML Studio "datastore" is working, or that in fact any other imported "datastore" storage account is working? My original issue was about the latter, so I'm interested to know if this is fixed before anyone closes this off as resolved, as I never experienced issues with the default "datastore". Thanks
Would like to know the cause of this in case it happens again. This is crucial to our solution and causes big problems if it fails.
@neilmca-inc We have a separate linked datastore, in a peered VNET, with service-endpoint and not available publicly. We also have a support ticket on this, and MS has responded that a hotfix has been deployed. I'd consider this closed.
I'll do some testing on this - probably by the end of the week - to see how I get on
We have identified an issue with the service and a hotfix was applied. The rollout is in progress. The WestEurope, NorthEurope, and SouthCentralUS regions are already patched and we are continuing the rollout to more regions. We apologize for the inconvenience. I will keep this issue open until the rollout is complete.
I found this issue back in October, when we started facing this issue. From 10/26 to 12/1, we were able to successfully invoke our batch endpoint (without making any changes on our side). However, we started getting this issue again. Is there any chance the outage is still occurring in westus2?
The full error message:
Content: {
"error": {
"code": "UserError",
"severity": null,
"message": "Missing AccountKey or SasToken",
"messageFormat": null,
"messageParameters": null,
"referenceCode": null,
"detailsUri": null,
"target": null,
"details": [],
"innerError": null,
"debugInfo": null,
"additionalInfo": null
},
"correlation": {
"operation": "76ef1f8efce8236907289f004d0dcc16",
"request": "1d98980b3884c1ae"
},
"environment": "westus2",
"location": "westus2",
"time": "2023-12-12T12:20:31.7688103+00:00",
"componentName": "managementfrontend",
"statusCode": 400
}
Thank you @ms-kashyap for reporting the issue. I'm checking this with our engineering team and will provide a resolution.
We were also getting the same issue. The problem is with the permissions on your custom data store. Our batch end point doesn't have permission to write into our custom data store (this has been working until last week). However, it is able to write into default data store workspaceblobstore. As a temporary fix, we changed our output path to default data store workspaceblobstore and it worked.
I hope this helps in finding the permanent fix.
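A minimal sketch of that workaround, assuming the endpoint, workspace, and datastore names used elsewhere in this thread. The helpers datastore_uri and invoke_with_output are hypothetical names introduced here; every az flag is taken from the commands already posted in this issue.

```shell
# Build an azureml:// output URI for a registered datastore.
# $1 = datastore name, $2 = path inside the datastore.
datastore_uri() {
  printf 'azureml://datastores/%s/paths/%s\n' "$1" "$2"
}

# Hypothetical wrapper: invoke the batch endpoint with output written to
# the datastore named in $1. Names such as mybatchendpoint are placeholders.
invoke_with_output() {
  az ml batch-endpoint invoke --name mybatchendpoint \
    --input https://azuremlexampledata.blob.core.windows.net/data/mnist/sample \
    --input-type uri_folder \
    --output-path "$(datastore_uri "$1" mybatchendpoint)" \
    --set output_file_name=predictions.csv \
    --resource-group myrg \
    --workspace-name myamlworkspace
}

# Temporary fix from this comment: write to the default datastore instead.
#   invoke_with_output workspaceblobstore
# Once the service fix reaches your region, switch back:
#   invoke_with_output data_mystorageaccount
```

Keeping the datastore name as the only argument makes it a one-word change to move between the default and the custom datastore.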
That fix totally worked! Thank you so much for the pointer.
I am wondering if the team can reply on why the issue re-occurred randomly, as it resulted in me spending a few days trying out a host of things as I thought it was user error! 😅
I'm seeing in the docs that any registered datastore should work:
We have the same issue @santiagxf. We are getting the below response for uksouth region since 12-14-2023 -
error: Missing AccountKey or SasToken
Content: {
"error": {
"code": "UserError",
"severity": null,
"message": "Missing AccountKey or SasToken",
"messageFormat": null,
"messageParameters": null,
"referenceCode": null,
"detailsUri": null,
"target": null,
"details": [],
"innerError": null,
"debugInfo": null,
"additionalInfo": null
},
"correlation": {
"operation": "bc7b28ff667a26cd6b236e2d58239c54",
"request": "68cea493082c06e6"
},
"environment": "uksouth",
"location": "uksouth",
"time": "2023-12-15T09:43:59.7361588+00:00",
"componentName": "managementfrontend",
"statusCode": 400
}
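When reporting a failure like the one above, the message, correlation IDs, and region are the useful parts. The sketch below pulls them out of a saved payload with grep and sed only. The helpers json_field and summarize_error are hypothetical, and the approach assumes the flat key/value layout shown in this thread; a real JSON parser is the better choice for anything more general.

```shell
# Print the first string value of the JSON key named $1, reading JSON on stdin.
# Assumes simple "key": "value" pairs with no escaped quotes inside values.
json_field() {
  grep -o "\"$1\": *\"[^\"]*\"" | head -n 1 | sed -E 's/^"[^"]*": *"(.*)"$/\1/'
}

# Summarize an error payload on one line for a support ticket or comment.
summarize_error() {
  local payload
  payload="$(cat)"
  printf 'message=%s operation=%s request=%s region=%s\n' \
    "$(printf '%s' "$payload" | json_field message)" \
    "$(printf '%s' "$payload" | json_field operation)" \
    "$(printf '%s' "$payload" | json_field request)" \
    "$(printf '%s' "$payload" | json_field location)"
}

# Usage (not run here), with the payload saved to a hypothetical error.json:
#   summarize_error < error.json
```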
Update: A permanent fix is being deployed in the service to resolve the issue. We are performing a progressive rollout across regions. I will provide an update once the rollout is complete.
@ms-kashyap we have rolled out the fix on westus2. Please let us know if this solves the issue. We continue rollout in other regions too.
Hi, we have the same issue in uksouth. Was wondering when the patch would be rolled out there?
Hi @jameseedi! We have rolled out the patch to all regions. Can you please try again with a new deployment and see if the error persists?
From here https://learn.microsoft.com/en-us/cli/azure/ml/batch-endpoint?view=azure-cli-latest#az-ml-batch-endpoint-invoke
This command works as the output is to the registered internal Azure Machine Learning Default Datastore...
az ml batch-endpoint invoke --name mybatchendpoint --input https://azuremlexampledata.blob.core.windows.net/data/mnist/sample --input-type uri_folder --output-path azureml://datastores/workspaceblobstore/paths/mybatchendpoint --set output_file_name=predictions.csv --query name -o tsv --resource-group myrg --workspace-name myamlworkspace
When I want my output to go to another Storage Account location - I have pre-registered it in AML Studio as follows...(under Data > Datastores)
Datastore name: data_mystorageaccount
Datastore type: Azure Blob Storage
Subscription ID: {redactedmyAzureSub}
Storage account: mystorageaccount
Blob container: output
Save credentials with the datastore for data access: enabled
Authentication type: Account key
Account key: {the account key from the storage account mystorageaccount}
Clicked Create.
From looking in the Studio, it can browse to that output location (see image): it can see an existing file called ServiceTags_Public_20230306.json in the right-hand pane. This proves that the connection to the Storage Account via the AccountKey is successful.
When I run the following command...
az ml batch-endpoint invoke --name mybatchendpoint --input https://azuremlexampledata.blob.core.windows.net/data/mnist/sample --input-type uri_folder --output-path azureml://datastores/data_mystorageaccount/paths/mybatchendpoint --set output_file_name=predictions.csv --query name -o tsv --resource-group myrg --workspace-name myamlworkspace
...it fails with the following output error
Missing AccountKey or SasToken
It states on the Datastores screen in AML Studio the following: "Datastores securely connect to a storage service on Azure by storing connection information. With datastores, you no longer need to provide credential information in your scripts to access your data."
...so I'm not sure why this invoke command fails. Why does it need an AccountKey or SasToken passed to it as part of the az ml batch-endpoint invoke command?
Even if I did need to pass an AccountKey or SasToken as part of this command, is it supported here? There are no examples listed in the documentation for output types.