Closed MrNickArcher closed 1 year ago
Hi @MrNickArcher - Thanks for the detailed instructions on reproducing the behavior! We'll take a look as soon as possible!
Thanks for reaching out.
Unfortunately, azure-identity does not work for jobs that run in a Synapse workspace.
For more information, you can check https://learn.microsoft.com/en-us/azure/synapse-analytics/synapse-service-identity?context=%2Fazure%2Fsynapse-analytics%2Fcontext%2Fcontext.
Hi @MrNickArcher. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.
As I demonstrated, azure-identity does work in synapse if Microsoft wants it to. It is needed for some use-cases such as accessing storage queue from a synapse python notebook. (Synapse only really has convenient mechanisms to access blob storage and basically nothing else) The alternative is to use keyvault + storage account access key to access the queue. I don't understand why Microsoft prefers to leave synapse users with that second-rate less secure option.
Thanks for the feedback.
To make sure we are on the same page, Synapse does support managed identity.
It has its own implementation, hence azure.identity.managedidentity does not work in such environments.
Agreed that it would be a better experience if we could integrate them into one package so customers don't need to be aware of the difference.
Aware of the difference? Synapse has a managed identity, and mssparkutil.credentials lets you obtain a valid token using that identity, but there is no documentation for how to use that token to construct QueueServiceClient(account_url = "...", credential = ????) in a python notebook environment.
Yes. QueueServiceClient(account_url = "...", credential = ????) uses azure.identity.credentials while in Synapse environment, azure.identity.managedidentity cannot successfully get the token.
In other words, you need to compose your own requests and add the token into headers if you want to use QueueServiceClient in Synapse.
And that's the difference.
Ok, can I construct some object, I dunno, maybe something in azure.core.credentials that would be accepted by the QueueServiceClient constructor credential parameter? I don't think manually constructing headers is a good way to go. I really appreciate the work you have done to create these python libraries so that I don't need to do that, except for in this case apparently.
QueueServiceClient looks for credential.get_token(scopes).
Maybe you can make your own credential class and wrap its get_token method to call into mssparkutil.credentials
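For anyone landing here, the suggestion above could be sketched roughly as follows. This is a minimal sketch, not an official or tested Synapse integration. It assumes only what was said above: that the client calls credential.get_token(scopes) and expects an object with `token` and `expires_on` fields (the shape of azure.core.credentials.AccessToken). The names `CallbackTokenCredential` and `jwt_expiry` are hypothetical, and reading the expiry from the token's `exp` claim assumes the token is a JWT.

```python
import base64
import json
import time
from typing import Any, Callable, NamedTuple

try:
    from azure.core.credentials import AccessToken
except ImportError:
    # Fallback with the same (token, expires_on) shape, so the sketch
    # runs even where azure-core is not installed.
    class AccessToken(NamedTuple):
        token: str
        expires_on: int


def jwt_expiry(token: str, default_lifetime: int = 300) -> int:
    """Read the `exp` claim of a JWT without verifying the signature.

    Falls back to "now + default_lifetime" if the token is not a
    decodable JWT (hypothetical helper, for illustration only).
    """
    try:
        payload = token.split(".")[1]
        payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
        return int(json.loads(base64.urlsafe_b64decode(payload))["exp"])
    except (IndexError, KeyError, TypeError, ValueError):
        return int(time.time()) + default_lifetime


class CallbackTokenCredential:
    """Credential-like wrapper that delegates token acquisition to a callable."""

    def __init__(self, fetch_token: Callable[[], str]) -> None:
        self._fetch_token = fetch_token

    def get_token(self, *scopes: str, **kwargs: Any) -> AccessToken:
        # The requested scopes are ignored: the callable decides the audience.
        token = self._fetch_token()
        return AccessToken(token, jwt_expiry(token))


# Intended use in a Synapse notebook (untested assumption: the callable
# returns a bearer token string for the storage audience):
#   credential = CallbackTokenCredential(
#       lambda: mssparkutils.credentials.getToken("Storage"))
#   queue_service = QueueServiceClient(account_url="...", credential=credential)
```

Passing the token source in as a callable keeps the wrapper free of any Synapse-specific import, so the same class can be exercised outside a notebook with a stub.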
How was this addressed?
Unfortunately, as I said, it is by design that the azure-identity library does not work in Synapse Analytics notebooks. There is no easy way to use QueueServiceClient with mssparkutil.credentials.
One option is to call mssparkutil.credentials to get the token and add it to the request header in your code.
Here is a sample: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/core/azure-core/samples/test_example_policies.py#L27
This also makes using Synapse together with Azure Machine Learning more difficult and very unclear.
The documentation suggests using managed identity to trigger an endpoint: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-authenticate-batch-endpoint?tabs=sdk#running-jobs-using-a-managed-identity
Had managed identity worked in this scenario, it would have been a really simple and elegant way to trigger a batch endpoint from a notebook after a Synapse pipeline has run.
As this does not work, you have to do a workaround and use the REST API directly, through a web activity: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-batch-azure-data-factory?tabs=mi
The built-in Synapse linked service for Azure Machine Learning does not support triggering endpoints directly. So if you want to trigger endpoints you have to decide between having a simple authentication but more complex triggering logic (web activity), or do as the documentation suggests and use the Python SDK to trigger the endpoint with a single line, but with the added complexity of using service principals and doing OAuth to authenticate (and spending time figuring out that the documented solution does not work, by googling and finding this page as the only source of this information).
Describe the bug
Attempting to use ManagedIdentityCredential in Azure Synapse produces an error even though I feel it should definitely work. (I even have a hack that forces it to work, see below)
To Reproduce
Steps to reproduce the behaviour:
Storage Queue Data Contributor on the storage account
Expected behavior
ManagedIdentityCredential will work because the notebook session is running with "Run as managed identity" Enabled.
Screenshots
N/A
Additional context
I have discovered a work-around; the following script works as expected:
I am sure there should be a more straightforward way to do this? I have sunk hours into this problem to finally find this hacky work-around. I realise this issue might be better raised as a Synapse Support ticket, but I don't have permissions to do that, and I am still not sure if there is some other obvious method I have missed? The only other azure.identity credential that works is the DeviceCodeCredential; but that can't be automated, and it uses my own credentials instead of the Synapse managed identity.
Many thanks