Azure / azure-sdk-for-python


azure.identity.ManagedIdentityCredential does not work in Synapse Analytics notebook #26997

Closed MrNickArcher closed 1 year ago

MrNickArcher commented 2 years ago

Describe the bug

Attempting to use ManagedIdentityCredential in an Azure Synapse notebook produces an error, even though I believe it should work. (I even have a hack that forces it to work; see below.)

To Reproduce

Steps to reproduce the behaviour:

  1. Create a queue in an Azure Storage account and add some test messages
  2. Open an Azure Synapse workspace and create a Spark pool
  3. Create a new Python notebook
  4. Ensure the notebook is running with "Use Managed Identity"
  5. Ensure the Synapse workspace managed identity has the Storage Queue Data Contributor role on the storage account
  6. Run the following in the notebook:
from azure.storage.queue import QueueServiceClient
from azure.identity import ManagedIdentityCredential
queue_service_client = QueueServiceClient(
    account_url = "https://<STOARAGE_ACCOUNT_NAME>.queue.core.windows.net/",
    credential  = ManagedIdentityCredential()
)
queue_client = queue_service_client.get_queue_client("<QUEUE NAME>")
[item for item in queue_client.peek_messages(5)]

Running the above produces the following output:

ManagedIdentityCredential.get_token failed: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
ManagedIdentityCredential.get_token failed: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
ManagedIdentityCredential.get_token failed: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
ManagedIdentityCredential.get_token failed: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
---------------------------------------------------------------------------
CredentialUnavailableError                Traceback (most recent call last)
<ipython-input-64-54ee11f0> in <module>
----> 1 [item for item in queue_client.peek_messages(20)]

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/tracing/decorator.py in wrapper_use_tracer(*args, **kwargs)
     76             span_impl_type = settings.tracing_implementation()
     77             if span_impl_type is None:
---> 78                 return func(*args, **kwargs)
     79 
     80             # Merge span is parameter is set, but only if no explicit parent are passed

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/storage/queue/_queue_client.py in peek_messages(self, max_messages, **kwargs)
    828             return wrapped_messages
    829         except HttpResponseError as error:
--> 830             process_storage_error(error)
    831 
    832     @distributed_trace

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/storage/queue/_shared/response_handlers.py in process_storage_error(storage_error)
     88     serialized = False
     89     if not storage_error.response or storage_error.response.status_code in [200, 204]:
---> 90         raise storage_error
     91     # If it is one of those three then it has been serialized prior by the generated layer.
     92     if isinstance(storage_error, (PartialBatchErrorException,

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/storage/queue/_queue_client.py in peek_messages(self, max_messages, **kwargs)
    818             resolver=self.key_resolver_function)
    819         try:
--> 820             messages = self._client.messages.peek(
    821                 number_of_messages=max_messages,
    822                 timeout=timeout,

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/tracing/decorator.py in wrapper_use_tracer(*args, **kwargs)
     76             span_impl_type = settings.tracing_implementation()
     77             if span_impl_type is None:
---> 78                 return func(*args, **kwargs)
     79 
     80             # Merge span is parameter is set, but only if no explicit parent are passed

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/storage/queue/_generated/operations/_messages_operations.py in peek(self, number_of_messages, timeout, request_id_parameter, **kwargs)
    502         request.url = self._client.format_url(request.url)  # type: ignore
    503 
--> 504         pipeline_response = self._client._pipeline.run(  # type: ignore # pylint: disable=protected-access
    505             request, stream=False, **kwargs
    506         )

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/_base.py in run(self, request, **kwargs)
    209             else _TransportRunner(self._transport)
    210         )
--> 211         return first_node.send(pipeline_request)  # type: ignore

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/_base.py in send(self, request)
     69         _await_result(self._policy.on_request, request)
     70         try:
---> 71             response = self.next.send(request)
     72         except Exception:  # pylint: disable=broad-except
     73             _await_result(self._policy.on_exception, request)

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/_base.py in send(self, request)
     69         _await_result(self._policy.on_request, request)
     70         try:
---> 71             response = self.next.send(request)
     72         except Exception:  # pylint: disable=broad-except
     73             _await_result(self._policy.on_exception, request)

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/_base.py in send(self, request)
     69         _await_result(self._policy.on_request, request)
     70         try:
---> 71             response = self.next.send(request)
     72         except Exception:  # pylint: disable=broad-except
     73             _await_result(self._policy.on_exception, request)

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/_base.py in send(self, request)
     69         _await_result(self._policy.on_request, request)
     70         try:
---> 71             response = self.next.send(request)
     72         except Exception:  # pylint: disable=broad-except
     73             _await_result(self._policy.on_exception, request)

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/_base.py in send(self, request)
     69         _await_result(self._policy.on_request, request)
     70         try:
---> 71             response = self.next.send(request)
     72         except Exception:  # pylint: disable=broad-except
     73             _await_result(self._policy.on_exception, request)

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/policies/_redirect.py in send(self, request)
    156         redirect_settings = self.configure_redirects(request.context.options)
    157         while retryable:
--> 158             response = self.next.send(request)
    159             redirect_location = self.get_redirect_location(response)
    160             if redirect_location and redirect_settings['allow']:

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/_base.py in send(self, request)
     69         _await_result(self._policy.on_request, request)
     70         try:
---> 71             response = self.next.send(request)
     72         except Exception:  # pylint: disable=broad-except
     73             _await_result(self._policy.on_exception, request)

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/storage/queue/_shared/policies.py in send(self, request)
    536                     self.sleep(retry_settings, request.context.transport)
    537                     continue
--> 538                 raise err
    539         if retry_settings['history']:
    540             response.context['history'] = retry_settings['history']

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/storage/queue/_shared/policies.py in send(self, request)
    510         while retries_remaining:
    511             try:
--> 512                 response = self.next.send(request)
    513                 if is_retry(response, retry_settings['mode']):
    514                     retries_remaining = self.increment(

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/_base.py in send(self, request)
     69         _await_result(self._policy.on_request, request)
     70         try:
---> 71             response = self.next.send(request)
     72         except Exception:  # pylint: disable=broad-except
     73             _await_result(self._policy.on_exception, request)

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/_base.py in send(self, request)
     69         _await_result(self._policy.on_request, request)
     70         try:
---> 71             response = self.next.send(request)
     72         except Exception:  # pylint: disable=broad-except
     73             _await_result(self._policy.on_exception, request)

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/policies/_authentication.py in send(self, request)
    114         :type request: ~azure.core.pipeline.PipelineRequest
    115         """
--> 116         self.on_request(request)
    117         try:
    118             response = self.next.send(request)

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/core/pipeline/policies/_authentication.py in on_request(self, request)
     91 
     92         if self._token is None or self._need_new_token:
---> 93             self._token = self._credential.get_token(*self._scopes)
     94         self._update_headers(request.http_request.headers, self._token.token)
     95 

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/identity/_internal/decorators.py in wrapper(*args, **kwargs)
     25         def wrapper(*args, **kwargs):
     26             try:
---> 27                 token = fn(*args, **kwargs)
     28                 _LOGGER.info("%s succeeded", qualified_name)
     29                 return token

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/identity/_credentials/managed_identity.py in get_token(self, *scopes, **kwargs)
     91         if not self._credential:
     92             raise CredentialUnavailableError(message="No managed identity endpoint found.")
---> 93         return self._credential.get_token(*scopes, **kwargs)
     94 
     95 

~/cluster-env/clonedenv/lib/python3.8/site-packages/azure/identity/_credentials/managed_identity.py in get_token(self, *scopes, **kwargs)
    188         if not self._endpoint_available:
    189             message = "ManagedIdentityCredential authentication unavailable, no managed identity endpoint found."
--> 190             raise CredentialUnavailableError(message=message)
    191 
    192         if len(scopes) != 1:

CredentialUnavailableError: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.

Expected behavior

ManagedIdentityCredential should work because the notebook session is running with "Run as managed identity" enabled.

Screenshots

N/A

Additional context

I have discovered a work-around; the following script works as expected:

import time

from azure.storage.queue import QueueServiceClient
from azure.identity import ManagedIdentityCredential
from azure.core.credentials import AccessToken

class spoof_token:
    # mssparkutils is provided by the Synapse notebook runtime
    def get_token(self, *args, **kwargs):
        return AccessToken(
            token=mssparkutils.credentials.getToken(audience="storage"),
            expires_on=int(time.time()) + 60*10  # some arbitrary time in the future... Synapse doesn't document how to get the actual expiry
        )

credential = ManagedIdentityCredential()
credential._credential = spoof_token()  # monkey-patch the contents of the private `_credential`

queue_service_client = QueueServiceClient(
    account_url = "https://<STORAGE_ACCOUNT_NAME>.queue.core.windows.net/",
    credential  = credential
)
queue_client = queue_service_client.get_queue_client("<QUEUE NAME>")
print([item for item in queue_client.peek_messages(5)])

Surely there is a more straightforward way to do this? I have sunk hours into this problem to finally arrive at this hacky workaround. I realise this issue might be better raised as a Synapse support ticket, but I don't have permission to do that, and I am still not sure whether there is some other obvious method I have missed. The only other azure.identity credential that works is DeviceCodeCredential, but that can't be automated, and it uses my own credentials instead of the Synapse managed identity.

Many thanks

azure-sdk commented 2 years ago

Label prediction was below confidence level 0.6 for Model:ServiceLabels: 'Azure.Identity:0.39929336,Storage:0.39017525,Event Hubs:0.016472006'

swathipil commented 2 years ago

Hi @MrNickArcher - Thanks for the detailed instructions on reproducing the behavior! We'll take a look as soon as possible!

xiangyan99 commented 2 years ago

Thanks for reaching out.

Unfortunately, azure-identity does not work for jobs that run in a Synapse workspace.

For more information, you can check https://learn.microsoft.com/en-us/azure/synapse-analytics/synapse-service-identity?context=%2Fazure%2Fsynapse-analytics%2Fcontext%2Fcontext.

ghost commented 2 years ago

Hi @MrNickArcher. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

MrNickArcher commented 2 years ago

As I demonstrated, azure-identity does work in Synapse if Microsoft wants it to. It is needed for some use cases, such as accessing a storage queue from a Synapse Python notebook. (Synapse only really has convenient mechanisms for accessing blob storage and basically nothing else.) The alternative is to use Key Vault plus a storage account access key to access the queue. I don't understand why Microsoft prefers to leave Synapse users with that second-rate, less secure option.

xiangyan99 commented 2 years ago

Thanks for the feedback.

To make sure we are on the same page: Synapse does support managed identity.

It has its own implementation, hence azure.identity.ManagedIdentityCredential does not work in such environments.

Agreed that it would be a better experience if we could integrate them into one package and not ask customers to be aware of the difference.
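
For reference, the Synapse-native mechanism referred to here looks roughly like this (a minimal sketch; mssparkutils is provided by the Synapse notebook runtime, and the "storage" audience value is taken from the workaround above):

# mssparkutils is pre-loaded in Synapse notebooks and uses the workspace
# managed identity under the hood; it returns a raw JWT string rather than
# an azure.core AccessToken.
token = mssparkutils.credentials.getToken(audience="storage")
print(token[:20], "...")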

MrNickArcher commented 2 years ago

Aware of the difference? Synapse has a managed identity, and mssparkutils.credentials lets you obtain a valid token using that identity, but there is no documentation on how to use that token to construct QueueServiceClient(account_url = "...", credential = ????) in a Python notebook environment.

xiangyan99 commented 2 years ago

Yes. QueueServiceClient(account_url = "...", credential = ????) expects an azure.identity-style credential, but in the Synapse environment azure.identity.ManagedIdentityCredential cannot successfully get the token.

In other words, you need to compose your own requests and add the token into the headers if you want to use QueueServiceClient in Synapse.

And that's the difference.

MrNickArcher commented 2 years ago

OK, can I construct some object, I dunno, maybe something in azure.core.credentials, that would be accepted by the QueueServiceClient constructor's credential parameter? I don't think manually constructing headers is a good way to go. I really appreciate the work you have done to create these Python libraries so that I don't need to do that, except in this case, apparently.

xiangyan99 commented 2 years ago

QueueServiceClient looks for credential.get_token(*scopes).

Maybe you can make your own credential class whose get_token method calls into mssparkutils.credentials.
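
For example, something along these lines should satisfy the token-credential protocol the storage clients expect (a minimal sketch: the class name is made up here, mssparkutils is assumed to be available in the Synapse notebook, and the expiry is a placeholder because mssparkutils does not expose the real one):

import time

from azure.core.credentials import AccessToken
from azure.storage.queue import QueueServiceClient

class MSSparkUtilsCredential:
    """Adapts mssparkutils to the get_token() protocol used by Azure SDK clients."""
    def get_token(self, *scopes, **kwargs):
        # mssparkutils is available globally in Synapse notebooks
        token = mssparkutils.credentials.getToken(audience="storage")
        # placeholder expiry; a more robust version could decode the JWT's exp claim
        return AccessToken(token=token, expires_on=int(time.time()) + 10 * 60)

queue_service_client = QueueServiceClient(
    account_url="https://<STORAGE_ACCOUNT_NAME>.queue.core.windows.net/",
    credential=MSSparkUtilsCredential(),
)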

MrNickArcher commented 1 year ago

How was this addressed?

xiangyan99 commented 1 year ago

Unfortunately, as I said, it is by design that the azure-identity library does not work in a Synapse Analytics notebook. There is no easy way to use QueueServiceClient with mssparkutils.credentials.

One option is to call mssparkutils.credentials to get the token and add it into the headers in your own code.

Here is a sample: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/core/azure-core/samples/test_example_policies.py#L27
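
For illustration, calling the Queue service REST API directly with the token in the Authorization header could look like the sketch below (assumptions: the requests package is available, placeholder account and queue names, and an x-ms-version recent enough for bearer-token authorization; Peek Messages uses the peekonly=true query parameter):

import requests

# token from the Synapse runtime, as in the workaround above
token = mssparkutils.credentials.getToken(audience="storage")

account_url = "https://<STORAGE_ACCOUNT_NAME>.queue.core.windows.net"
response = requests.get(
    f"{account_url}/<QUEUE NAME>/messages",
    params={"peekonly": "true", "numofmessages": "5"},
    headers={
        "Authorization": f"Bearer {token}",
        "x-ms-version": "2021-08-06",  # bearer auth requires a recent service version
    },
)
response.raise_for_status()
print(response.text)  # peeked messages as XML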

joachimaae commented 1 year ago

This also makes using Synapse together with Azure Machine Learning more difficult and very unclear.

The documentation suggests using managed identity to trigger an endpoint: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-authenticate-batch-endpoint?tabs=sdk#running-jobs-using-a-managed-identity

Had managed identity worked in this scenario, it would have been a really simple and elegant way to trigger a batch endpoint after a Synapse pipeline is run through a notebook.

As this does not work, you have to fall back to calling the REST API directly through a web activity: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-batch-azure-data-factory?tabs=mi

The built-in Synapse linked service for Azure Machine Learning does not support triggering endpoints directly. So if you want to trigger endpoints, you have to choose between simple authentication with more complex triggering logic (the web activity), or doing as the documentation suggests and using the Python SDK to trigger the endpoint with a single line, at the cost of setting up service principals and OAuth to authenticate (and spending time figuring out that the documented solution does not work, by googling and finding this page as the only source of this information).