Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.

DataLakeServiceClient, when used with ClientSecretCredential through Private Link, gets self.continuation_token errors: "This request is not authorized to perform this operation" #37645

Open elisabetao opened 1 month ago

elisabetao commented 1 month ago

**Describe the bug**
When DataLakeServiceClient is used with ClientSecretCredential through Private Link, it fails with self.continuation_token errors: "This request is not authorized to perform this operation". However, when the DataLakeServiceClient host IP is allowed through the storage account firewall, authorization is successful. The behaviour appears inconsistent, given that logging in with the same service principal through the Azure CLI:

```
az login --allow-no-subscriptions --service-principal -u SP1 -p SP1Secret --tenant Tenant1
```

and listing with the following command both succeed:

```
az storage fs directory list --file-system --account-name Test --auth-mode login
```

However, the script described below errors with:

```
File "/root/test.py", line 19, in <module>
    for container in containers:
  File "/usr/local/lib/python3.9/site-packages/azure/core/paging.py", line 123, in __next__
    return next(self._page_iterator)
  File "/usr/local/lib/python3.9/site-packages/azure/core/paging.py", line 75, in __next__
    self._response = self._get_next(self.continuation_token)
  File "/usr/local/lib/python3.9/site-packages/azure/storage/blob/_models.py", line 544, in _get_next_cb
    process_storage_error(error)
  File "/usr/local/lib/python3.9/site-packages/azure/storage/blob/_shared/response_handlers.py", line 186, in process_storage_error
    exec("raise error from None")  # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>
azure.core.exceptions.HttpResponseError: This request is not authorized to perform this operation.
RequestId:a6be6dd4-501e-0066-7c0f-11f005000000
Time:2024-09-27T19:00:06.9692023Z
ErrorCode:AuthorizationFailure
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthorizationFailure</Code><Message>This request is not authorized to perform this operation.
RequestId:a6be6dd4-501e-0066-7c0f-11f005000000
Time:2024-09-27T19:00:06.9692023Z</Message></Error>
```

**To Reproduce**
Steps to reproduce the behavior:

  1. Create an ADLS storage account, create an SP and secret, and grant the SP Azure Storage Blob Data Contributor.
  2. Configure the network private links and run the az CLI connectivity validation test.
  3. Create test.py with the following code and run it:

```python
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Replace these values with your actual Azure AD credentials and Storage account details
tenant_id = 'YYYYYYYYYYYYYYYYYYYYYYYY'
client_id = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXX'
client_secret = 'ZZZZZZZZZZZZZZZZZZZZZZZZ'
account_url = 'https://MyStorageAccount.dfs.core.windows.net/'

# Authenticate using the service principal
credential = ClientSecretCredential(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)

# Create the DataLakeServiceClient
service = DataLakeServiceClient(account_url=account_url, credential=credential)

# List the file systems in the account
file_systems = service.list_file_systems(logging_enable=True)
for file_system in file_systems:
    print(file_system['name'])
```

  4. Instead of listing the ADLS file systems, just as the az CLI does, I get this error:

```
Traceback (most recent call last):
  File "/root/test.py", line 19, in <module>
    for container in containers:
  File "/usr/local/lib/python3.9/site-packages/azure/core/paging.py", line 123, in __next__
    return next(self._page_iterator)
  File "/usr/local/lib/python3.9/site-packages/azure/core/paging.py", line 75, in __next__
    self._response = self._get_next(self.continuation_token)
  File "/usr/local/lib/python3.9/site-packages/azure/storage/blob/_models.py", line 544, in _get_next_cb
    process_storage_error(error)
  File "/usr/local/lib/python3.9/site-packages/azure/storage/blob/_shared/response_handlers.py", line 186, in process_storage_error
    exec("raise error from None")  # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>
azure.core.exceptions.HttpResponseError: This request is not authorized to perform this operation.
RequestId:46ec260c-d01e-006b-0559-13dd82000000
Time:2024-09-30T16:55:41.0299784Z
ErrorCode:AuthorizationFailure
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthorizationFailure</Code><Message>This request is not authorized to perform this operation.
RequestId:46ec260c-d01e-006b-0559-13dd82000000
Time:2024-09-30T16:55:41.0299784Z</Message></Error>
```
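One detail worth noting from the stack trace: the failing call goes through azure/storage/blob/_models.py, i.e. the file-system listing is served from the account's blob endpoint rather than the dfs endpoint. Below is a minimal diagnostic sketch (not part of the original repro; MyStorageAccount is the placeholder from the script above) to check that both hostnames resolve to the private endpoint's address from the machine running the script:

```python
# Sketch: check which addresses the storage endpoints resolve to from this host.
# A working private-endpoint setup typically resolves to a private (RFC 1918) IP.
import socket

for host in ('MyStorageAccount.dfs.core.windows.net',
             'MyStorageAccount.blob.core.windows.net'):
    try:
        addrs = sorted({info[4][0] for info in socket.getaddrinfo(host, 443)})
        print(host, '->', addrs)
    except socket.gaierror as exc:
        print(host, '-> resolution failed:', exc)
```

If only the dfs hostname resolves to a private IP while the blob hostname resolves to a public one, the listing request would leave the Private Link path, which would be consistent with the AuthorizationFailure seen above.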

**Expected behavior**
If I can log in with a service principal through the Azure CLI and list its file systems successfully, I expect DataLakeServiceClient with ClientSecretCredential to succeed the same way, since the documentation states it supports TokenCredential (ClientSecretCredential included): https://learn.microsoft.com/en-us/python/api/azure-storage-file-datalake/azure.storage.filedatalake.datalakeserviceclient?view=azure-python#parameters. This should hold no matter whether access is done publicly through the firewall or privately through private endpoints. I have also tested Azure Key Vault (with a similar Private Link setup) and, unlike DataLakeServiceClient, the AKV client did not show this problem.
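As an isolation step (a sketch, not part of the original report), the script can be pointed at the same login the working az commands used by swapping ClientSecretCredential for AzureCliCredential; if listing then succeeds over Private Link, the difference lies in how the two tools reach the account rather than in the credential itself:

```python
# Sketch: reuse the Azure CLI login that succeeded above. Requires a prior
# 'az login --service-principal ...' on the same machine.
from azure.identity import AzureCliCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url='https://MyStorageAccount.dfs.core.windows.net/',
    credential=AzureCliCredential())
for file_system in service.list_file_systems():
    print(file_system.name)
```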


github-actions[bot] commented 1 month ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jalauzon-msft @vincenttran-msft.

jalauzon-msft commented 1 month ago

Hi @elisabetao, do you have the correct RBAC roles set up on your service principal to access Storage? Your service principal will need one of the built-in Blob Storage RBAC roles (or a custom role with the right permissions) to access Storage using a TokenCredential with the SDK.
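For reference, a minimal sketch of checking the role assignments on the account programmatically; it assumes the azure-mgmt-authorization package, and every name below is a placeholder rather than anything from this thread:

```python
# Sketch: list the role assignments a principal holds on the storage account.
# Placeholders throughout; the object id is the SP's object id, not its app id.
from azure.identity import ClientSecretCredential
from azure.mgmt.authorization import AuthorizationManagementClient

subscription_id = '<subscription-id>'
scope = (f'/subscriptions/{subscription_id}'
         '/resourceGroups/<resource-group>'
         '/providers/Microsoft.Storage/storageAccounts/MyStorageAccount')
principal_object_id = '<service-principal-object-id>'

credential = ClientSecretCredential(tenant_id='...', client_id='...', client_secret='...')
auth_client = AuthorizationManagementClient(credential, subscription_id)
for ra in auth_client.role_assignments.list_for_scope(
        scope, filter=f"principalId eq '{principal_object_id}'"):
    # role_definition_id ends with the role's GUID; compare it against the
    # built-in Storage Blob Data role ids to confirm a data-plane role exists.
    print(ra.role_definition_id)
```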

Azure CLI can talk to Storage without these specific roles, with the Contributor role for example, because it uses the management plane to fetch the account key, which it can then cache and use for Storage requests.
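To illustrate that difference, here is a sketch of roughly the key-based path the CLI can take (assuming the azure-mgmt-storage package; resource names are placeholders, and this is an illustration, not a recommendation):

```python
# Sketch: fetch the account key via the management plane (needs a management-plane
# role such as Contributor), then use the shared key for data-plane requests.
from azure.identity import ClientSecretCredential
from azure.mgmt.storage import StorageManagementClient
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(tenant_id='...', client_id='...', client_secret='...')
mgmt = StorageManagementClient(credential, '<subscription-id>')
keys = mgmt.storage_accounts.list_keys('<resource-group>', 'MyStorageAccount')

# Shared-key auth: no Blob Data RBAC role is involved on this path.
service = DataLakeServiceClient(
    account_url='https://MyStorageAccount.dfs.core.windows.net/',
    credential=keys.keys[0].value)
for fs in service.list_file_systems():
    print(fs.name)
```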

elisabetao commented 1 month ago

Hello @jalauzon-msft, as mentioned above, the service principal already has Azure Storage Blob Data Contributor on the storage account. Do you see a specific role that could cause the issue? Did you get the chance to test and try to replicate this on your side?

Thanks a lot