Open Seb-Unit8 opened 1 year ago
Here is a workaround that works for me:
import adlfs
import azure.identity.aio
# The filesystem class exported by adlfs is AzureBlobFileSystem
abfs = adlfs.AzureBlobFileSystem(account_name=account_name, credential=azure.identity.aio.DefaultAzureCredential())
abfs.ls(container_name + "/" + subpath)
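If the identity in question is user-assigned, one possible variation on this workaround is to pin its client ID when building the credential. The sketch below assumes the `managed_identity_client_id` keyword of `DefaultAzureCredential` from azure-identity. The helper names `credential_kwargs` and `make_filesystem` are made up for illustration, and this has not been tested on a Compute Instance.

```python
def credential_kwargs(client_id=None):
    """Build keyword arguments for azure.identity.aio.DefaultAzureCredential.

    Hypothetical helper: passes the client ID of a user-assigned managed
    identity only when one is given.
    """
    return {"managed_identity_client_id": client_id} if client_id else {}


def make_filesystem(account_name, client_id=None):
    """Create an adlfs filesystem with an explicitly constructed credential."""
    # Imported lazily so credential_kwargs() can be used without Azure packages.
    import adlfs
    from azure.identity.aio import DefaultAzureCredential

    credential = DefaultAzureCredential(**credential_kwargs(client_id))
    return adlfs.AzureBlobFileSystem(account_name=account_name, credential=credential)
```

Usage would then look like `make_filesystem(account_name, client_id="xxx").ls(container_name)`, where `"xxx"` stands for the client ID of the user-assigned identity.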
I encountered the same problem when running code that uses adlfs on a Compute Instance (CI) in AzureML with a user-managed identity.
The identity has the correct permissions, which I can confirm by running:
az login --identity --username xxx
az storage blob list --account-name SANAME --container-name MYCONTAINER --output table
However, it seems that automatic credential resolution picks the system-assigned identity instead of the user-managed identity assigned to the CI. According to the DefaultAzureCredential resolution order, the managed identity should be resolved correctly, but it is not.
It seems that a CI always has a system-assigned identity (?) and that it may take precedence over the user-managed identity. Digging into the Azure Identity Python SDK, it looks like setting a single environment variable should work, and indeed it does:
import os
import dask.dataframe as dd
os.environ['AZURE_CLIENT_ID'] = 'xxx'
storage_options = {'account_name': SANAME, 'anon': False}
ddf = dd.read_parquet('az://MYCONTAINER/*.csv', storage_options=storage_options)
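The environment-variable workaround above can be wrapped in a small helper so the client ID and the storage options are set in one place. This is only a sketch: `storage_options_for_identity` is a hypothetical name, and it relies on the observation above that `AZURE_CLIENT_ID` is picked up when the default credential is created.

```python
import os


def storage_options_for_identity(account_name, client_id):
    """Select a user-assigned identity and return adlfs storage options.

    Hypothetical helper: sets AZURE_CLIENT_ID for the current process, so it
    must be called before the filesystem (and its credential) is created.
    """
    os.environ["AZURE_CLIENT_ID"] = client_id
    # anon=False asks adlfs to resolve credentials instead of connecting anonymously.
    return {"account_name": account_name, "anon": False}
```

The returned dictionary can be passed as `storage_options=...` in the same way as in the snippet above. Note that the variable is set only in the calling process, so distributed workers would need it set in their own environment as well.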
What would be nice for adlfs is an option to provide just two arguments to storage_options, namely: storage_options = {'account_name': SANAME, 'client_id': 'xxx'}, so that the passed client_id is used to fetch credentials. Currently this combination results in an error: ValueError: secret should be an Azure Active Directory application's client secret
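To make the request concrete, here is a rough sketch of the decision logic I have in mind. It is not adlfs's actual code: `pick_credential_kind` and the returned labels are hypothetical, and it only illustrates that a client_id without a secret could select a managed identity instead of raising.

```python
def pick_credential_kind(storage_options):
    """Return a label for the credential type the options would select.

    Hypothetical logic for the proposed behaviour, not adlfs's implementation.
    """
    def has(key):
        return storage_options.get(key) is not None

    # A full service principal needs all three values.
    if has("client_id") and has("client_secret") and has("tenant_id"):
        return "service_principal"
    # Proposed: client_id alone means "use this managed identity".
    if has("client_id"):
        return "managed_identity"
    # Documented behaviour: anon=False falls back to DefaultAzureCredential.
    if storage_options.get("anon") is False:
        return "default_credential"
    return "anonymous"
```

With this logic, `{'account_name': SANAME, 'client_id': 'xxx'}` would map to the managed-identity branch rather than the service-principal one.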
Versions
adlfs: 2023.1.0
fsspec: 2023.4.0
Summary:
Hello,
I am not experiencing the expected behaviour introduced in #262 and documented in the project's README > Details > Setting credentials > 2:
"2. Auto credential solving using Azure's DefaultAzureCredential() library: storage_options={'account_name': ACCOUNT_NAME, 'anon': False} will use DefaultAzureCredential to get valid credentials to the container ACCOUNT_NAME. DefaultAzureCredential attempts to authenticate via the mechanisms and order visualized here."
The following code snippet outputs the expected return of the containers list:
verifying that the managed identity for this VM has the right permissions (Storage Blob Data Contributor).
However, the following code run in the same environment throws the error:
ValueError: unable to connect to account for Must provide either a connection_string or account_name with credentials!!
Is anyone able to identify why the DefaultAzureCredential fallback is not being triggered even though I have specified the anon=False keyword?
Thanks for any help.