Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.

Unable to retrieve AzureDataLakeGen2Datastore datastore - SDKv2 #35114

obiii closed this issue 4 months ago

obiii commented 4 months ago

The bug

  1. Unable to get a datastore (created with SDKv1) through SDKv2 when using managed identity authentication. [screenshot]

  2. Unable to get a datastore (created with SDKv2) through SDKv2 when using managed identity authentication. The datastore is created, but its **Subscription ID** and **Resource Group Name** are empty. [screenshot]

To Reproduce

from azure.ai.ml import MLClient
from azure.identity import ManagedIdentityCredential
from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core.exceptions import ResourceExistsError

def managed_identity_auth(func):
    def wrapper(*args, **kwargs):
        credentials = ManagedIdentityCredential(client_id=<my_client_id>)
        return func(credentials, *args, **kwargs)
    return wrapper

@managed_identity_auth
def create_datastore(credentials):
    # Authenticate with Azure using the provided credentials
    ml_client = MLClient(credential=credentials, subscription_id=<subscription_id>, resource_group_name=<rg_name>, workspace_name=<workspace_name>)

    # Create a new file system (container) in Azure Data Lake Storage Gen2
    storage_account_name = <storage_account_name>
    file_system_name = "sdkv2-test2"

    service_client = DataLakeServiceClient(account_url=f"https://{storage_account_name}.dfs.core.windows.net", credential=credentials)
    file_system_client = service_client.get_file_system_client(file_system=file_system_name)
    try:
        file_system_client.create_file_system()
        print(f"Created file system: {file_system_name}")
    except ResourceExistsError:
        print(f"File system {file_system_name} already exists.")

    # Create the Azure Data Lake Storage Gen2 Datastore using the managed identity
    adls_gen2_datastore = AzureDataLakeGen2Datastore(
        name="sdkv2_datastore2",
        account_name=storage_account_name,
        filesystem=file_system_name
    )
    ml_client.datastores.create_or_update(adls_gen2_datastore)

@managed_identity_auth
def get_datastore(credentials, subscription_id, resource_group_name, datastore_name):
    ml_client = MLClient(
        subscription_id=subscription_id,
        resource_group_name=resource_group_name,
        credential=credentials,
    )
    datastore = ml_client.datastores.get(name=datastore_name)
    return datastore

create_datastore()
get_datastore(subscription_id=<subscription_id>, resource_group_name=<rg_name>, datastore_name=<datastore_name>)

Expected behavior

  1. The created datastore should have its Subscription ID and Resource Group Name populated; currently both are empty. [screenshot]

  2. Getting an existing datastore should just return it; currently it returns: *** ValueError: No value for given attribute

github-actions[bot] commented 4 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

kristapratico commented 4 months ago

Thanks for your issue @obiii. Can you also share the version of azure-ai-ml that you are using? @azureml-github will take a look.
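
One way to collect that information, as a minimal sketch: the standard library's `importlib.metadata` can report the installed version of a distribution. The helper name `installed_version` is made up for illustration, and the list of packages is only an assumption about what is relevant to this report (the distribution names are the ones published on PyPI).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name: str) -> str:
    """Return the installed version of a distribution, or a note if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    # Packages imported by the repro script above (assumed to be the relevant ones).
    for name in ("azure-ai-ml", "azure-identity", "azure-storage-file-datalake"):
        print(f"{name}: {installed_version(name)}")
```

Running `pip show azure-ai-ml` in the same environment should report the same version string.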

justkr commented 4 months ago

Hi, I have the same problem while migrating to SDKv2.

ma11034 commented 4 months ago

Hi! Facing the same issue.

banibrata-de commented 4 months ago

Currently this is not supported. We have added it to our feature requests so it can be triaged as a feature ask.