Azure / azure-sdk-for-cpp

This repository is for active development of the Azure SDK for C++. For consumers of the SDK we recommend visiting our versioned developer docs at https://azure.github.io/azure-sdk-for-cpp.
MIT License

DataLakeFileClient is initialized with a blob endpoint URL, regardless of whether the path provided uses a dfs URL format. #5729

Status: Open. sharmaplkt opened this issue 1 week ago.

sharmaplkt commented 1 week ago

In our scenario, we instantiate a DataLakeFileClient using a storage URL of the form https://adlsgen2account.dfs.core.windows.net/. However, the DataLakeFileClient converts this URL to a blob URL (https://adlsgen2account.blob.core.windows.net/) when the object is created.
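The host rewrite described above can be pictured as a simple string substitution. The sketch below is a minimal, hypothetical helper (`DfsUrlToBlobUrl` is not the SDK's function, and the SDK's actual logic may differ). It assumes the standard `<account>.dfs.<suffix>` host form and swaps `.dfs.` for `.blob.` in the host only, leaving the path and query untouched.

```cpp
#include <string>

// Hypothetical helper (not the SDK's actual function): rewrites the first
// ".dfs." in the host portion of a URL to ".blob.". The path and query are
// left untouched.
std::string DfsUrlToBlobUrl(const std::string& url) {
  // The host starts just after "://" (or at the beginning if there is no scheme).
  std::string::size_type hostBegin = url.find("://");
  hostBegin = (hostBegin == std::string::npos) ? 0 : hostBegin + 3;

  // The host ends at the first '/' after it, or at the end of the string.
  std::string::size_type hostEnd = url.find('/', hostBegin);
  if (hostEnd == std::string::npos) hostEnd = url.size();

  const std::string dfs = ".dfs.";
  std::string::size_type pos = url.find(dfs, hostBegin);
  if (pos == std::string::npos || pos + dfs.size() > hostEnd) {
    return url;  // ".dfs." is not in the host: nothing to rewrite.
  }

  std::string result = url;
  result.replace(pos, dfs.size(), ".blob.");
  return result;
}
```

For example, `DfsUrlToBlobUrl("https://adlsgen2account.dfs.core.windows.net/fs/dir/file.txt")` yields `https://adlsgen2account.blob.core.windows.net/fs/dir/file.txt`.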

Our storage account sits inside a virtual network (VNet), and we have created a managed private endpoint from our workspace to the storage account. The target sub-resource configured for this endpoint is dfs.

The URL is converted in datalake_path_client.cpp: https://github.com/Azure/azure-sdk-for-cpp/blob/bbfc93fcaef31f4f67acb56ee749a94bf2a7ead3/sdk/storage/azure-storage-files-datalake/src/datalake_path_client.cpp#L82C33-L82C68

As a result, the DataLakeFileClient is always initialized with the blob endpoint URL (https://adlsgen2account.blob.core.windows.net/*). Is there a workaround for this?

github-actions[bot] commented 1 week ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @EmmaZhu @Jinming-Hu @vinjiang.

Jinming-Hu commented 1 week ago

@sharmaplkt You should always instantiate your datalake clients with the .dfs endpoint. The clients use the .dfs endpoint for some operations and the .blob endpoint for others. There's no way to change or work around this. If you have a firewall, you should add both the .dfs and .blob endpoints to the rule set.
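Because the client may send requests to either host, a firewall rule set (or, in a VNet setup like the one in the original report, the set of private endpoints and their DNS records) likely has to cover both. The sketch below derives both hostnames from a single dfs URL. `HostsToAllow` is a hypothetical helper for illustration only, and it assumes the standard `<account>.dfs.<suffix>` host form.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: given a Data Lake (dfs) URL, returns the hostnames a
// firewall rule set would need to allow, since the client may talk to both
// the dfs and the blob endpoint. Non-dfs hosts are returned unchanged.
std::vector<std::string> HostsToAllow(const std::string& dfsUrl) {
  // Extract the host: from just after "://" up to the first '/', ':' or '?'.
  std::string::size_type begin = dfsUrl.find("://");
  begin = (begin == std::string::npos) ? 0 : begin + 3;
  std::string::size_type end = dfsUrl.find_first_of("/:?", begin);
  std::string host = dfsUrl.substr(
      begin, end == std::string::npos ? std::string::npos : end - begin);

  const std::string dfs = ".dfs.";
  std::string::size_type pos = host.find(dfs);
  if (pos == std::string::npos) {
    return {host};  // Not a dfs host: nothing else to derive.
  }

  // The matching blob host differs only in the service label.
  std::string blobHost = host;
  blobHost.replace(pos, dfs.size(), ".blob.");
  return {host, blobHost};
}
```

For the account in this issue, the result would be `adlsgen2account.dfs.core.windows.net` and `adlsgen2account.blob.core.windows.net`.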

github-actions[bot] commented 1 week ago

Hi @sharmaplkt. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

sharmaplkt commented 5 days ago

@Jinming-Hu Regarding "If you have a firewall, you should add both the .dfs and .blob endpoints to the rule set": can you point me to the Azure documentation for adding both endpoints?

Jinming-Hu commented 5 days ago

@sharmaplkt There's no documentation for this. This is how the storage SDKs work, not just C++ but all languages (.NET, Java, etc.).

sharmaplkt commented 5 days ago

@Jinming-Hu With Java, we are able to connect to the storage account without adding the .blob endpoint to the firewall rules. We need to add it only for C++.

Jinming-Hu commented 5 days ago

@sharmaplkt That's kind of surprising to me. I'm checking Java's source code. The Java SDK also transforms datalake operations into blob operations. For example, GetPathProperties is transformed to GetBlobProperties: https://github.com/Azure/azure-sdk-for-java/blob/2d5157baab234be4ebf990b4f8a1ec1ec1134992/sdk/storage/azure-storage-file-datalake/src/main/java/com/azure/storage/file/datalake/DataLakePathClient.java#L1522C5-L1529C6
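The pattern described in this thread, one client built from a dfs URL that still dispatches some calls to the blob host, can be sketched as follows. This is illustrative only: `DualEndpointClient`, `Operation`, and `EndpointFor` are hypothetical names, and the routing table has just two entries. Path creation is an ADLS Gen2 "Path - Create" REST call served by the dfs endpoint, and GetPathProperties is sent as Get Blob Properties per the linked Java source. Neither the C++ nor the Java SDK is claimed to be structured this way.

```cpp
#include <string>

// Two example operations (hypothetical enum, not an SDK type).
enum class Operation {
  kCreatePath,         // ADLS Gen2 "Path - Create" is a dfs REST operation.
  kGetPathProperties,  // Per the linked Java source, sent as Get Blob Properties.
};

enum class Endpoint { kDfs, kBlob };

// Hypothetical routing table, not copied from any SDK.
Endpoint EndpointFor(Operation op) {
  switch (op) {
    case Operation::kCreatePath:
      return Endpoint::kDfs;
    case Operation::kGetPathProperties:
      return Endpoint::kBlob;
  }
  return Endpoint::kDfs;  // Unreachable; silences missing-return warnings.
}

// A client constructed from one dfs URL that keeps both URLs around.
struct DualEndpointClient {
  std::string dfsUrl;
  std::string blobUrl;

  // Derives the blob URL by swapping the first ".dfs." for ".blob.".
  static DualEndpointClient FromDfsUrl(const std::string& dfsUrl) {
    std::string blobUrl = dfsUrl;
    std::string::size_type pos = blobUrl.find(".dfs.");
    if (pos != std::string::npos) blobUrl.replace(pos, 5, ".blob.");
    return {dfsUrl, blobUrl};
  }

  // Picks the URL a request for `op` would be sent to.
  const std::string& UrlFor(Operation op) const {
    return EndpointFor(op) == Endpoint::kBlob ? blobUrl : dfsUrl;
  }
};
```

Under this model, any client that issues a properties call reaches the blob host even though it was configured only with a dfs URL, which is why a dfs-only firewall rule would be expected to block it.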