MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International
10.28k stars 21.47k forks source link

Impact of using VNet Service Endpoints with Azure storage #53653

Closed ghost closed 4 years ago

ghost commented 4 years ago

This article discusses the impacts with Azure Storage and SQL (specifically Azure SQL Data Warehouse PolyBase) and provides a workaround for that scenario. However I'm discovering other (undocumented?) issues with no apparent work around when trying to do a bulk insert (also PolyBase oriented I assume) from SQL on an external data source configured against a blob storage.

In this scenario its not external tables but blob so this will not work:

CREATE EXTERNAL DATA SOURCE ext_datasource_with_abfss WITH (TYPE = hadoop, LOCATION = 'abfss://myfile@mystorageaccount.dfs.core.windows.net', CREDENTIAL = msi_cred);

And upon reviewing external data source documentation it states that it comes to BLOB's "the credential must be created using SHARED ACCESS SIGNATURE as the identity" so I am stuck in a Catch-22 at this point.


Document Details

Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

Mike-Ubezzi-MSFT commented 4 years ago

@migee Thank you for bringing this to our attention. To clarify your issue, are you trying to implement this functionality with a VNET in place or are you trying to use a Managed Identity (aka Managed Service Identity)? The classic PolyBase scenario is outlined in the following: Tutorial: Load the New York Taxicab dataset but this scenario does not include a VNET and MSI is not supported. This scenario leverages a blob storage account.

As per the Create External Data Source syntax, the prefix value determines the availability of leveraging MSI where location prefix abfss must be used. If you are using <container>@<storage_account>.blob.core.windows.net then location prefix must be wasb[s].

CREATE EXTERNAL DATA SOURCE WITH ( [ LOCATION = '://[:]' ] [ [ , ] CREDENTIAL = ] [ [ , ] TYPE = HADOOP ] [ ; ]

External Data Source Location prefix Location path
Azure Blob Storage wasb[s] <container>@<storage_account>.blob.core.windows.net
Azure Data Lake Store Gen 1 adl <storage_account>.azuredatalake.net
Azure Data Lake Store Gen 2 abfs[s] <container>@<storage_account>.dfs.core.windows.net

TYPE=HADOOP is for Synapse Analytics TYPE=BLOB_STORAGE is for Azure SQL Database

There are numerous configurations for creating an external data source but each is very specific to the database technology (SQL Database versus Data Warehouse) and the type of storage account being used. In the presence of a VNET, you are limited to what the document is stating.

Not sure if I answered your specific question but hoping to illustrate how to configure the external data source component. Please do let me know if you have additional questions.

Mike-Ubezzi-MSFT commented 4 years ago

@migee For this to work your data source Location Prefix and LOCATION must be: abfss://container>@<storage_account.dfs.core.windows.net

But for the above to work the TYPE value must be HADOOP and this type is only supported by SQL Data Warehouse.

The following location path equates to Azure Data Lake Store Gen2 and this storage service is the only one that supports AAD MSI.

If you moved your source data to an Azure Data Lake Store Gen2 storage account and tried running an experiment with TYPE = BLOB_STORAGE and LOCATION = abfss://container>@<storage_account.dfs.core.windows.net, it might work. My feeling is that the TYPE + LOCATION combination will not work. TYPE will need to be HADOOP and SQL Database (engine) does not support TYPE = HADOOP. It does support TYPE = BLOB_STORAGE.

You wouldn't need to move your source data to a new storage account as you can update your existing blob storage per the instructions but it is not reversible.

I hope this is more clear. I think I was trying too hard to explain how the variations between TYPE and LOCATION are limited by the database engine and the storage type.

Mike-Ubezzi-MSFT commented 4 years ago

@MikeMorawski I am glad things are more clear despite the outcome.