Closed ghost closed 4 years ago
@migee Thank you for bringing this to our attention. To clarify your issue: are you trying to implement this functionality with a VNET in place, or are you trying to use a Managed Identity (aka Managed Service Identity)? The classic PolyBase scenario is outlined in Tutorial: Load the New York Taxicab dataset, but that scenario does not include a VNET, MSI is not supported, and it leverages a blob storage account.
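As a reference point, the classic (non-VNET, non-MSI) setup from that tutorial authenticates with a storage account access key rather than MSI. A minimal sketch with hypothetical names (`blob_storage_cred`, `classic_blob_source`, placeholder key):

```sql
-- Required once per database before creating credentials.
CREATE MASTER KEY;

-- Key-based auth: the IDENTITY text is arbitrary; the SECRET is the
-- storage account access key (placeholder shown here).
CREATE DATABASE SCOPED CREDENTIAL blob_storage_cred
WITH IDENTITY = 'user',
     SECRET = '<storage_account_key>';

-- Blob storage uses the wasb[s] prefix with TYPE = HADOOP.
CREATE EXTERNAL DATA SOURCE classic_blob_source
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = blob_storage_cred
);
```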
As per the CREATE EXTERNAL DATA SOURCE syntax, the location prefix determines whether MSI can be leveraged: the `abfss` prefix must be used. If you are using `<container>@<storage_account>.blob.core.windows.net`, then the location prefix must be `wasb[s]`.
```sql
CREATE EXTERNAL DATA SOURCE <data_source_name>
WITH (
    [ LOCATION = '<prefix>://<path>[:<port>]' ]
    [ [ , ] CREDENTIAL = <credential_name> ]
    [ [ , ] TYPE = HADOOP ]
) [ ; ]
```
| External Data Source | Location prefix | Location path |
|---|---|---|
| Azure Blob Storage | wasb[s] | <container>@<storage_account>.blob.core.windows.net |
| Azure Data Lake Store Gen 1 | adl | <storage_account>.azuredatalakestore.net |
| Azure Data Lake Store Gen 2 | abfs[s] | <container>@<storage_account>.dfs.core.windows.net |
TYPE = HADOOP is for Synapse Analytics; TYPE = BLOB_STORAGE is for Azure SQL Database.
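The contrast can be sketched side by side. All names below are hypothetical, and the second statement assumes the blob has already been made reachable with a suitable credential:

```sql
-- Synapse Analytics / SQL Data Warehouse: TYPE = HADOOP with a
-- Hadoop-style prefix; this is the combination that supports MSI.
CREATE EXTERNAL DATA SOURCE dw_source
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net',
    CREDENTIAL = msi_cred
);

-- Azure SQL Database: TYPE = BLOB_STORAGE with a plain https endpoint,
-- used for BULK INSERT / OPENROWSET rather than external tables.
CREATE EXTERNAL DATA SOURCE sqldb_source
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://mystorageaccount.blob.core.windows.net/mycontainer',
    CREDENTIAL = sas_cred
);
```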
There are numerous configurations for creating an external data source, but each is very specific to the database technology (SQL Database versus SQL Data Warehouse) and the type of storage account being used. With a VNET in place, you are limited to what the documentation states.
Not sure if I answered your specific question but hoping to illustrate how to configure the external data source component. Please do let me know if you have additional questions.
@migee For this to work, your data source location prefix and LOCATION must be: `abfss://<container>@<storage_account>.dfs.core.windows.net`
But for the above to work the TYPE value must be HADOOP and this type is only supported by SQL Data Warehouse.
The above location path equates to Azure Data Lake Store Gen2, and that storage service is the only one that supports AAD MSI.
If you moved your source data to an Azure Data Lake Store Gen2 storage account and tried running an experiment with TYPE = BLOB_STORAGE and LOCATION = `abfss://<container>@<storage_account>.dfs.core.windows.net`, it might work. My feeling is that the TYPE + LOCATION combination will not work: TYPE would need to be HADOOP, and the SQL Database engine does not support TYPE = HADOOP. It does support TYPE = BLOB_STORAGE.
You wouldn't need to move your source data to a new storage account, as you can upgrade your existing blob storage account per the instructions, but note that the upgrade is not reversible.
I hope this is more clear. I think I was trying too hard to explain how the variations between TYPE and LOCATION are limited by the database engine and the storage type.
@MikeMorawski I am glad things are more clear despite the outcome.
This article discusses the impact on Azure Storage and SQL (specifically Azure SQL Data Warehouse PolyBase) and provides a workaround for that scenario. However, I'm discovering other (undocumented?) issues with no apparent workaround when trying to do a bulk insert (also PolyBase oriented, I assume) from SQL against an external data source configured for a blob storage account.
In this scenario it's not external tables but blob storage, so this will not work:
```sql
CREATE EXTERNAL DATA SOURCE ext_datasource_with_abfss
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://myfile@mystorageaccount.dfs.core.windows.net',
    CREDENTIAL = msi_cred
);
```
And upon reviewing the external data source documentation, it states that when it comes to blobs "the credential must be created using SHARED ACCESS SIGNATURE as the identity", so I am stuck in a Catch-22 at this point.
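For completeness, the SAS-based path the documentation does support looks roughly like the sketch below. All names are hypothetical, and the SAS token is a placeholder:

```sql
-- For TYPE = BLOB_STORAGE the credential must use the literal identity
-- 'SHARED ACCESS SIGNATURE'; the SECRET is the SAS token without the
-- leading '?'. Token shown here is a placeholder.
CREATE DATABASE SCOPED CREDENTIAL sas_cred
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<sas_token>';

CREATE EXTERNAL DATA SOURCE bulk_source
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://mystorageaccount.blob.core.windows.net/mycontainer',
    CREDENTIAL = sas_cred
);

-- A bulk insert can then reference the data source by name.
BULK INSERT dbo.MyTable
FROM 'myfile.csv'
WITH (DATA_SOURCE = 'bulk_source', FORMAT = 'CSV', FIRSTROW = 2);
```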