Azure / spark-cdm-connector

Cannot be used in Azure China #72

Open simonzhaoms opened 3 years ago

simonzhaoms commented 3 years ago

Spark CDM Connector doesn't use the correct fs.azure.account.oauth2.client.endpoint for hostnames like *.dfs.core.chinacloudapi.cn in Azure China, so it cannot find the manifest.cdm.json file. For Azure China the endpoint should be https://login.partner.microsoftonline.cn; for Azure Global it is https://login.microsoftonline.com/. The configuration is set in the class com.microsoft.cdm.utils.SerializedABFSHadoopConf.
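To make the request concrete, here is a minimal sketch of picking the login authority from the storage hostname suffix. It is not the connector's code: the function names `oauth_endpoint` and `abfs_oauth_conf` are hypothetical, the `/<tenant>/oauth2/token` path and the per-account form of the config key are assumed from the usual Hadoop ABFS OAuth setup, and only the two clouds named in this issue are mapped.

```python
# Hypothetical helper, NOT part of spark-cdm-connector.
# Maps a storage hostname suffix to the login authority for that cloud.
# Only the two clouds mentioned in this issue are listed.
AUTHORITY_BY_STORAGE_SUFFIX = {
    ".dfs.core.windows.net": "https://login.microsoftonline.com",
    ".dfs.core.chinacloudapi.cn": "https://login.partner.microsoftonline.cn",
}


def oauth_endpoint(storage_host: str, tenant_id: str) -> str:
    """Return the OAuth token endpoint for the cloud that hosts storage_host."""
    host = storage_host.lower().rstrip(".")
    for suffix, authority in AUTHORITY_BY_STORAGE_SUFFIX.items():
        if host.endswith(suffix):
            # Assumption: the usual "<authority>/<tenant>/oauth2/token" layout.
            return f"{authority}/{tenant_id}/oauth2/token"
    raise ValueError(f"No known login authority for host: {storage_host}")


def abfs_oauth_conf(storage_host: str, tenant_id: str) -> dict:
    """Build the single Hadoop setting this issue is about, scoped to one account."""
    key = f"fs.azure.account.oauth2.client.endpoint.{storage_host}"
    return {key: oauth_endpoint(storage_host, tenant_id)}
```

With something like this, a host such as myaccount.dfs.core.chinacloudapi.cn would resolve to the login.partner.microsoftonline.cn authority instead of the Azure Global one, and an unknown suffix fails loudly rather than silently falling back.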

srichetar commented 3 years ago

Hi @simon, can you try token-based access control? Do you see the same issue?

bguidinger commented 3 years ago

@simonzhaoms We had a similar issue trying to connect to Azure Government. The fix for us was to use token-based access control (i.e. managed identity), as @srichetar mentioned. What's missing from the documentation is that your user needs the Storage Blob Data Contributor role on the storage account, even if the user already has the regular Owner or Contributor role.