Azure / spark-cdm-connector


Databricks batch mode - AzureCredentialNotFoundException: Could not find ADLS Gen2 Token. #108

Closed: dynarch closed this 2 years ago

dynarch commented 2 years ago

Hello,

I have a problem with the CDM connector running in batch mode (as a scheduled workflow). When run manually, it works without errors. When run as a scheduled task, this part of the code

```python
entity_df = (spark.read.format("com.microsoft.cdm")
    .option("storage", cdsStorageAccountName)
    .option("manifestPath", cdsContainer + manifest_path)
    .option("entity", table_name)
    .load())
display(entity_df)
```

throws an error: `com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token`

```
--- Py4JJavaError                         Traceback (most recent call last)
in
      2     .option("storage", cdsStorageAccountName)
      3     .option("manifestPath", cdsContainer + manifest_path)
----> 4     .option("entity", table_name)
      5     .load())
      6 display(entity_df)
```

I have checked the mounts and they are working normally (using OAuth):

```python
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "",
           "fs.azure.account.oauth2.client.secret": "",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com//oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
    source = "URL",
    mount_point = "/mnt/",
    extra_configs = configs)
```

Cluster:

- Spark 2.4.5, Scala 2.11
- 2-8 Workers, 28-112 GB Memory, 8-32 Cores
- 1 Driver, 14 GB Memory, 4 Cores
- Runtime: 6.4.x-esr-scala2.11

**Option "Enable credential passthrough for user-level data access" is activated.**

What could be the reason for this?

P.S.: the e-mail address asksparkcdm@microsoft.com is not reachable: the **aniketsteam** group only accepts messages from people who are within their organization or on their allowed senders list, and my e-mail address is not on the list.
kthejoker commented 2 years ago

You can't use credential passthrough in non-interactive mode (e.g. in a scheduled job), and passthrough takes precedence over service principal (SP) credentials provided in the Spark config.
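A minimal sketch of the usual workaround, assuming a job cluster with passthrough disabled and a service principal that has access to the storage account (the storage account name, tenant ID, and secret scope/key names below are placeholders):

```python
# Sketch only: set service principal OAuth credentials in the Spark config so
# ABFS access works in non-interactive runs, where no passthrough token exists.
# <storage-account>, <tenant-id>, and the secret scope/key names are placeholders.
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net",
               "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net",
               dbutils.secrets.get(scope="my-scope", key="sp-client-id"))
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net",
               dbutils.secrets.get(scope="my-scope", key="sp-client-secret"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
```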

dynarch commented 2 years ago

I have received a reply from the Databricks team; they confirmed that credential passthrough cannot be used in scheduled tasks, so the problem is in Databricks, not the library.
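For reference, the connector also documents explicit service-principal reader options (appId, appKey, tenantId), which sidestep passthrough tokens entirely; a sketch reusing the variables from the original snippet, with placeholder secret scope/key names:

```python
# Sketch only: authenticate the CDM connector with an explicit service principal
# instead of relying on a passthrough token. Secret scope/key names are placeholders.
entity_df = (spark.read.format("com.microsoft.cdm")
    .option("storage", cdsStorageAccountName)
    .option("manifestPath", cdsContainer + manifest_path)
    .option("entity", table_name)
    .option("appId", dbutils.secrets.get(scope="my-scope", key="sp-client-id"))
    .option("appKey", dbutils.secrets.get(scope="my-scope", key="sp-client-secret"))
    .option("tenantId", dbutils.secrets.get(scope="my-scope", key="sp-tenant-id"))
    .load())
display(entity_df)
```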