Azure / spark-cdm-connector


Could not find ADLS Gen2 Token when running as Job #67

Closed: fvalencia12 closed this issue 3 years ago

fvalencia12 commented 3 years ago

We are creating CDM data using version 0.19 of the connector. We use Spark context to switch the context of the running system to use an application id. When running in normal (interactive) mode, the code works well, but when running as a job we get an error stating: Could not find ADLS Gen2 Token.

We are running Databricks Runtime 6.4 (Spark 2.4.5) on a High Concurrency cluster with credential passthrough enabled.

Any help or further information would be greatly appreciated.

BTW, we have tried to use the appId and secret parameters, but we get an error on that as well, stating that the clientId is null.
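
For reference, the failing write looks roughly like this (option names taken from the connector's samples, if I have them right; the storage account, manifest path, and entity name are anonymized placeholders):

(df.write.format("com.microsoft.cdm")
  .option("storage", "ourstorageaccount.dfs.core.windows.net")        # placeholder account
  .option("manifestPath", "ourcontainer/default.manifest.cdm.json")   # placeholder path
  .option("entity", "OurEntity")                                      # placeholder entity name
  .option("appId", appId)        # application (client) id of the service principal
  .option("appKey", secret)      # client secret
  .option("tenantId", tenantId)  # AAD tenant id
  .mode("append")
  .save())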

Regards, Fabian

bissont commented 3 years ago

Can you provide a sample of what you mean by "We use Spark context to switch the context of the running system to use an application id"?

What language are you using? Scala is only supported with a standard premium cluster. Does it work with a standard cluster?

I'm not aware of normal mode vs. job in Databricks; can you share a link? Also, for the appId/secret params, does it work when using a notebook?

fvalencia12 commented 3 years ago

What I mean by switching context is the following. We do this so notebooks that run as jobs can still authenticate, since they can't take advantage of passthrough:

spark.conf.set("fs.azure.account.auth.type." + adlsGen2StorageName + ".dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type." + adlsGen2StorageName + ".dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id." + adlsGen2StorageName + ".dfs.core.windows.net", appId)
spark.conf.set("fs.azure.account.oauth2.client.secret." + adlsGen2StorageName + ".dfs.core.windows.net", secret)
spark.conf.set("fs.azure.account.oauth2.client.endpoint." + adlsGen2StorageName + ".dfs.core.windows.net", "https://login.microsoftonline.com/" + tenantId + "/oauth2/token")

We are using Python.

What I mean by a job is a job in the Jobs blade, as defined in the Databricks documentation: https://docs.databricks.com/jobs.html. This allows notebooks to be scheduled, and we need to run this notebook on a scheduled basis.

srichetar commented 3 years ago

You don't need to set this configuration; the CDM connector handles it internally. Please run the job the same way as you do in the notebook.

fvalencia12 commented 3 years ago

That doesn't work because jobs by default don't run in a context, meaning they don't run under a user or a service principal. That is why runs in interactive mode work: they use the passthrough credentials of the user running the notebook.

When we specify the app id and secret we get an error. The appId and secret parameters would let us avoid having to switch the context, but they just don't work.

fvalencia12 commented 3 years ago

Any word on this? We have a ticket open with our Premier support and are going in circles with the support folks. Are appId and secret supported with the 0.18 or 0.19 version?

srichetar commented 3 years ago

Hello @fvalencia12, per https://docs.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough#cluster-requirements, clusters enabled for credential passthrough do not support jobs.

But you can continue using the CDM connector in a job by using app credentials; you don't need to set the OAuth configuration yourself. If you still face the issue, please send an email to asksparkcdm@microsoft.com and we can continue from there.
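
For example, a read in the job could look roughly like this, passing the credentials directly as connector options (the storage account, manifest path, and entity name below are placeholders):

df = (spark.read.format("com.microsoft.cdm")
  .option("storage", "yourstorageaccount.dfs.core.windows.net")       # placeholder account
  .option("manifestPath", "yourcontainer/default.manifest.cdm.json")  # placeholder path
  .option("entity", "YourEntity")                                     # placeholder entity name
  .option("appId", appId)        # service principal application id
  .option("appKey", appKey)      # service principal secret
  .option("tenantId", tenantId)  # AAD tenant id
  .load())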

srichetar commented 3 years ago

Closing this as we have not heard back.