Open leo-schick opened 7 months ago
Hi @leo-schick, could you please give an example of how to run SQL models on a Databricks job cluster? Thanks!
@gaoshihang I have now written an article on Medium: How to run dbt on a Databricks job cluster
@leo-schick feel free to upvote our feature request for a more native way to run dbt on job clusters https://ideas.databricks.com/ideas/DBE-I-1415
@moritzmeister I do not have access to this page
@leo-schick - I'm very interested in this topic right now, too. I have a case where we have a number of models that need to be run for a given task, and until now we've just been eating the cost of running an all-purpose cluster and directing model exec to there, but I'm trying to switch to job clusters right now. I have a test case working using the submission_method: job_cluster config - but of course that's triggering a cluster per model, as you mentioned.
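For readers following along, a per-model job cluster configuration of the kind described here might look roughly like the sketch below. It assumes a dbt Python model on dbt-databricks. The runtime version, node type, worker count, and the upstream model name are placeholders chosen for illustration, not values from this thread.

```python
# Sketch of a dbt Python model that asks for its own job cluster.
# All cluster values are placeholders (assumptions), not taken from the thread.

def model(dbt, session):
    dbt.config(
        materialized="table",
        submission_method="job_cluster",
        job_cluster_config={
            "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
            "node_type_id": "i3.xlarge",          # placeholder node type
            "num_workers": 1,                     # placeholder cluster size
        },
    )
    # "upstream_model" is a hypothetical model name used only for illustration.
    df = dbt.ref("upstream_model")
    return df
```

Because the cluster settings are attached to the individual model, each model run with this config starts its own cluster, which matches the cluster-per-model behaviour mentioned above.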
I tried setting up a shared job cluster, capturing the cluster id and passing it through to the models to use - but hit a bizarre access issue where it tells me (despite my account having 'can manage' on the cluster in question):
Error creating an execution context.
b'{"error":"WorkspaceAclExceptions.WorkspacePermissionDeniedException: my.account@my.com does not have Attach permissions on 0613-041925-akezdk3x. Please contact the owner or an administrator for access."}\n'
Which seems like a pretty misleading response... any thoughts?
@moritzmeister - likewise, can you share access to that link? I'll happily upvote also!
@chrismbeach have you tried the approach I described in my Medium post, How to run dbt on a Databricks job cluster?
You can find my helper Notebooks here: https://github.com/leo-schick/databricks-dbt-helper
Thanks @leo-schick - I've not - since it's dbt python models I need to run :( Per your summary in https://github.com/databricks/dbt-databricks/issues/586 that doesn't appear viable atm, due to (seemingly unreasonable) access restrictions?
Hey @leo-schick, hey @chrismbeach, you should get access to the page if you have access to Databricks support. I think you need a support contract with them for that.
I also talked with Databricks support about this. This was their response:
There is currently no plan to be able to run Python dbt models on SQL warehouses. For such scenarios, if there is any use case that cannot be run on SQL warehouses, there is an option to run it on the all-purpose clusters.
To summarise - There is no plan/roadmap for running python/pyspark dbt models on SQL warehouses and to give an option of using Job Clusters with dbt models. The reason, as I mentioned earlier, is "dbt-databricks is optimized to work best against Databricks SQL warehouses as local development is typically carried out by users using Databricks SQL", and there is currently no plan to run python/pyspark dbt models on it.
Not really satisfying.
Describe the feature
It is possible to run dbt SQL models inside a job cluster when:
I would like to see:
Describe alternatives you've considered
I did several tests with token-based authentication, but it looks like job clusters use a different Spark endpoint. Token-based authentication does not work on a job cluster.
Something to note
Python models currently do not work with this approach.
Who will this benefit?
Databricks license costs are reduced because no general-purpose cluster is needed to run dbt inside Databricks.