dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

[Feature] support inline session python submission method #1062

Open dkruh1 opened 2 months ago

dkruh1 commented 2 months ago

Is this your first time submitting a feature request?

Describe the feature

Summary: Introduce an option to run Python models within an existing session, similar to the session option available for SQL models.

Description: Currently, users must choose between an all-purpose cluster and a job cluster to run Python models (see docs). This requirement limits the ability to execute dbt models inline within an existing notebook, forcing model execution to be triggered outside of Databricks.
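For reference, both existing submission methods are configured on the Python model itself; a minimal sketch based on the docs linked above might look roughly like this (the cluster spec values are placeholders):

```python
# models/my_python_model.py -- sketch of a current submission method
def model(dbt, session):
    dbt.config(
        submission_method="job_cluster",
        job_cluster_config={
            # Placeholder Databricks job cluster spec
            "spark_version": "12.2.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
    )
    df = dbt.ref("upstream_model")
    return df
```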

In contrast, SQL models in dbt can leverage the session connection method, allowing them to be executed as part of an existing session. This separation of model logic from job cluster definitions enables orchestration systems to define clusters based on different considerations.

Request: We propose introducing a similar session option for Python models. This feature would allow users to submit Python models to be executed within a given session, thereby decoupling model definitions from job cluster specifications.
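A minimal sketch of how this could look from the model author's side, mirroring the `session` connection method that SQL models already use; note that `submission_method="session"` is hypothetical and not supported by the adapter today:

```python
# models/my_python_model.py -- proposed (hypothetical) inline execution
def model(dbt, session):
    # Hypothetical config value: run the model in the already-active
    # Spark session instead of submitting it to a separate cluster.
    dbt.config(submission_method="session")
    return dbt.ref("upstream_model")
```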

Describe alternatives you've considered

For job clusters, there is no viable alternative that offers the same Databricks API and cost profile. A possible, but problematic, option is to create an all-purpose cluster, provide the model with its cluster ID, and destroy the cluster after use. However, this approach is significantly more expensive (due to the cost difference between all-purpose clusters and job clusters) and disrupts the existing architecture that uses the session method to execute models within a job cluster.
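For completeness, that workaround would look roughly like the sketch below, with the all-purpose cluster created and torn down by the orchestrator around the dbt run (the cluster ID is a placeholder):

```python
# models/my_python_model.py -- workaround sketch (placeholder cluster_id)
def model(dbt, session):
    dbt.config(
        submission_method="all_purpose_cluster",
        # Placeholder ID; the orchestrator would create and later destroy this cluster
        cluster_id="0123-456789-abcdefgh",
    )
    return dbt.ref("upstream_model")
```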

Who will this benefit?

All dbt users currently leveraging the session method and considering adopting dbt Python models will benefit from this feature. Additionally, users who use third-party tools to define job cluster specifications based on AI or other methods will be able to decouple model logic from cluster spec configuration, allowing for greater flexibility and efficiency.

Are you interested in contributing this feature?

Yes - I'm preparing a pull request.

Anything else?

No response

amychen1776 commented 1 month ago

@dkruh1 are you using the adapter with Databricks? If so, is there a reason why you're not using the dbt-databricks adapter?