dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

[Feature] Support running SQL models on Google Cloud Dataproc Serverless #1131

Open: gddezero opened this issue 4 weeks ago

Is this your first time submitting a feature request?

Describe the feature

Context

Google Cloud Dataproc Serverless lets you run Spark workloads without provisioning and managing your own Dataproc cluster. You submit a batch workload to the Dataproc Serverless service through the Google Cloud console, the Google Cloud CLI, or the Dataproc API, and the service runs it on managed compute infrastructure, autoscaling resources as needed.

Dataproc Serverless is widely used by GCP customers to build data pipelines. A typical use case is submitting Spark SQL jobs to Dataproc Serverless to transform data and build a data warehouse.
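For reference, a Spark SQL job of this kind can already be submitted by hand with `gcloud dataproc batches submit spark-sql`. The minimal Python sketch below only builds that command line; it assumes the `gcloud` CLI is installed and authenticated, and the helper name `build_spark_sql_batch_command`, the bucket path, and the region are hypothetical placeholders, not part of dbt or dbt-spark.

```python
import shlex
from typing import List


def build_spark_sql_batch_command(sql_file_uri: str, region: str) -> List[str]:
    """Build argv for submitting a Spark SQL script as a Dataproc Serverless batch.

    Hypothetical helper for illustration only; it does not execute anything.
    """
    return [
        "gcloud", "dataproc", "batches", "submit", "spark-sql",
        sql_file_uri,          # SQL script to run (placeholder path)
        f"--region={region}",  # region in which the batch runs
    ]


if __name__ == "__main__":
    cmd = build_spark_sql_batch_command("gs://my-bucket/models/orders.sql", "us-central1")
    # Only prints the command; pass `cmd` to subprocess.run(cmd, check=True) to submit it.
    print(shlex.join(cmd))
```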

Current Status

dbt only supports submitting SQL models through the Spark Thrift server. Users need to deploy a Dataproc cluster, start the Thrift server, and manage the underlying infrastructure.

Request

Support running SQL models on Dataproc Serverless.

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response