dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0

[Feature] Support running SQL models on Google Cloud Dataproc Serverless #1353

Open gddezero opened 1 week ago

gddezero commented 1 week ago

Is this your first time submitting a feature request?

Describe the feature

Context

Google Cloud Dataproc Serverless lets you run Spark workloads without requiring you to provision and manage your own Dataproc cluster. Use the Google Cloud console, Google Cloud CLI, or Dataproc API to submit a batch workload to the Dataproc Serverless service. The service will run the workload on a managed compute infrastructure, autoscaling resources as needed.

Dataproc Serverless is widely used by GCP customers to build data pipelines. A typical use case is submitting Spark SQL jobs to Dataproc Serverless to transform data and build a data warehouse.

Current Status

dbt currently supports only Python models on Dataproc Serverless, as a companion service to BigQuery: https://docs.getdbt.com/docs/core/connect-data-platform/bigquery-setup#running-python-models-on-dataproc
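
For reference, a minimal sketch of what the existing Python-model path looks like. The model name `stg_orders` and the `status` column are hypothetical, and the profile settings it relies on are described on the page linked above; treat this as an illustration rather than a tested configuration.

```python
# Hypothetical dbt Python model file, e.g. models/completed_orders.py
# (the file name, upstream model, and column are assumptions for illustration).
def model(dbt, session):
    # Ask dbt-bigquery to submit this model as a Dataproc Serverless batch
    # instead of sending it to an existing Dataproc cluster.
    dbt.config(
        materialized="table",
        submission_method="serverless",
    )

    # dbt.ref returns the upstream model as a DataFrame (PySpark on Dataproc).
    orders = dbt.ref("stg_orders")

    # DataFrame.where accepts a SQL expression string, so the filter itself
    # is written in SQL even though the model is a Python file.
    return orders.where("status = 'completed'")
```

The request in this issue is for the equivalent behaviour when the model is a plain `.sql` file, which this path does not cover today.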

Request

Support running SQL models on Dataproc Serverless

Describe alternatives you've considered

No response

Who will this benefit?

Customers using Google Cloud

Are you interested in contributing this feature?

No response

Anything else?

No response

amychen1776 commented 1 week ago

Hello @gddezero, could you provide more context about why you prefer Dataproc for SQL rather than running it directly on BQ?