dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0

enable overriding the dataproc project for python models #1365

Open matt-winkler opened 1 month ago

matt-winkler commented 1 month ago

resolves #1364
docs dbt-labs/docs.getdbt.com/#

Problem

Currently, dbt-bigquery does not support overriding the execution_project for python models. Supporting an override lets users run python models in a different GCP project and better balance their compute allocations at scale.

Solution

Adds logic to support a dataproc_project configuration. When it is not set, python models fall back to execution_project.
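
The fallback described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `ProfileSettings` and `resolve_python_project` are hypothetical names, the project values are made up, and the step where `execution_project` falls back to `project` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileSettings:
    """Hypothetical stand-in for the relevant profiles.yml fields."""
    project: str
    execution_project: Optional[str] = None
    dataproc_project: Optional[str] = None


def resolve_python_project(settings: ProfileSettings) -> str:
    """Pick the GCP project used to submit .py models (illustrative only)."""
    # Prefer the explicit override when it is set.
    if settings.dataproc_project:
        return settings.dataproc_project
    # Otherwise use execution_project; assumed here to fall back to project.
    return settings.execution_project or settings.project


# Made-up project names, only to exercise both branches.
override = ProfileSettings(project="analytics", dataproc_project="test-some-other-project")
default = ProfileSettings(project="analytics", execution_project="billing")
print(resolve_python_project(override))
print(resolve_python_project(default))
```

SQL models are untouched in this sketch; only the project chosen for python model submission changes.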


github-actions[bot] commented 1 month ago

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-bigquery contributing guide.

matt-winkler commented 1 month ago

@dbt-labs/adapters Hi Team, looking for philosophical comments on this approach before I do additional testing. Thank you!

colin-rogers-dbt commented 1 month ago

@matt-winkler no philosophical qualms on this approach

matt-winkler commented 1 month ago

@colin-rogers-dbt to give an idea of how this works: it overrides the GCP project used for .py models only. Can you please comment on what other testing would be sufficient to prove this works? I'm not sure an integration test is the right approach, since this involves a GCP project-level override, but I'm open to suggestions if you want to go that direction.

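This failing run from my local environment illustrates the override: the sql view builds normally, while the Dataproc request for the python model targets test-some-other-project (the dataproc_project value) and is rejected: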

21:51:55 1 of 2 START sql view model dbt_mwinkler_core_dev.view_1 ....................... [RUN]
21:51:56 1 of 2 OK created sql view model dbt_mwinkler_core_dev.view_1 .................. [CREATE VIEW (0 processed) in 1.22s]
21:51:56 2 of 2 START python table model dbt_mwinkler_core_dev.py_model ................. [RUN]
21:51:57 Unhandled error while executing target/run/jaffle_project/models/py_model.py
403 Permission denied on resource project test-some-other-project. [links { description: "Google developers console" url: "https://console.developers.google.com" } , reason: "CONSUMER_INVALID" domain: "googleapis.com" metadata { key: "service" value: "dataproc.googleapis.com" } metadata { key: "consumer" value: "projects/test-some-other-project" } ]

My profiles.yml looks like this:

bigquery_core_dev:
  target: dev
  outputs:
    dev:
      dataset: dbt_mwinkler_core_dev
      keyfile: <redacted>
      method: service-account
      project: <redacted>
      threads: 4
      type: bigquery
      gcs_bucket: matt-w-python-demo
      dataproc_cluster_name: matt-w-python-demo
      dataproc_region: us-west1
      dataproc_project: test-some-other-project

colin-rogers-dbt commented 1 month ago

@matt-winkler we should set up another GCP project for us to use here (it's what we do in similar situations in other adapters) and then run a python model with this configuration setting. I can work with you to get another project set up.
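
For the run in that second project, a trivial python model is probably enough. A minimal sketch, assuming the model only needs to read `view_1` from the run above and write it back as a table (the actual contents of `py_model.py` are not shown in this thread, so the body here is a guess):

```python
def model(dbt, session):
    # Materialize as a table, matching the "python table model" in the log.
    dbt.config(materialized="table")
    # Assumption: the model simply passes view_1 through unchanged.
    return dbt.ref("view_1")
```

If the override works, the Dataproc job for this model should show up under the project named in dataproc_project rather than the profile's project.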