dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0

[ADAP-945] [Bug] `submission_method` from dbt profile not being applied to dbt Python models #967

Open gbmarc1 opened 12 months ago

gbmarc1 commented 12 months ago

Is this a new bug in dbt-bigquery?

Current Behavior

I have the following profile. I want the job to be created in the provided cluster, but it always ends up as a serverless batch.

ml:
  target: dev
  outputs:
    dev: &dev_config
      type: bigquery
      dataset: "{{ env_var('USER') }}"
      project: shopify-ml-adhoc
      priority: interactive
      method: oauth
      location: US
      job_execution_timeout_seconds: 600
      job_retries: 1
      threads: 2
      submission_method: cluster
      dataproc_region: us-central1
      gcs_bucket: ml-adhoc-dataproc-jobs
      dataproc_cluster_name: ml-adhoc-dataproc-us-central1

This is the model. If I uncomment the dbt.config call, it works properly, but I want this config in the profile, not in the model itself.

def model(dbt, session):
    # dbt.config(
    #     submission_method="cluster",
    #     dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    # )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

Expected Behavior

The profile config is respected and the job is executed in the cluster.
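
To make the expected precedence concrete, here is a minimal sketch of the fallback this report asks for. It is not dbt-bigquery's actual code: `resolve_submission_method`, `DEFAULT_SUBMISSION_METHOD`, and the plain dicts standing in for the model config and profile target are all hypothetical names chosen for illustration. The `"serverless"` default is an assumption based on the behavior observed above.

```python
# Hypothetical sketch: these names are illustrative, not dbt-bigquery's API.
from typing import Any, Mapping

# Assumed default, inferred from the serverless batch observed in this issue.
DEFAULT_SUBMISSION_METHOD = "serverless"


def resolve_submission_method(
    model_config: Mapping[str, Any],
    profile_target: Mapping[str, Any],
) -> str:
    """Pick the submission method for a Python model.

    Assumed precedence: model-level dbt.config() > profile target > default.
    """
    if "submission_method" in model_config:
        return model_config["submission_method"]
    if "submission_method" in profile_target:
        return profile_target["submission_method"]
    return DEFAULT_SUBMISSION_METHOD


# Stand-in for the relevant keys of the `dev` target in the profile above.
profile_target = {
    "submission_method": "cluster",
    "dataproc_cluster_name": "ml-adhoc-dataproc-us-central1",
}

# No dbt.config() call in the model: the profile value should be used.
print(resolve_submission_method({}, profile_target))

# A model-level setting still overrides the profile.
print(resolve_submission_method({"submission_method": "serverless"}, profile_target))
```

With this ordering, the model in this report would run on the cluster without needing the commented-out dbt.config call.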

Steps To Reproduce

dbt run

Relevant log output

dbt run --models nsfw
15:50:47  Running with dbt=1.6.6
15:50:47  Registered adapter: bigquery=1.6.7
15:50:47  Unable to do partial parsing because profile has changed
15:50:48  Found 5 models, 12 tests, 7 sources, 0 exposures, 0 metrics, 661 macros, 0 groups, 0 semantic models
15:50:48  
15:50:50  Concurrency: 2 threads (target='dev')
15:50:50  
15:50:50  1 of 2 START sql table model mab_nsfw.multi_label_v1 ........................... [RUN]
15:50:50  2 of 2 START python table model mab_nsfw.multi_label_v2 ........................ [RUN]
15:50:54  1 of 2 OK created sql table model mab_nsfw.multi_label_v1 ...................... [CREATE TABLE (84.1k rows, 10.5 MiB processed) in 4.40s]

Environment

- OS: macos
- Python: 3.11.1
- dbt-core: 1.6.6
- dbt-bigquery: 1.6.7

Additional Context

No response

dbeatty10 commented 12 months ago

Thanks for reporting this @gbmarc1

It sounds like this didn't work for you:

def model(dbt, session):
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

But this did work:

def model(dbt, session):
    dbt.config(
        submission_method="cluster",
        dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

To help troubleshoot

Did you happen to try either of these as well? This could help nail down where the missing piece(s) might be.

Configuring submission_method only:

def model(dbt, session):
    dbt.config(
        submission_method="cluster",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

Or configuring dataproc_cluster_name only:

def model(dbt, session):
    dbt.config(
        dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

gbmarc1 commented 12 months ago

Hello, thanks for looking at this! :)

It seems the profile's submission_method gets ignored.

dbeatty10 commented 12 months ago

Thanks @gbmarc1 -- that gives us the info we need 👍

Acceptance criteria

As noted in the original issue, dbt should use the cluster submission method (rather than serverless) when using the following project files:

profiles.yml

ml:
  target: dev
  outputs:
    dev: &dev_config
      type: bigquery
      dataset: "{{ env_var('USER') }}"
      project: shopify-ml-adhoc
      priority: interactive
      method: oauth
      location: US
      job_execution_timeout_seconds: 600
      job_retries: 1
      threads: 2
      submission_method: cluster
      dataproc_region: us-central1
      gcs_bucket: ml-adhoc-dataproc-jobs
      dataproc_cluster_name: ml-adhoc-dataproc-us-central1

models/my_model

def model(dbt, session):
    dbt.config(
        dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

Relevant code