dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0
224 stars 157 forks

[Bug] 'dbt docs generate' creates catalog for the relations not used in the models #1316

Open maxmullerfitu opened 3 months ago

maxmullerfitu commented 3 months ago

Is this a new bug in dbt-bigquery?

Current Behavior

Our project contains sources in multiple GCP projects. The source table for a given model changes at runtime, and this is parameterized via vars. When the 'dbt docs generate' command is called, it fails with:

Encountered an error while generating catalog: Database Error Access Denied: Table xxxxxxxxx-gold-dev:xxxxx_gold.INFORMATION_SCHEMA.COLUMNS: User does not have permission to query table

The service account cannot access the table in the other project. According to the logs, 'dbt docs generate' now tries to build the catalog for all the sources, whereas before it only built the catalog for the relations used in the models. To work around this, we had to remove the sources that refer to the other projects. This behavior started after we upgraded our Composer environment, which changed the dbt-bigquery version from v1.5.4 to v1.8.2.

Expected Behavior

It is expected that 'dbt docs generate' creates documentation only for the relations used in models, so that sources can contain tables from multiple projects.
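To make "relations used in models" concrete, here is a minimal sketch that splits the declared sources into used and unused ones. It assumes the parsed `manifest.json` artifact has a top-level `sources` mapping keyed by unique_id and that each entry under `nodes` lists its parents in `depends_on.nodes`. The function name `split_sources_by_usage` is hypothetical, and this is not how dbt itself selects catalog relations.

```python
import json
from pathlib import Path


def split_sources_by_usage(manifest: dict) -> tuple[set[str], set[str]]:
    """Return (used, unused) source unique_ids from a parsed manifest.json.

    A source counts as "used" when at least one model lists it
    in depends_on.nodes.
    """
    all_sources = set(manifest.get("sources", {}))
    used: set[str] = set()
    for node in manifest.get("nodes", {}).values():
        # Only models are considered; tests, seeds, snapshots are ignored here.
        if node.get("resource_type") != "model":
            continue
        for parent in node.get("depends_on", {}).get("nodes", []):
            if parent in all_sources:
                used.add(parent)
    return used, all_sources - used


# Usage (assumes the default artifact location; adjust if target-path is overridden):
# manifest = json.loads(Path("target/manifest.json").read_text())
# used, unused = split_sources_by_usage(manifest)
# print("unused sources:", sorted(unused))
```

With the reproduction below and env set to dev, the table under source2 would be expected to land in the unused set, which is the relation the catalog query should not need to touch.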

Steps To Reproduce

  1. Create sources in several projects:

    • name: source1
      project: project1
      tables:
        • name: table1
    • name: source2
      project: project2
      schema: schema2
      tables:
        • name: table2

  2. Alternate between the sources in the model:

      FROM {{ ref("some_model") }} t1
      JOIN
      {% if var('env') == 'dev' %}
        {{ source("source1", "table1") }} AS t2
      {% else %}
        {{ source("source2", "table2") }} AS t2
      {% endif %}
      ON t1.id = t2.id

  3. Set {%- set env = "dev" -%} so that source1 is used in the model.

  4. Run 'dbt docs generate'.

  5. Observe the dbt logs to confirm that a query is issued to retrieve information from source2.information_schema.
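Checking the logs by hand can be tedious, so the following is a rough sketch of a filter that keeps only the log lines mentioning both INFORMATION_SCHEMA and a given dataset. The helper name `find_catalog_queries` is hypothetical, the match is a plain case-insensitive substring search rather than a parser for dbt's log format, and the `logs/dbt.log` path in the usage comment assumes the default log location.

```python
from pathlib import Path


def find_catalog_queries(log_text: str, dataset: str) -> list[str]:
    """Return log lines that mention both INFORMATION_SCHEMA and the dataset name."""
    needle = dataset.lower()
    matches = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if "information_schema" in lowered and needle in lowered:
            matches.append(line)
    return matches


# Usage (assumes the default log path; "schema2" is the dataset of source2 above):
# log_text = Path("logs/dbt.log").read_text()
# for line in find_catalog_queries(log_text, "schema2"):
#     print(line)
```

If this returns any lines for the dataset that only source2 points at, the catalog step is querying a relation that no model uses.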

Relevant log output

For dbt-bigquery 1.5.4, source2 is skipped:
20:43:05.556313 [debug] [MainThread]: Acquiring new bigquery connection 'generate_catalog'
20:43:05.557479 [info ] [MainThread]: Building catalog
20:43:05.560116 [debug] [MainThread]: Opening a new connection, currently in state init
20:43:07.064956 [debug] [MainThread]: BigQuery adapter: Skipping catalog for xxxxxxxxxxx - schema does not exist
20:43:07.066819 [debug] [ThreadPool]: Acquiring new bigquery connection 'yyyyyyyyy'
20:43:07.073633 [debug] [ThreadPool]: Acquiring new bigquery connection 'zzzzzzzzz'

For dbt-bigquery 1.8.2, source2 is not skipped:
14:20:51.521164 [debug] [MainThread]: Acquiring new bigquery connection 'generate_catalog'
14:20:51.524088 [info ] [MainThread]: Building catalog
14:20:51.538783 [debug] [ThreadPool]: Acquiring new bigquery connection 'yyyyyyyyy'
14:20:51.540263 [debug] [ThreadPool]: Acquiring new bigquery connection 'zzzzzzzzz'

Environment

- OS: GCP Composer image composer-2.8.5-airflow-2.7.3
- Python: 3.11.8
- dbt-core: 1.8.3
- dbt-bigquery: 1.8.2

Additional Context

No response

jeremychia commented 3 months ago

We are experiencing the same error as well.