dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0

[CT-162] Upgrade from the __tables__ construct to the information_schema.tables construct #113

Open Fraser-Isbester opened 2 years ago

Fraser-Isbester commented 2 years ago

Describe the feature

The use of [project_id].[dataset_id].__TABLES__ has been deprecated in favor of [project_id].[dataset_id].information_schema.tables. This matters because the former cannot be accessed with metadata-only permissions (it requires getData permission). Switching would allow secure doc generation and schema-only tests to run in a lower-privilege environment.

Describe alternatives you've considered

Additional context

None.

Who will this benefit?

Anyone in a high-security or high-compliance environment who wants to use external dbt actors for certain tasks (GitHub Actions, for instance).

Are you interested in contributing this feature?

Sure!

VersusFacit commented 2 years ago

@Fraser-Isbester Always enjoy a good security-conscious patch. Just to make sure I understand: is it correct that the roles in BigQuery have changed, and that using __TABLES__ instead of information_schema means having to grant the user more access than necessary?

As for changing the code, I've got one reference:

        concat(project_id, '.', dataset_id, '.', table_id) as relation_id,

dbt/include/bigquery/macros/catalog.sql

We'd have to look at other references to these fields in the codebase to ensure we've got good coverage. We always like a test on these.
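For illustration only, here is a rough sketch of how the relation_id expression above might map onto information_schema.tables. It is not the adapter's actual code: the function names `tables_relation` and `relation_id_query` are hypothetical, and the sketch only builds SQL strings so the column mapping (project_id / dataset_id / table_id versus table_catalog / table_schema / table_name) can be compared side by side. Other fields the catalog macro reads from __TABLES__ would need their own mapping and are not covered here.

```python
# Hypothetical helpers for comparing the two metadata sources.
# Only the BigQuery relation names and column names are real; everything
# else is an assumption made for this sketch.


def tables_relation(project: str, dataset: str, use_information_schema: bool) -> str:
    """Return the backtick-quoted metadata relation for one dataset."""
    suffix = "INFORMATION_SCHEMA.TABLES" if use_information_schema else "__TABLES__"
    return f"`{project}.{dataset}.{suffix}`"


def relation_id_query(project: str, dataset: str, use_information_schema: bool = True) -> str:
    """Build a query that yields one relation_id per table in the dataset."""
    if use_information_schema:
        # INFORMATION_SCHEMA.TABLES exposes table_catalog / table_schema / table_name.
        cols = ("table_catalog", "table_schema", "table_name")
    else:
        # __TABLES__ exposes project_id / dataset_id / table_id.
        cols = ("project_id", "dataset_id", "table_id")
    expr = "concat({}, '.', {}, '.', {})".format(*cols)
    source = tables_relation(project, dataset, use_information_schema)
    return f"select {expr} as relation_id from {source}"


if __name__ == "__main__":
    # Placeholder project and dataset names.
    print(relation_id_query("my-project", "my_dataset"))
    print(relation_id_query("my-project", "my_dataset", use_information_schema=False))
```

Keeping the source relation and the column list behind a single switch is one way to make the change easy to cover with a test, since both variants can be asserted on without a BigQuery connection.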

You still interested in contributing?

github-actions[bot] commented 2 years ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

hassan-mention-me commented 2 years ago

PR to fix this issue - https://github.com/dbt-labs/dbt-bigquery/pull/238

sungchun12 commented 2 years ago

Current macro workaround for this: https://github.com/GeneralMills/gmi_common_dbt_utils/blob/main/macros/bq_catalog.sql

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

dbeatty10 commented 1 year ago

Re-opening since https://github.com/dbt-labs/dbt-bigquery/issues/897 is about the same thing.

Luiscri commented 1 year ago

+1 on this. In my organization, different teams are responsible for different data sources. I have hit a situation where I was given access to some tables in a dataset rather than the full dataset, because it contains other tables I'm not meant to access. When I try to build the docs for my dbt project, I get an error because I don't have getData permission on the dataset and so can't access the deprecated __TABLES__ table. If INFORMATION_SCHEMA were used instead, this wouldn't be a problem, because I could be given the metadataViewer role on the dataset without compromising the data contained in its tables.

nathangriffiths-cdx commented 1 year ago

We have also just run into this issue. Our GitHub Actions service account has the "BigQuery Metadata Viewer" role, but dbt docs generate still fails with permissions errors, apparently due to the references to __TABLES__ instead of INFORMATION_SCHEMA.TABLES.

marappel commented 3 days ago

We also have this issue. We want to manage IAM at the table level, and this makes the dbt docs generate command fail because __TABLES__ needs getData at the dataset level.