dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0

[Feature] Ability to add labels to each job (not table/model) based on profiles.yml property #1366

Open moseleyi opened 3 weeks ago

moseleyi commented 3 weeks ago

Is this your first time submitting a feature request?

Describe the feature

I would like to use a single service account connection to BigQuery. The problem is that the logs would then not show which person actually ran dbt. dbt already adds dbt_invocation_id as a label to all queries, and I would like to be able to configure a label in profiles.yml that is also added to all queries.

def raw_execute(
    self,
    sql,
    use_legacy_sql=False,
    limit: Optional[int] = None,
    dry_run: bool = False,
):
    conn = self.get_thread_connection()
    client = conn.handle

    fire_event(SQLQuery(conn_name=conn.name, sql=sql, node_info=get_node_info()))

    labels = self.get_labels_from_query_comment()

    labels["dbt_invocation_id"] = get_invocation_id()

    job_params = {
        "use_legacy_sql": use_legacy_sql,
        "labels": labels,
        "dry_run": dry_run,
    }

I found this code, which is where the labels are added. Imagine we add a labels property in profiles.yml:

project:
  method: service_account
  threads: 4
  ...
  labels:
    dbt_user: "somebody"

If this were added to the job labels, I could differentiate between people in Logs Explorer in GCP. I wouldn't have to use ADC or other short-lived credentials, or create a separate service account for each user.
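To make the proposal concrete, here is a minimal sketch of how labels from the profile could be merged with the labels dbt already sets. It is not the adapter's actual code: `build_job_labels`, `sanitize_label`, and the `profile_labels` argument are hypothetical names used only for illustration, and the sanitizing step reflects my understanding of BigQuery's label rules (lowercase letters, digits, underscores and dashes, at most 63 characters).

```python
import re
from typing import Dict, Optional

# Assumption: BigQuery label keys/values allow lowercase letters, digits,
# "_" and "-", with a maximum length of 63 characters.
_MAX_LABEL_LENGTH = 63
_INVALID_LABEL_CHARS = re.compile(r"[^a-z0-9_-]")


def sanitize_label(value: str) -> str:
    """Lowercase, replace unsupported characters with "_", and truncate."""
    return _INVALID_LABEL_CHARS.sub("_", value.lower())[:_MAX_LABEL_LENGTH]


def build_job_labels(
    query_comment_labels: Dict[str, str],
    invocation_id: str,
    profile_labels: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge hypothetical profile-level labels with the labels dbt already sets.

    Profile labels are applied first, so labels derived from the query
    comment and the dbt_invocation_id label always take precedence.
    """
    labels = {
        sanitize_label(key): sanitize_label(value)
        for key, value in (profile_labels or {}).items()
    }
    labels.update(query_comment_labels)
    labels["dbt_invocation_id"] = invocation_id
    return labels


if __name__ == "__main__":
    print(
        build_job_labels(
            query_comment_labels={"dbt_model": "orders"},
            invocation_id="abc-123",
            profile_labels={"dbt_user": "somebody"},
        )
    )
```

Applying the profile labels first is a deliberate choice in this sketch: a user-supplied label can never overwrite dbt_invocation_id, so existing log queries that rely on it keep working.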

Describe alternatives you've considered

Creating a fork of the BigQuery connector and adding it myself.

Who will this benefit?

Anybody using BigQuery with a service account connection who would still like to have user-level details in the logs, or who wants to add any other labels to all queries.

Are you interested in contributing this feature?

Yes

Anything else?

No response

amychen1776 commented 2 days ago

Hello @moseleyi, thank you for opening this issue! I'm curious why you would want to use only one service account to authenticate to BigQuery. This does not align with our best practices, especially for security.

moseleyi commented 2 days ago

Authentication would still use ADC, but the permissions are granted via the service account. GCP allows you to use user accounts for authentication and service accounts for permissions; this is called service account impersonation. It's a bridge between having multiple user accounts and having one service account. The first can become clunky because you have to set permissions for each user; the second, a single shared service account, is not compliant with financial regulators.

I want to add labels so that the GCP logs show the user name of whoever is running the queries. Unfortunately, with ADC + impersonation, it's the service account email that shows up in the logs.

https://cloud.google.com/docs/authentication/use-service-account-impersonation

amychen1776 commented 23 hours ago

Thank you for that explanation! That was very helpful. I will take this into consideration, but I will not be able to prioritize it for now.