airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Destination BigQuery: connection check requires excessive database privileges #19973

Open philipherrmann opened 1 year ago

philipherrmann commented 1 year ago

With newer versions of the BigQuery destination, the connection check uses https://github.com/airbytehq/airbyte/blob/b1c50b8470850a6a29226783aeb85efa2884ac82/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L91, which requires that the service account used by Airbyte be able to create and even delete datasets (nondeterministically named, i.e. datasetId + CHECK_TEST_DATASET_SUFFIX). We see no way to restrict IAM permissions so that the service account can delete only the required datasets and not arbitrary ones. As a result, we can't upgrade the BigQuery destination.

Previously, the check tried to create exactly the dataset that is named in the destination configuration, and only if it did not already exist. With the current version, the destination checks whether it can drop datasets, even if GCS staging is activated.
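To make the difference concrete, here is a minimal sketch of the two check behaviours described above, run against an in-memory stand-in rather than the real BigQuery client. The class RecordingWarehouse, both function names, and the suffix value are assumptions made up for illustration (the issue only names the constant, not its value); the strings are the BigQuery IAM permissions each dataset operation would need.

```python
from typing import Iterable, Set

# Placeholder value: the issue names the constant but not its contents.
CHECK_TEST_DATASET_SUFFIX = "_check_tmp"


class RecordingWarehouse:
    """In-memory stand-in that records which dataset permissions a check exercises."""

    def __init__(self, existing: Iterable[str]) -> None:
        self.datasets: Set[str] = set(existing)
        self.permissions_used: Set[str] = set()

    def dataset_exists(self, name: str) -> bool:
        self.permissions_used.add("bigquery.datasets.get")
        return name in self.datasets

    def create_dataset(self, name: str) -> None:
        self.permissions_used.add("bigquery.datasets.create")
        self.datasets.add(name)

    def delete_dataset(self, name: str) -> None:
        self.permissions_used.add("bigquery.datasets.delete")
        self.datasets.discard(name)


def legacy_check(warehouse: RecordingWarehouse, dataset_id: str) -> None:
    """Older behaviour as described: create the configured dataset only if it is missing."""
    if not warehouse.dataset_exists(dataset_id):
        warehouse.create_dataset(dataset_id)


def current_check(warehouse: RecordingWarehouse, dataset_id: str) -> None:
    """Newer behaviour as described: create and then drop a separate test dataset."""
    test_dataset = dataset_id + CHECK_TEST_DATASET_SUFFIX
    warehouse.create_dataset(test_dataset)
    warehouse.delete_dataset(test_dataset)


if __name__ == "__main__":
    for check in (legacy_check, current_check):
        warehouse = RecordingWarehouse({"airbyte_landing"})
        check(warehouse, "airbyte_landing")
        print(check.__name__, sorted(warehouse.permissions_used))
```

With a pre-created dataset, the older check in this sketch only needs to read dataset metadata, while the newer one always needs project-level create and delete rights, which is the part that cannot be scoped to a single dataset.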

We suggest to either

joelluijmes commented 1 year ago

Agreed. We opted to downgrade to 1.2.5 to work around this. We have configured a dedicated airbyte_landing dataset. Within this dataset the Airbyte service account has full access (BigQuery Admin role), but it has no access to any other datasets. This is preferable from a least-privilege perspective.

Another possible solution would be an additional configuration option that lets users opt in to or out of dataset creation/deletion access.
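A rough sketch of what such an opt-out could look like, assuming a hypothetical allow_dataset_management flag. Neither the flag, the DestinationConfig and FakeBigQuery classes, nor the "_check_tmp" suffix exist in the connector; they are invented here only to show the shape of the idea.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DestinationConfig:
    dataset_id: str
    # Hypothetical option: when False, the check never creates or deletes datasets.
    allow_dataset_management: bool = True


@dataclass
class FakeBigQuery:
    """In-memory stand-in that logs every dataset operation it is asked to perform."""

    datasets: Set[str]
    calls: List[str] = field(default_factory=list)

    def exists(self, name: str) -> bool:
        self.calls.append(f"get {name}")
        return name in self.datasets

    def create(self, name: str) -> None:
        self.calls.append(f"create {name}")
        self.datasets.add(name)

    def delete(self, name: str) -> None:
        self.calls.append(f"delete {name}")
        self.datasets.discard(name)


def check_connection(config: DestinationConfig, bq: FakeBigQuery) -> bool:
    """Return True when the check passes under the selected mode."""
    if config.allow_dataset_management:
        # Opted in: probe create/delete rights with a throwaway dataset.
        probe = config.dataset_id + "_check_tmp"  # placeholder suffix
        bq.create(probe)
        bq.delete(probe)
        return True
    # Opted out: only confirm that the pre-created dataset is reachable.
    return bq.exists(config.dataset_id)


if __name__ == "__main__":
    bq = FakeBigQuery({"airbyte_landing"})
    config = DestinationConfig("airbyte_landing", allow_dataset_management=False)
    print(check_connection(config, bq), bq.calls)
```

In the opted-out mode the service account would only ever touch the dataset named in the configuration, which matches the airbyte_landing setup described above.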

philipherrmann commented 1 year ago

Can anyone share a timeline for this security issue?