dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0

Customer Managed Encryption Keys (CMEK) for GCP sources/destinations #1975

Open ldnicolasmay opened 1 month ago

ldnicolasmay commented 1 month ago

Feature description

dbt lets you add config that defines which customer-managed key to use for data encryption in BigQuery (documentation linked below). It'd be great if the same kind of config were available in dlt for GCP sources and destinations, i.e., GCS and BigQuery.

Here's the relevant dbt documentation: https://docs.getdbt.com/reference/resource-configs/bigquery-configs#managing-kms-encryption

Here's the relevant GCS documentation: https://cloud.google.com/storage/docs/encryption/customer-managed-keys

Here's the relevant BigQuery documentation: https://cloud.google.com/bigquery/docs/encryption-at-rest

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

This feature would make it possible for users to adopt dlt when they need to encrypt/decrypt GCS or BigQuery data with their own key.

Proposed solution

Similar to dbt's YAML config (see the dbt documentation linked above), it'd be great if we could just point to a key in dlt config. The service account that has permissions to read/write GCS or BigQuery data would also need permissions to retrieve and use the key defined in that config.
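
To make the idea concrete, here's a rough sketch of what "just point to a key" could look like. It isn't dlt's API: `CmekConfig` and `load_job_configuration` are hypothetical names made up for illustration. The only parts taken from GCP are the Cloud KMS key resource name format and the `destinationEncryptionConfiguration.kmsKeyName` field of a BigQuery load job configuration.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Cloud KMS key resource names follow this pattern:
# projects/<project>/locations/<location>/keyRings/<ring>/cryptoKeys/<key>
_KMS_KEY_RE = re.compile(
    r"^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$"
)


@dataclass(frozen=True)
class CmekConfig:
    """Hypothetical config object holding one customer-managed key reference."""

    kms_key_name: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail early on a malformed key name instead of at load time.
        if self.kms_key_name is not None and not _KMS_KEY_RE.match(self.kms_key_name):
            raise ValueError(
                f"not a Cloud KMS key resource name: {self.kms_key_name!r}"
            )


def load_job_configuration(
    project: str, dataset: str, table: str, cmek: CmekConfig
) -> Dict[str, Any]:
    """Build a BigQuery load job configuration body (REST field names).

    Hypothetical helper: it only assembles a dict and calls no Google API.
    """
    load: Dict[str, Any] = {
        "destinationTable": {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table,
        }
    }
    if cmek.kms_key_name is not None:
        # With this field set, the destination table is encrypted with the given key.
        load["destinationEncryptionConfiguration"] = {
            "kmsKeyName": cmek.kms_key_name
        }
    return {"load": load}


if __name__ == "__main__":
    key = "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"
    print(load_job_configuration("my-project", "my_dataset", "events", CmekConfig(key)))
```

When no key is configured the encryption field is left out entirely, so the default Google-managed encryption would apply and existing pipelines wouldn't change.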

Related issues

No response

VioletM commented 1 month ago

dbt implementation: https://github.com/dbt-labs/dbt-bigquery/blob/d4be89a3840c400a963c8d24e82cbbac608290a2/dbt/adapters/bigquery/impl.py#L109