dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0

Bug: No way to add KMS Key to the dataform queries [Administrator requires that you specify encryption key for queries] #1793

Closed arkadioz closed 3 months ago

arkadioz commented 4 months ago

Good day. A policy was applied that makes KMS keys mandatory for everything, including queries. I have searched the Dataform documentation and cannot find anything about this. My repository is encrypted with a custom KMS key, and I also set the kms_key_name parameter in the config block, as the documentation suggests: https://cloud.google.com/dataform/docs/reference/dataform-core-reference (IBigQueryOptions has a parameter called additionalOptions, which has the kms_key_name option).

None of this worked; I always get this error message:

"Your administrator requires that you specify an encryption key for queries in project project-id. See https://cloud.google.com/bigquery/docs/customer-managed-encryption#services_constraint for more info."

So is this a bug, or is Dataform missing the functionality to handle this scenario?

Even a simple script like this fails, whether I run it manually or through a Dataform workflow:

config {
    name: "name_of_table",
    type: "incremental",
    protected: true,
    schema: schema_value,
    database: database_value,
    columns: {
        starting_hour: "Start hour",
        ending_hour: "End hour"
    },
    tags: ["monthly"],
    bigquery: {
        additionalOptions: {
            "kms_key_name": "projects/your-project-id/locations/your-key-location/keyRings/your-key-ring/cryptoKeys/your-key-name"
        }
    }
}

SELECT
  1 AS starting_hour,
  2 AS ending_hour
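
The key path in the config block above follows the Cloud KMS resource-name pattern (projects/.../locations/.../keyRings/.../cryptoKeys/...). As a minimal sketch, assuming the path is assembled in JavaScript (for example in a Dataform includes/ file), the helpers below build the name and check its shape so that a typo in the path can be ruled out. The function names and the regular expression are hypothetical, not part of Dataform, and this check does not by itself address the organization-policy error quoted above.

```javascript
// Hypothetical helpers; names are illustrative and not part of Dataform.
// Matches the four-part resource name used in the config block above.
const KMS_KEY_PATTERN =
  /^projects\/([^/]+)\/locations\/([^/]+)\/keyRings\/([^/]+)\/cryptoKeys\/([^/]+)$/;

// Build a key resource name from its components, rejecting empty values
// and values that contain a slash.
function buildKmsKeyName({ project, location, keyRing, key }) {
  const parts = { project, location, keyRing, key };
  for (const [name, value] of Object.entries(parts)) {
    if (typeof value !== "string" || value.length === 0 || value.includes("/")) {
      throw new Error("Invalid KMS key component: " + name);
    }
  }
  return (
    "projects/" + project +
    "/locations/" + location +
    "/keyRings/" + keyRing +
    "/cryptoKeys/" + key
  );
}

// Split a resource name back into its components; returns null when the
// string does not match the expected shape.
function parseKmsKeyName(name) {
  const match = KMS_KEY_PATTERN.exec(name);
  if (!match) {
    return null;
  }
  const [, project, location, keyRing, key] = match;
  return { project, location, keyRing, key };
}
```

Parsing the name also exposes the key's location, which can then be compared by hand against the location of the BigQuery dataset.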

There is also this documentation about encryption with Dataform: https://cloud.google.com/dataform/docs/cmek. But as I said, it fails even with the repository encrypted using a custom KMS key, and even when that key is the same one used for the BigQuery datasets.

Also, the service account being used has the encrypt/decrypt role for that particular key. I don't know what else to try, please help.

Does the following section of the Dataform restrictions mean that there is no way to handle this scenario? https://cloud.google.com/dataform/docs/cmek#restrictions

Ekrekr commented 3 months ago

This is technically a GCP issue: Dataform Core is doing what it should, which is passing the API option to BigQuery.

Our docs for KMS are at https://cloud.google.com/dataform/docs/cmek.

If that doesn't help, I'd recommend going via the GCP issue tracker https://issuetracker.google.com/savedsearches/6274893 from https://cloud.google.com/support/docs/issue-trackers, and asking your org's GCP customer engineer to help configure the setup.
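
For context on the constraint the error message links to: BigQuery's CMEK documentation describes a project-level default key, set per location with an ALTER PROJECT ... SET OPTIONS statement. Nothing in this thread confirms that it resolves the error reported here, so treat the following as a sketch to discuss with your GCP contact rather than a fix. The function name and all argument values are hypothetical placeholders.

```javascript
// Hypothetical helper: renders the statement as text only; it does not run it.
// Assumption: the project default key option is named
// `region-LOCATION.default_kms_key_name`, as in BigQuery's CMEK documentation.
function defaultKmsKeyStatement(projectId, location, kmsKeyName) {
  // Reject characters that would break out of the quoted identifiers and literal.
  for (const value of [projectId, location, kmsKeyName]) {
    if (typeof value !== "string" || value.length === 0 || /[`'\\]/.test(value)) {
      throw new Error("Unsafe or empty value: " + String(value));
    }
  }
  return [
    "ALTER PROJECT `" + projectId + "`",
    "SET OPTIONS (",
    "  `region-" + location + ".default_kms_key_name` = '" + kmsKeyName + "'",
    ");",
  ].join("\n");
}

// Example with placeholder values only.
console.log(
  defaultKmsKeyStatement(
    "your-project-id",
    "your-key-location",
    "projects/your-project-id/locations/your-key-location/keyRings/your-key-ring/cryptoKeys/your-key-name"
  )
);
```

The rendered statement would be run once in BigQuery by someone with permission to alter project settings, not from inside a Dataform action.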