dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0
851 stars 163 forks source link

Fix `uniqueKey` assertions for views, incremental tables #1836

Closed bmagyarkuti closed 1 month ago

bmagyarkuti commented 2 months ago

The Bug

This PR fixes a bug where uniqueKey assertions on views and incremental tables caused dataform run to exit with a query error. uniqueKey assertions on tables, as well as all uniqueKeys assertions still worked. Affected versions: 3.0.1, 3.0.2. I don't know whether executions on GCP were also affected.

Steps to reproduce

1) Create an empty dataform project using version 3.0.1 or 3.0.2 of @dataform/core. 2) Add the following definition:

config {
  type: "view",
  name: "my table",
  assertions: {
    uniqueKey: ["id"]
  }
}

SELECT 1 as id

3) Execute dataform run. Observe the following error message:

Table created:  dataform.my table [view]
Assertion failed:  dataform_assertions.dataform_my table_assertions_uniqueKey_0
  >
  >       create or replace view `mbarna-mongo-to-bq-dataflow.dataform_assertions.dataform_my table_assertions_uniqueKey_0` as
  > SELECT
  >   *
  > FROM (
  >   SELECT
  >     TableAssertionsConfig,
  >     COUNT(1) AS index_row_count
  >   FROM `mbarna-mongo-to-bq-dataflow.dataform.my table`
  >   GROUP BY TableAssertionsConfig
  >   ) AS data
  > WHERE index_row_count > 1
  >
  bigquery error: Unrecognized name: TableAssertionsConfig at [7:5]

The Fix

I don't fully understand what happened. The query complains about an undefined column TableAssertionsConfig, and the newly refactored implementations for view and incremental_table contain this hardcoded string for uniqueKey assertions. I'm not sure what the hardcoded string is meant to be referring to. Just replacing the hardcoded string with the user-supplied column names fixes the issue.

The Tests

The suite in core/main_test.ts tested assertions for views, tables and incremental_tables, for the following assertions: uniqueKeys, nonNull, rowConditions. uniqueKey assertions were untested, I assume because they're incompatible with a uniqueKeys assertion. I extended the test suite to also include tests for uniqueKey assertions, for all three tested table kinds. I worked around the incompatibility by introducing a second sample file. Apart from the different assertions block, the second file's contents are identical to the contents of the first file.

This has confirmed the bug, as well as the fix.

I nominate @Ekrekr as reviewer.

Ekrekr commented 1 month ago

Thank you so much for this fix! It would have been approved earlier, but there's been a lot on.

It looks good - I'll merge main in with this and make sure the remote tests pass before merging.

Fixes https://github.com/dataform-co/dataform/issues/1835 Fixes https://github.com/dataform-co/dataform/issues/1804 Fixes b/371514877