dataform-co / dataform-web-tracking

Dataform is a collaborative data modelling platform that enables analysts and engineers to manage complex data models in SQL
https://dataform.co
MIT License
2 stars 0 forks source link

DROP table needed #318

Closed frans-snyders closed 2 years ago

frans-snyders commented 2 years ago

If a partitioning change is made on a table (or incremental table), then the table can't be updated via dataform, and results in an error. Attempting to re-create the table or doing a full refresh on the table results in the same error.

The error can only be fixed by dropping the table - either by a superuser in BigQuery (permissions to drop in production), or by creating a drop operation per table in dataform and pushing to production. For our environment, only the second is viable, which results in each created table needing a separate operation containing "DROP TABLE $(resolve(???))". The added risk is if a user does a "Run entire project" in production - which will result in all tables being dropped in production.

Suggested fix:

Error Replication:

  1. Add a new file - table as follows:
    
    config {
    type: "table",
    uniqueKey: ["id"],
    bigquery: {
    partitionBy: "date_table"
    },
    }

select current_date() as date_table, 1 as id


2. run "Create Table" (default options)

3. change script to the following:

config { type: "table", uniqueKey: ["id"], bigquery: { partitionBy: "date_table_2" }, }

select current_date() as date_table, 1 as id, current_date() as date_table_2



4. run "Create Table" (default options)
error occurs
5. run "Create Table" (with "Run with full refresh")
error occurs

`bigquery error: Cannot replace a table with a different partitioning spec. Instead, DROP the table, and then recreate it. New partitioning spec is interval(type:day,field:date_table_2) and existing spec is interval(type:day,field:date_table)`
frans-snyders commented 2 years ago

Handling this with a "smart" drop script