dataform-co / dataform

Dataform is a framework for managing SQL-based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0

BigQuery Materialized View is recreated each time the Dataform project is run #1822

Open · p13rr0m opened this issue 2 months ago


BigQuery Materialized View Issue

We have a very large table in BigQuery and have created a smaller, filtered materialized view of it for analysts. Every day, new data is added to the large table and, in turn, to the materialized view.

We run the models with the Dataform CLI. However, even though the materialized view definition has not changed, every run of the Dataform project recreates the materialized view, so the entire large table has to be processed again.

We would expect the materialized view to be left in place so that it keeps the data it has already processed.
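In BigQuery DDL terms, the behaviour we see resembles the first statement below and the behaviour we expect resembles the second. This is an assumption about what gets issued on each run, not something confirmed from Dataform's compiled output; `project.dataset.small_view` and `project.dataset.large_table` are placeholder names.

```sql
-- Assumed current behaviour: the existing materialized view is dropped and
-- rebuilt, so all previously materialized data is discarded.
CREATE OR REPLACE MATERIALIZED VIEW `project.dataset.small_view`
AS SELECT * FROM `project.dataset.large_table` WHERE column_b;

-- Expected behaviour: nothing happens if a view with this name already
-- exists, so the materialized data is kept.
CREATE MATERIALIZED VIEW IF NOT EXISTS `project.dataset.small_view`
AS SELECT * FROM `project.dataset.large_table` WHERE column_b;
```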

This is how we create the materialized view:

config {
  type: "view",
  materialized: true,
  bigquery: {
    additionalOptions: {
      enable_refresh: "false"
    },
    partitionBy: "DATE(ingestion_time)",
    clusterBy: ["column_a"]
  }
}
SELECT
    *
FROM
    ${ref("large_table")}
WHERE
    column_b
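An untested sketch of a possible workaround: write the DDL by hand in a Dataform `operations` file, so the view is only created when it does not exist yet. It assumes `hasOutput: true` with `${self()}` resolves to the target view name as described in the Dataform SQLX docs; the filter is copied unchanged from the config above.

```sql
config {
  type: "operations",
  hasOutput: true
}
-- Hypothetical workaround (untested): create the materialized view only if
-- it is missing, instead of replacing it on every run.
CREATE MATERIALIZED VIEW IF NOT EXISTS ${self()}
PARTITION BY DATE(ingestion_time)
CLUSTER BY column_a
OPTIONS (enable_refresh = false)
AS
SELECT
    *
FROM
    ${ref("large_table")}
WHERE
    column_b
```

The trade-off is that `IF NOT EXISTS` also skips legitimate definition changes, so the view would have to be dropped manually whenever the query, partitioning, or clustering is edited.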

Thanks for your help!