dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0
853 stars 165 forks source link

Cannot set CLUSTER or PARTITION options on BigQuery materialized views. #1809

Closed ChrisBlom closed 3 months ago

ChrisBlom commented 3 months ago

Currently it is not possible to set CLUSTER BY or PARTITION BY options on a materialized view in dataform, even though this is supported

One can set cluster / partition options on materialized views when using SQL:

CREATE OR REPLACE MATERIALIZED VIEW
  `my_project.my_dataset.my_view`
CLUSTER BY customer_id 
PARTITION BY DATE(transaction_time)
AS (
 ...
 )

See: https://cloud.google.com/bigquery/docs/materialized-views-create#cluster_materialized_views https://cloud.google.com/bigquery/docs/materialized-views-create#partitioned_materialized_views

One would expect that the same format used to set clustering and partition options for tables is usable to define materalized views with clustering and partitioning,

config {
    type: "view",
    materialized: true,
    description: "Deduplicated pictures",
    bigquery: {
        partitionBy: "DATE(transaction_time)",
        clusterBy: ["customer_id"],
    },
}

select * from ...

But this results in a compilation error:

partitionBy/clusterBy/requirePartitionFilter/partitionExpirationDays are not valid for BigQuery views; they are only valid for tables

Would it be possible to add support for CLUSTER BY and PARTITION BY to materialized views?

Ceridan commented 3 months ago

Hi @ChrisBlom, could you please share what version of @dataform/cli you are using?

Seems like the issue was fixed in #1660. Could you try updating the CLI version and let us know if that helps?

ChrisBlom commented 3 months ago

I was using 2.8.3, can confirm that it work in 2.9.0 so the issue can be closed