Open dali-rmw opened 2 years ago
hi @dali-rmw! i use dataform in BigQuery/GCP too.
As @kolina said, when you define the config snippet as table
, BigQuery always understands this as a writeDisposition=WRITE_TRUNCATE
param.
If you want to append data to an existing table, you have to use theincremental
type in the config snippet. This would be the same as writeDisposition=WRITE_APPEND
param.
Best regards.
I have a scenario where WRITE_TRUNCATE and config{type:"table"} are not exactly the same thing.
I have a BQ materialized view which uses a particular table table_A as one of its sources. Table_A gets its data overnight via a script which uses write_truncate to overwrite the data in the table with new data.
Now, when I try that with Dataform config{type: "table"} it actually does a REPLACE of Table_A.
The problem is that materialized views do not like to have their source tables changed. If you try to select from that materialized view you get Materialized view project_A.schema_A.mat_view references table project_A.schema_A.table_A which was deleted and recreated. The view must be deleted and recreated as well.
While I could drop and create the materialized view, this particular one is expensive to run as a one off - we end up not really getting the benefits of the materialized view.
@joajen This is an interesting use case I'm curious would the expected output be that dataform create the materialized view and pass along options similar to these to BigQuery api
https://cloud.google.com/bigquery/docs/materialized-views-manage#refresh
or are you talking about that table_A
upstream dependency of the mat view and would like dataform to write_trunc that table instead of create or replace
it?
@andres-lowrie
Thank you. The second point. I want the upstream dependency (ie base table table_A
) to be write_truncate.
BQ has limitations on the materialized views - if you delete and recreate a base table the materialized view refresh will fail. See the last point at https://cloud.google.com/bigquery/docs/materialized-views-intro#limitations
Hello,
As mentioned by @joajen, the behavior of a table type involves the deletion and recreation of the table. The issue I'm facing is that I create my tables using Terraform. When I specify the type as "table" in Dataform, it results in Dataform deleting and recreating the table. Consequently, my Terraform state registers this as a deleted table and attempts to recreate it each time I run Terraform.
This is a very reasonable request, but it's complex in its implementation, because we can either:
--addition-bigquery-options="{'writeDisposition': 'WRITE_TRUNCATE'}"
However these options would be best defined at the config per-table level, but:
I'll keep thinking about what the best solution for this would be.
Note: this implementation for this is similar to the request in https://github.com/dataform-co/dataform/issues/1385, as it interacts with the API in the same way.
Hi, @dali-rmw! Can you clarify in which scenarios you'd like to use this setting? If I understand correctly from the docs passing
writeDisposition=WRITE_TRUNCATE
for BQ jobs allows rewriting a destination table for query results. But in Dataform you can create an action and set thetype
field of the query totable
In this case the output table for such action will be rewritten on every run so it is basically an equivalent thing. More details here.
If it's not that you want feel free to describe your scenario :)