dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0
806 stars, 151 forks

'Incremental' tables should support schema changes #373

Open BenBirt opened 4 years ago

BenBirt commented 4 years ago

(Disclaimer: I haven't tested how the following works in all cases or against all warehouses. We may already support some of what follows; I'm not sure yet.)

Right now it's unclear what happens to an incremental table if the user changes its schema (i.e. adds a new column or deletes an old one, or perhaps changes the type of an existing one). We should check the behaviour of these cases against our currently-supported warehouses.

It would be great if we could support automatically running e.g. ALTER TABLE statements to get these cases to work. The tricky thing here will be figuring out precisely what table alterations need to happen, whether we need to diff old vs. new schemas somehow, or whether we can just update the schema of the table to a new state idempotently without caring what the old state of the table might be.

If it turns out to be impossible / really hard to support this kind of behaviour, we should at least check that dataform does something that the user might expect, e.g. throw an error depending on the case.
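
To make the "diff old vs. new schemas" option concrete, here is a minimal sketch of what that could look like. Everything in it is hypothetical: the `Column` and `SchemaDiff` types and the `diffSchemas` and `alterStatements` functions are illustrative names, not part of Dataform's actual API, and the generated `ALTER TABLE ... ADD COLUMN` syntax would need checking against each warehouse. It handles only added columns and throws on removed or retyped ones, which matches the fallback of raising an explicit error.

```typescript
// Hypothetical types and function names; none of this is Dataform's real API.
interface Column {
  name: string;
  type: string; // warehouse type name, e.g. "STRING" or "INT64"
}

interface SchemaDiff {
  added: Column[];
  removed: Column[];
  retyped: Array<{ name: string; from: string; to: string }>;
}

// Compare the existing table's columns with the columns the new query produces.
function diffSchemas(oldCols: Column[], newCols: Column[]): SchemaDiff {
  const oldByName = new Map(oldCols.map((c): [string, Column] => [c.name, c]));
  const newByName = new Map(newCols.map((c): [string, Column] => [c.name, c]));

  const added = newCols.filter(c => !oldByName.has(c.name));
  const removed = oldCols.filter(c => !newByName.has(c.name));
  const retyped = newCols
    .filter(c => oldByName.has(c.name) && oldByName.get(c.name)!.type !== c.type)
    .map(c => ({ name: c.name, from: oldByName.get(c.name)!.type, to: c.type }));

  return { added, removed, retyped };
}

// Turn a diff into ALTER TABLE statements. Only additive changes are handled;
// removed or retyped columns throw, so the user sees an explicit error.
function alterStatements(table: string, diff: SchemaDiff): string[] {
  if (diff.removed.length > 0 || diff.retyped.length > 0) {
    const names = [
      ...diff.removed.map(c => c.name),
      ...diff.retyped.map(c => c.name),
    ];
    throw new Error(`Unsupported schema change on ${table}: ${names.join(", ")}`);
  }
  return diff.added.map(c => `ALTER TABLE ${table} ADD COLUMN ${c.name} ${c.type}`);
}
```

For a table with columns `a` and `b` and a new query that also produces a `STRING` column `c`, `alterStatements("my_table", diffSchemas(oldCols, newCols))` would return a single statement, `ALTER TABLE my_table ADD COLUMN c STRING`. Whether a diff like this is preferable to an idempotent "set schema to X" call is still the open question.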

deniszaboronsky commented 2 years ago

@BenBirt I came here to raise this issue. In BigQuery (Dataform version 1.22.0), we are seeing that adding a new column requires us to re-create the base table, because otherwise Dataform compiles to:

INSERT INTO target_table (Col1, Col2, Col3)
AS (
SELECT
    Col1, Col2, Col3, Col4
FROM some_table)

Where the query used to be

SELECT
    Col1, Col2, Col3
FROM some_table

and is now

SELECT
    Col1, Col2, Col3, Col4
FROM some_table

This of course means that we lose the new column that we had hoped to add. Just thought it was worth flagging that we would love an ALTER TABLE statement. Let me know if any extra info would be useful.

SourabhKr commented 5 months ago

@Ekrekr is there any update on this request? The requirement here is very basic and usually there are workarounds running in all implementations.

Ekrekr commented 3 months ago

> @Ekrekr is there any update on this request?

Still not yet.

> The requirement here is very basic

Not really; there are a lot of usability implications that would need to be tested.

Contributions are welcome!

chateletlealSephora commented 1 month ago

Hello,

Are there any updates regarding this feature?

dbt manages it through configuration, so why not Dataform? Even if it doesn't work for records or repeated records, it is really useful for some of the schema-change use cases.

https://docs.getdbt.com/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change
