dataform-co / dataform

Dataform is a framework for managing SQL-based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0

Delete unreferenced tables in a schema #546

Open (lewish opened this issue 4 years ago)

lewish commented 4 years ago

There should be a way to tell Dataform to remove tables from its schemas when those tables are no longer defined in the project.

This is complicated by the fact that a single run may not include all of the tables/views the project creates, so the set of tables to keep cannot be derived from one run alone.

To address this, we can compute the tables to drop from the compiled graph rather than from a run. For now, we can also assume that any database + schema written to by a Dataform action is "dataform managed".

In the future, we can allow the project to specify which database schemas are considered managed explicitly.

Proposal

Add a new command, dataform prune, that takes the same arguments as dataform compile and lists every table and view in any database schema written to by the current compiled graph that the graph will not itself create.
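The set computation behind the proposed listing can be sketched in a few lines. This is a minimal illustration, not Dataform's implementation: the Target type, the prune_candidates function and its inputs are hypothetical, and it assumes the compiled graph's output targets and the relations currently in the warehouse (for example, read from each dataset's INFORMATION_SCHEMA.TABLES view) have already been collected. The optional managed_schemas argument stands in for the future explicit configuration; when it is omitted, the sketch falls back to the "any schema written to is managed" assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Target:
    """Hypothetical fully qualified relation: database (BigQuery project), schema (dataset), name."""
    database: str
    schema: str
    name: str


def prune_candidates(compiled_targets, existing_relations, managed_schemas=None):
    """List existing relations in managed schemas that the compiled graph will not create.

    compiled_targets: Targets written by the compiled graph (assumed input shape).
    existing_relations: Targets currently present in the warehouse.
    managed_schemas: optional explicit set of (database, schema) pairs; when None,
        every schema written to by the compiled graph is treated as managed.
    """
    created = set(compiled_targets)
    if managed_schemas is None:
        managed_schemas = {(t.database, t.schema) for t in created}
    # Keep relations that live in a managed schema but are not produced by the graph.
    return sorted(
        r for r in existing_relations
        if (r.database, r.schema) in managed_schemas and r not in created
    )


# Usage with made-up names:
compiled = [Target("proj", "analytics", "orders"), Target("proj", "analytics", "customers")]
existing = [
    Target("proj", "analytics", "orders"),
    Target("proj", "analytics", "old_orders"),  # no longer defined in the project
    Target("proj", "raw", "events"),  # "raw" is never written to, so it is not managed
]
print(prune_candidates(compiled, existing))
# → [Target(database='proj', schema='analytics', name='old_orders')]
```

Because the candidates come from the whole compiled graph rather than from one run, a run that executes only a subset of actions does not cause tables outside that subset to be listed.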

When called as dataform prune --drop, the command should drop all of the listed tables and views.
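A --drop step would need to turn each listed relation into the matching BigQuery DDL statement. The sketch below only generates the statements and executes nothing; the Relation type and drop_statements function are hypothetical, and it assumes each relation carries the table_type value reported by BigQuery's INFORMATION_SCHEMA.TABLES view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical relation to drop, with its INFORMATION_SCHEMA.TABLES table_type."""
    database: str
    schema: str
    name: str
    table_type: str  # e.g. "BASE TABLE", "VIEW", "MATERIALIZED VIEW"


# Map INFORMATION_SCHEMA.TABLES table_type values to BigQuery DROP statement keywords.
_DROP_KEYWORD = {
    "BASE TABLE": "TABLE",
    "VIEW": "VIEW",
    "MATERIALIZED VIEW": "MATERIALIZED VIEW",
}


def drop_statements(relations):
    """Build one DROP ... IF EXISTS statement per relation; raise on unknown types."""
    statements = []
    for r in relations:
        keyword = _DROP_KEYWORD.get(r.table_type)
        if keyword is None:
            raise ValueError(f"unsupported table_type {r.table_type!r} for {r.name}")
        statements.append(f"DROP {keyword} IF EXISTS `{r.database}.{r.schema}.{r.name}`;")
    return statements


# Usage with made-up names:
stale = [
    Relation("proj", "analytics", "old_orders", "BASE TABLE"),
    Relation("proj", "analytics", "old_orders_v", "VIEW"),
]
for stmt in drop_statements(stale):
    print(stmt)
```

Unrecognized table types raise an error instead of falling back to DROP TABLE, so nothing is dropped under a guessed statement type; printing the statements first also gives a chance to review them before running anything.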

hobailey commented 1 year ago

@lewish did you ever find a temporary workaround for this, by any chance? (e.g. a script you wrote to do the same job until this is available natively)

hrialan commented 4 months ago

Hello @hobailey, here is a script: https://github.com/hrialan/dataform-prune