Closed. MaxHalford closed this issue 8 months ago.
There's some good stuff here too. The way I see it, there are two options:

1. Run the `CREATE DATASET` and all subsequent queries in a massive transaction. The transaction is only committed if the whole DAG succeeds. This is a clean solution. The only thing is that I'm doubtful of the ability to do transactions with many massive queries. To be explored.
2. Create each table with a `__staging` suffix. Then do a transaction to replace the current production tables with the staging tables, and drop the staging tables. The issue with this is that there will be `__staging` tables left behind if the DAG fails at some point. But maybe this is a good thing for inspecting why a particular view failed. (A sketch of this approach follows the list.)

My initial gut feeling was to create all tables in a dedicated dataset, and then switch datasets. You can do that with Snowflake and DuckDB, but not with BigQuery. I think it's important to find a strategy that works with every database we think we'll support.
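Here is a minimal sketch of option 2 against DuckDB; the `views` dict, table names, and database file name are made up for illustration:

```python
import duckdb

con = duckdb.connect("warehouse.db")

# Hypothetical DAG outputs: view name -> query.
views = {
    "customers": "SELECT 1 AS id, 'Alice' AS name",
    "orders": "SELECT 1 AS customer_id, 9.99 AS amount",
}

# Step 1: materialize every view under a __staging suffix. If any query
# fails, the production tables are untouched, and the leftover __staging
# tables can be inspected to debug the failure.
for name, query in views.items():
    con.execute(f"CREATE OR REPLACE TABLE {name}__staging AS {query}")

# Step 2: the whole DAG succeeded, so promote the staging tables to
# production in a single transaction and clean up.
con.execute("BEGIN TRANSACTION")
for name in views:
    con.execute(f"DROP TABLE IF EXISTS {name}")
    con.execute(f"ALTER TABLE {name}__staging RENAME TO {name}")
con.execute("COMMIT")
```

The swap itself only runs renames and drops, so the transaction stays small even when the queries that built the staging tables were massive.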
Done! See the `--wap` flag.
Also called blue-green deployment in the software engineering world.
When a refresh happens, the views are modified in-place. During the refresh, some views are therefore out of sync with the others. This can cause race conditions and tricky situations. For instance, say you refresh 10 views and then an error is raised for the 11th, with 30 views still to go. You now have out-of-sync views, which is not a good situation.
One solution to this could be to create a temporary schema, create the views there, and then swap the target schema with the temporary schema. This must be a known way of doing things in the database world, so a little bit of research could prove worthwhile. A rough sketch follows.
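As an illustration of that idea, here is a sketch using Snowflake's `ALTER SCHEMA ... SWAP WITH` command; the connection parameters, schema names, and `views` dict are all hypothetical placeholders:

```python
import snowflake.connector

# Hypothetical connection; fill in real credentials.
con = snowflake.connector.connect(
    account="my_account",
    user="me",
    password="...",
    database="MY_DB",
)

# Hypothetical DAG outputs: view name -> query.
views = {
    "customers": "SELECT 1 AS id, 'Alice' AS name",
}

cur = con.cursor()

# Build everything in a throwaway schema; production stays untouched
# and fully consistent while this runs.
cur.execute("CREATE OR REPLACE SCHEMA analytics__tmp")
for name, query in views.items():
    cur.execute(f"CREATE VIEW analytics__tmp.{name} AS {query}")

# Atomically exchange the two schemas, then drop the old contents.
cur.execute("ALTER SCHEMA analytics SWAP WITH analytics__tmp")
cur.execute("DROP SCHEMA analytics__tmp")
```

Readers always see either the old set of views or the new one, never a mix, which is exactly the blue-green behaviour described above. The catch, as noted earlier, is that BigQuery has no equivalent of a schema swap.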