Montreal-Analytics / dbt-snowflake-utils

Snowflake-specific utility macros for dbt projects.
Apache License 2.0

Tagging is too distinct from model building #38

Open BeadW opened 1 year ago

BeadW commented 1 year ago

The current implementation of Snowflake tagging runs as a post-run (on-run-end) hook (https://github.com/Montreal-Analytics/dbt-snowflake-utils#snowflake_utilsapply_meta_as_tags-source).

This works, but there are several ways we can improve it.

Limitations (not exhaustive) of the current approach:

  1. Unable to fail fast - A typical failure mode is that the YAML doesn't accurately match the model output, causing tagging to throw a Snowflake error. This error isn't known until all models have run, which delays feedback and leaves models in an undefined state.
  2. Failures can't be rolled back - Because tagging isn't part of the same transaction as the model build, we can't roll back when a failure occurs.
  3. Process is slow - A post-run hook is single-threaded. Because the tags are applied to separate models, which are mutually exclusive by nature, a multi-threaded approach would be more performant.

I propose that we move to a model-level post-hook, which runs as part of the model transaction. Tagging would then occur within the model build transaction, and dbt would automatically run it across multiple threads according to the project's threading settings.
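
To make the difference concrete, here is a rough sketch of how the two hook placements might look in `dbt_project.yml`. The `on-run-end` line follows the usage shown in the package README as far as I understand it. `my_project` and `apply_model_tags` are hypothetical names, standing in for the project and for a model-level tagging macro that does not exist in the package today.

```yaml
# dbt_project.yml (sketch only, not a tested configuration)

# Current approach: a single hook that runs once, after every model has been built.
on-run-end: "{{ snowflake_utils.apply_meta_as_tags(results) }}"

# Proposed approach: a post-hook attached to each model, so tagging runs
# right after that model is built, on whichever thread built it.
# `my_project` and `apply_model_tags` are hypothetical placeholders.
models:
  my_project:
    +post-hook: "{{ apply_model_tags() }}"
```

Only one of the two would be enabled at a time. With the model-level hook, a tagging failure would surface as a failure of that model rather than at the end of the run.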

I have a proof of concept demonstrating that this works. I am happy to share it and to work toward a more feature-complete version if this approach is adopted.

jamesweakley commented 1 year ago

Hi @BeadW, there is definitely already some interest in shifting it to the model level. If you have an alternative macro we could add to the mix, it would definitely be welcome. We would probably keep them as two separate macros so that people can choose which method to use.

kanomaxb commented 1 year ago

I would add another reason:

  1. For tag-based masking policies, the time between creating a table and masking its data needs to be minimal.