ScalefreeCOM / datavault4dbt

Scalefree's dbt package for a Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt including the Staging Area, DV2.0 main entities, PITs and Snapshot Tables.
https://www.scalefree.com/
Apache License 2.0

Clustering on hashkey #254

Open sohai-max opened 2 months ago

sohai-max commented 2 months ago

Hi, is there a plan to optionally include a hashkey as a clustering key during the creation of transient tables? This could enhance query performance when joining large tables, although using business keys instead of hashkeys for joining can address this issue as well.

tkirschke commented 2 months ago

Hi @sohai-max and thanks for reaching out!

Without knowing your full context, I just want to let you know that we always recommend materializing all Raw Vault entities as "incremental" to ensure proper historization.

But coming back to your question: a clustering key for transient tables isn't really something we can implement within our package. Instead, it needs to be defined in the model config.

Check out the dbt documentation on the "cluster_by" parameter.
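
As a rough sketch of what that could look like for a hub model, assuming a Snowflake target: the hashkey column `hk_customer_h`, the business key `customer_id`, and the stage model `stage_customer` are hypothetical names, so replace them with your own. `cluster_by` and `transient` are set in the model's own config block, not through the datavault4dbt macro.

```sql
-- models/raw_vault/customer_h.sql (hypothetical model name)
-- cluster_by takes a single column name or a list of columns.
-- transient is a Snowflake-specific table option.
{{ config(
    materialized='incremental',
    transient=true,
    cluster_by=['hk_customer_h']
) }}

-- Assumed names throughout: adjust hashkey, business keys and source model.
{{ datavault4dbt.hub(
    hashkey='hk_customer_h',
    business_keys=['customer_id'],
    source_models='stage_customer'
) }}
```

Whether clustering on a hashkey actually pays off depends on table size and query patterns, so it is worth measuring before rolling it out to every entity.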

Let me know if this helps!

Best regards,
Tim