Closed balmasi closed 3 years ago
Great ideas. Both of these features are quite straightforward to implement. Derived columns being accessible by hashed columns is something we already have in the backlog, and the derived column concatenation syntax is certainly very useful!
Added in v0.7.1. Close if working as expected.
Closing as we have implemented the primary suggested feature. The concatenation of columns in the derived_columns configuration has been separated into a new issue (#20) and will be added to the next release (surprise: it's already been developed!)
I've come across a few instances where I need to hash a derived column in my staging area. Sometimes this is because I simply want to append a manual string to my BK; other times I want to rename something in my staging for use down the line. In both cases I don't have access to the derived column in the hashed_columns section, so I have to write SQL or use the field name I'm renaming (or be forced to create an extra model to feed the stage macro). For example, I have to do the following:
What I'd like to do:
Additionally, I'm finding that an enterprise-wide unique natural key is hilariously difficult to define. For example, imagine I have an HR system, a Contractor system, and a partner company system, and I want to represent a single concept of Employee.
As you can see, the unique keys here are different in each case, so my Business Key has to be a concatenation of multiple keys in the source system. This is what is referred to as src_nk in your hub macro.
In the above example, if I had a 2-part key, passing both keys as src_nk to the hub macro would result in 2 columns, SCOPE_BK and DOMAIN_BK; however, I'm only after a single concatenated column. It would be amazing if dbtvault could support this concatenation as a first-class concept (using the same transformation built into the hash function).
In this case, I might simply have, in each of the staging configs that feed into the hub, something like:
This would generate a table that looks like this:
Currently, I don't even have access to the hash concat macro by itself, because it's a part of the hash macro.
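To make the requested concatenation concrete, here is a minimal sketch in Python of the kind of transformation I mean: normalize each part of a multi-part key, then join the parts into a single business key column. The function names, the `||` delimiter, and the `^^` null placeholder are assumptions chosen for illustration; this is not dbtvault's actual macro, just the shape of the behavior I'd like exposed.

```python
from typing import Optional, Sequence

# Assumed values for illustration only; not confirmed dbtvault settings.
DELIMITER = "||"
NULL_PLACEHOLDER = "^^"


def normalize(part: Optional[object]) -> str:
    """Cast to string, trim whitespace, and upper-case one key part.

    Missing or empty parts become a placeholder so that a key with a
    missing part cannot collide with a shorter key.
    """
    if part is None:
        return NULL_PLACEHOLDER
    text = str(part).strip().upper()
    return text if text else NULL_PLACEHOLDER


def concat_business_key(parts: Sequence[Optional[object]]) -> str:
    """Join the normalized parts of a multi-part key into one string."""
    return DELIMITER.join(normalize(p) for p in parts)


if __name__ == "__main__":
    # Hypothetical two-part keys: a source-system scope plus a local id.
    print(concat_business_key(["hr", " e1001 "]))
    print(concat_business_key(["contractor", None]))
```

The point is that the same normalization would feed both the single concatenated BK column and the hash, so the two could never drift apart.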
Anyway, I know this is kinda multiple issues, but I wanted to put it out there to see if others felt similar pains.
Of course, as with everything else in dbt, you can work around this by adding a separate model before your dbtvault one, but I felt I was doing this all the time, and it seems like something that can be easily factored out.