Open RonniePitts opened 1 year ago
Probably some benefit instead of have unnecessary columns for LDTS and RECORD SOURCE in Staging layer.
Hi,
Welcome and thanks for your suggestion.
Whilst the idea makes sense on paper, I can't think of a use case where we wouldn't want the RECORD_SOURCE
or LDTS
in the staging layer.
There are a few reasons for the current implementation at this time:
If we have it in the raw vault, hard coded then we have no traceability and no audit.
RECORD_SOURCE
and LDTS
are often shared by multiple tables in the raw vault (units of work) and rather than repeating yourself everywhere, referencing the column is much cleaner, especially since that specific stage object will likely be loading multiple raw vault objects
You may be loading your hubs and links from multiple staging objects, in which case hard coding the LDTS
or RECORD_SOURCE
is unwanted behavior, as it's likely those staging objects have multiple sources between them (each record could be from any of the sources)
For the above reasons I'm not sure having these columns in the stage is unnecessary. It is in fact quite standard.
Saying all this, we're not against adding this if it's seen to be useful by the wider community. Could you help us understand where this might be used?
Thanks!
Hi Alex,
Actually, the value will be sourced from a macro and reference the dbt node name so the value will change depending on the model its referencing so technically it isn’t hardcoded as such.
We use a control table for our LDTS so depending on the model it will retrieve the control date from that source.
Understand typically its in the staging table but since its possible to support macros we can be a bit creative how we now populate this value downstream.
Thanks
Yeah this makes sense. Sometimes record sources are actually codes and it is good practice to have a lookup table which maps these to system names.
I presume this is what you mean? Sounds like you have a macro to do this kind of lookup
Correct, I already use a macro for the Data Source aka Folder name retrieved from the node in STG.
It makes it more useable from a coding point than having columns for the sake of it.
Currently the Staging method supports string values, e.g.
derived_columns: SOURCE_SYSTEM: '!{{ somevalue }}' INSTANCE: '!someinstance'
It would be nice to support the string value in Hubs/links/sats methods as well instead of just a column value.
Describe the solution you'd like Support the use of string values, e.g. ! for LDTS and RECORD SOURCE
AB#5363