Alternative to #5

Macro-based approach

Pros
Take care of all boilerplate SQL (dynamic filtering, union logic)
Write model SQL only once
lambda_filter prints a static timestamp at runtime into both the historical and new versions of the data, rather than relying on a dynamic query result or a current_timestamp relative to query time. This static timestamp serves as a stable bookmark that can only be pushed forward by the next dbt run, saving us from potential data gaps in the event of job failures. (@clrcrl explained this well here.)
The default value is run_started_at. @amychen1776 suggested that an optional var makes sense for a CLI override; I've named it (badly) lambda_split. (A sketch of one possible macro follows this list.)
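
For concreteness, here is a minimal sketch of what such a macro could look like. Only the lambda_filter name, the run_started_at default, and the lambda_split var come from the notes above; the column_name and new_data parameters, and the macro body itself, are assumptions for illustration rather than the actual implementation in this repo:

```sql
-- macros/lambda_filter.sql (hypothetical sketch, not the actual macro in this repo)
{% macro lambda_filter(column_name, new_data=false) %}

    {#- Static split point: an optional CLI override via
        --vars '{lambda_split: "2020-06-01"}', falling back to the timestamp
        of the current invocation. Because the value is rendered into the
        compiled SQL, it only moves forward on the next dbt run. -#}
    {% set split_timestamp = var('lambda_split', run_started_at) %}

    {% if new_data %}
        -- the "new" side of the lambda view: rows at or after the split point
        where {{ column_name }} >= '{{ split_timestamp }}'
    {% else %}
        -- the historical side: rows strictly before the split point
        where {{ column_name }} < '{{ split_timestamp }}'
    {% endif %}

{% endmacro %}
```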
Cons
Requires three separate files: a historical model file, a lambda model file, and a macro file to encode the model SQL (see the sketch after this list)
Model SQL is stored in macros/ instead of models/. Model files just call macros.
We didn't like this for packages. Are we willing to go along with it for this use case? As long as we establish really good conventions for organizing the macros/ directory, I'm personally leaning in this direction.
The alternative is copying the model SQL into both the historical and the lambda models (as in #5). This saves a file and makes the code more immediately visible, but at the cost of duplicating logic and requiring analysts to update / cross-check it in multiple places.
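
A hedged sketch of how the three files could fit together. The model names (pageviews_historical, pageviews_lambda), the pageviews_sql macro, and the source reference are invented for illustration; the files in this repo may be organized differently, and the union boilerplate could itself be wrapped in a macro:

```sql
-- macros/pageviews_sql.sql (hypothetical): the model SQL, written once
{% macro pageviews_sql(new_data=false) %}
    select
        page_view_id,
        user_id,
        viewed_at
    from {{ source('web', 'events') }}
    {{ lambda_filter('viewed_at', new_data=new_data) }}
{% endmacro %}

-- models/pageviews_historical.sql (hypothetical): historical model file, just calls the macro.
-- Shown as a table for simplicity; an incremental version would also need an
-- is_incremental() bound inside the filter macro to avoid re-inserting old rows.
{{ config(materialized='table') }}
{{ pageviews_sql(new_data=false) }}

-- models/pageviews_lambda.sql (hypothetical): lambda model file, unions historical + new rows
{{ config(materialized='view') }}
select * from {{ ref('pageviews_historical') }}
union all
{{ pageviews_sql(new_data=true) }}
```

The point of the layout is that the select logic lives in exactly one place (the macro), while the two model files stay a couple of lines each.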
Outputs
Alt-alt approach: custom materialization??
I've included a mockup in models/thought_experiment_only. As the name suggests, this is only a thought experiment.
Pros:
We can both keep model SQL within the model file and write that SQL only once
Cons:
It obfuscates a lot of logic into the materialization layer
Challenges:
How to pass config values down to the historical model? Namely materialization (table or incremental), schema, and alias (one possible shape is sketched below)
How to "call" one materialization from another? We don't want to copy+paste all the logic from every adapter's incremental materialization into a new lambda_view materialization