Status: Open. jtcohen6 opened this issue 1 year ago.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
If possible, can we separate this handling from parsing? What I'm thinking is that parsing goes from everything in the project files -> a representation; then, at actual runtime, we apply configuration to that representation and do the execution.
@ChenyuLInx Supportive of this line of thinking! The biggest caveat here is that vars can be used to dynamically disable/enable models, or to conditionally affect relationships between models, so it is necessary to resolve some vars during parsing in order to know the shape of the DAG and to support node selection.
During parsing, we could store pointers to those variables, and then conditionally reevaluate them just before each execution. That feels similar to the approach described in this issue (partial parsing), though with some subtle differences in implementation.
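One way to read the "store pointers, reevaluate before execution" idea: keep an unresolved reference to each var in the parsed representation and resolve it only against the vars supplied to the current run. This is a minimal Python sketch; VarRef, resolve and resolve_node are hypothetical names, not dbt internals, and a parsed node is modeled as a plain dict.

```python
from dataclasses import dataclass
from typing import Any, Dict

_MISSING = object()  # sentinel: "no default was given"


@dataclass(frozen=True)
class VarRef:
    """Hypothetical pointer to a var, recorded at parse time but not resolved."""
    name: str
    default: Any = _MISSING

    def resolve(self, run_vars: Dict[str, Any]) -> Any:
        # Look the var up in the vars supplied to *this* run.
        if self.name in run_vars:
            return run_vars[self.name]
        if self.default is not _MISSING:
            return self.default
        raise KeyError(f"Required var '{self.name}' was not provided")


def resolve_node(node: Dict[str, Any], run_vars: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed node with every VarRef replaced by its current value."""
    return {
        key: value.resolve(run_vars) if isinstance(value, VarRef) else value
        for key, value in node.items()
    }


# Parsed once; run_date stays a pointer until execution time.
parsed = {"name": "orders", "run_date": VarRef("run_date"), "limit": VarRef("limit", 100)}
print(resolve_node(parsed, {"run_date": "2024-01-01"}))
# → {'name': 'orders', 'run_date': '2024-01-01', 'limit': 100}
```

As noted above, this only works for vars that don't shape the DAG; a var that enables/disables a model or changes its dependencies still has to be resolved during parsing.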
Yeah, there's a difference between vars that are needed at parse time and vars that can be resolved at compilation/execution time. Maybe we need some use cases to help think through the different situations. Vars in configs have to be resolved at parse time; vars in plain SQL could be delayed. I'm not sure how we could distinguish between them.
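One possible way to distinguish the two cases is to record, during rendering, whether each var() call happened while a config was being built. The sketch below is hypothetical throughout: RenderContext and render_model are illustrative names, and the model is a Python function calling a context object rather than real Jinja. In Jinja the config() arguments are evaluated before config() itself runs, which is why the sketch passes a lambda; a real implementation would need some other hook to attribute those var() calls to the config.

```python
from typing import Any, Callable, Dict, Set


class RenderContext:
    """Hypothetical render context that tracks where each var is read."""

    def __init__(self, run_vars: Dict[str, Any]):
        self.run_vars = run_vars
        self.parse_time_vars: Set[str] = set()  # read while building a config
        self.deferred_vars: Set[str] = set()    # read only in the SQL body
        self._in_config = False

    def var(self, name: str, default: Any = None) -> Any:
        target = self.parse_time_vars if self._in_config else self.deferred_vars
        target.add(name)
        return self.run_vars.get(name, default)

    def config(self, build: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
        # Evaluate the config lazily so var() calls made while computing it
        # are attributed to the config rather than to the SQL body.
        self._in_config = True
        try:
            return build()
        finally:
            self._in_config = False


def render_model(ctx: RenderContext) -> str:
    cfg = ctx.config(lambda: {"enabled": ctx.var("enable_orders", True)})
    sql = f"select * from orders where order_date = '{ctx.var('run_date')}'"
    return sql if cfg["enabled"] else ""


ctx = RenderContext({"run_date": "2024-01-01"})
render_model(ctx)
print(ctx.parse_time_vars, ctx.deferred_vars)
# → {'enable_orders'} {'run_date'}
```

A var that appears in both sets would have to be treated as parse-time.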
@gshank Do you know if the partial parsing manifest (target/partial_parse.msgpack) contains enough information (raw file contents & unrendered yaml configurations/attributions) such that we could support a re-parse when CLI --vars are supplied, without needing to go back to the actual file system?
I'm thinking:
--vars override (the original scope of this issue)
We use CLI --vars to pass in Airflow datetime variables. These change on every run, so we can't benefit from partial parsing. Is there a better way of handling datetime variables? Could we have an ignore list of variable names that don't invalidate the partial parse (or, similar to the secret env var ignore rule, a var prefix such as VAR_NO_PARSE_my_datetime)?
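A minimal sketch of the proposed ignore list / prefix: hash only the vars that are allowed to invalidate the partial parse, so a changing datetime var leaves the hash stable. The ignore and ignore_prefix options (and the VAR_NO_PARSE_ default taken from the comment above) are hypothetical; they are not existing dbt flags.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def parse_relevant_vars_hash(
    cli_vars: Dict[str, Any],
    ignore: Iterable[str] = (),
    ignore_prefix: str = "VAR_NO_PARSE_",
) -> str:
    """Hash only the vars that should be able to invalidate the parse cache.

    `ignore` and `ignore_prefix` are hypothetical options, not dbt flags.
    """
    ignored = set(ignore)
    relevant = {
        name: value
        for name, value in cli_vars.items()
        if name not in ignored and not name.startswith(ignore_prefix)
    }
    # sort_keys makes the hash independent of the order vars were passed in;
    # default=str lets values such as datetime objects be serialized.
    payload = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


run1 = {"target_schema": "analytics", "VAR_NO_PARSE_my_datetime": "2024-01-01T00:00"}
run2 = {"target_schema": "analytics", "VAR_NO_PARSE_my_datetime": "2024-01-02T00:00"}
print(parse_relevant_vars_hash(run1) == parse_relevant_vars_hash(run2))
# → True
```

The trade-off: ignoring a var is only safe if it is never used in a config or anywhere else that shapes the DAG; otherwise the reused parse would be stale. The ignored var must still be available when models are rendered at compile time.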
Just like https://github.com/dbt-labs/dbt-core/issues/3885, but for CLI --vars.

This would require us to capture, at parse time, which files depend on which --vars, via calls to the Jinja {{ var() }} function. That would also include macros that call var() and are then called by models / other macros in turn.

For Python models, if we introduce a built-in dbt.var() function, we'd want to do the same. We're already doing something similar for configs, to power config.get() at runtime.

Whenever the --vars change, instead of triggering a full re-parse, we'd schedule just the files that depend on the var for re-parsing. Of course, if the var is used for a configuration within dbt_project.yml, that could still affect many, many nodes.
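The scheduling step could be sketched like this, assuming the file -> var, file -> macro, macro -> var and macro -> macro dependency maps have already been captured at parse time (capturing them is not shown). All names here are hypothetical. The macro call graph is walked upward so that a file calling a macro that calls another macro that reads a changed var is still scheduled. The dbt_project.yml case is handled coarsely by re-parsing everything.

```python
from collections import deque
from typing import AbstractSet, Any, Dict, Mapping, Set

_ABSENT = object()  # sentinel so a var set to None still counts as present


def changed_vars(old: Mapping[str, Any], new: Mapping[str, Any]) -> Set[str]:
    """Vars that were added, removed, or given a different value."""
    return {
        name
        for name in old.keys() | new.keys()
        if old.get(name, _ABSENT) != new.get(name, _ABSENT)
    }


def files_to_reparse(
    changed: AbstractSet[str],
    all_files: AbstractSet[str],
    file_vars: Mapping[str, AbstractSet[str]],     # file -> vars it calls directly
    file_macros: Mapping[str, AbstractSet[str]],   # file -> macros it calls directly
    macro_vars: Mapping[str, AbstractSet[str]],    # macro -> vars it calls directly
    macro_macros: Mapping[str, AbstractSet[str]],  # macro -> macros it calls directly
    project_config_vars: AbstractSet[str] = frozenset(),  # vars used in dbt_project.yml
) -> Set[str]:
    """Return the files that must be re-parsed after the `changed` vars differ."""
    # A var used in a dbt_project.yml config can affect many nodes;
    # this sketch simply falls back to re-parsing everything.
    if changed & project_config_vars:
        return set(all_files)

    # Reverse the macro call graph: callee -> macros that call it.
    callers: Dict[str, Set[str]] = {}
    for caller, callees in macro_macros.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Start from macros that read a changed var directly, then walk up to
    # every macro that calls them, directly or transitively.
    affected = {m for m, used in macro_vars.items() if used & changed}
    queue = deque(affected)
    while queue:
        for caller in callers.get(queue.popleft(), ()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)

    # A file is stale if it reads a changed var or calls an affected macro.
    stale = {f for f, used in file_vars.items() if used & changed}
    stale |= {f for f, used in file_macros.items() if used & affected}
    return stale


old = {"run_date": "2024-01-01", "region": "eu"}
new = {"run_date": "2024-01-02", "region": "eu"}
stale_files = files_to_reparse(
    changed_vars(old, new),
    all_files={"models/a.sql", "models/b.sql", "models/c.sql"},
    file_vars={"models/a.sql": {"region"}},
    file_macros={"models/b.sql": {"date_filter"}, "models/c.sql": {"cents_to_dollars"}},
    macro_vars={"current_run_date": {"run_date"}},
    macro_macros={"date_filter": {"current_run_date"}},
)
print(stale_files)
# → {'models/b.sql'}
```

The same maps would apply to Python models if a dbt.var() call were recorded the same way. A finer-grained fallback for dbt_project.yml could limit re-parsing to the nodes the config's path actually matches.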