Open peterallenwebb opened 2 weeks ago
As @peterallenwebb noted, a source of complexity here is that add_test_edges currently accounts for tests that depend on multiple models, not just one. It may be difficult to take a similar approach of running test nodes "just in time" after a model completes during handle_job_queue if certain tests must wait on multiple models before they can run.
One thought here is to remove the transitive edges that add_test_edges creates, such as test1 -> model 3.
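To make the transitive-edge idea concrete, here is a minimal sketch, not dbt's actual code. It assumes a chain model1 -> model2 -> model3 with test1 defined on model1, and it represents the graph as a plain dict of parent -> children. The helper names (reachable, add_direct_test_edges) are hypothetical. The sketch adds test edges only to the direct child models of a tested model and relies on the existing model-to-model edges to carry the ordering further downstream.

```python
def reachable(children, start):
    """Return every node reachable from start by following child edges."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def add_direct_test_edges(children, tests):
    """Hypothetical variant: link each test only to the direct child models
    of the model it tests, instead of to every downstream model."""
    augmented = {node: set(kids) for node, kids in children.items()}
    for model, kids in children.items():
        model_tests = kids & tests
        direct_models = kids - tests
        for test in model_tests:
            for child in direct_models:
                # Skip edges that would form a cycle, i.e. the test itself
                # depends (directly or indirectly) on this child.
                if test in reachable(children, child):
                    continue
                augmented.setdefault(test, set()).add(child)
    return augmented


# Assumed example graph: model1 -> model2 -> model3, with test1 on model1.
children = {
    "model1": {"model2", "test1"},
    "model2": {"model3"},
    "model3": set(),
}
tests = {"test1"}

augmented = add_direct_test_edges(children, tests)
# test1 gets a single new edge, to model2. model3 still waits on test1
# because it is reachable through model2.
still_ordered = "model3" in reachable(augmented, "test1")
```

In this single-model case the edge test1 -> model 3 is redundant, because model 3 already waits on model 2. The sketch does not fully cover tests that depend on multiple models: when an edge is skipped to avoid a cycle, other children of the skipped node may lose the ordering guarantee, so a real implementation would need extra handling there.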
@gshank mentioned that we could also perform this operation only for the selected parts of the DAG, or skip building the edges when people select tests in the build command.
Housekeeping
Short description
The add_test_edges() function is called during the dbt build command and inserts edges into the execution graph, which are meant to ensure that models downstream of a node do not run until all the tests on that node have passed. The function is slow in certain projects, and recent data from the field show that it inflates the number of edges in the graph by a factor of six. It is slow enough that it often shows up in performance profiles, but it is even more problematic in terms of memory consumption: memory use is high enough to cause OOM crashes.
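The behavior described above can be illustrated with a simplified sketch. This is not dbt's implementation: the graph is a plain dict of parent -> children, and the names descendants, count_edges and add_test_edges_sketch are hypothetical. It shows why the edge count grows: every test on a node gets an edge to every model downstream of that node.

```python
def descendants(children, node):
    """Return every node reachable from node by following child edges."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def count_edges(children):
    """Total number of directed edges in the graph."""
    return sum(len(kids) for kids in children.values())


def add_test_edges_sketch(children, tests):
    """Simplified illustration: add an edge from each test on a model to
    every downstream model, so those models wait for the test to finish."""
    augmented = {node: set(kids) for node, kids in children.items()}
    for model, kids in children.items():
        model_tests = kids & tests
        if not model_tests:
            continue
        downstream_models = descendants(children, model) - tests
        for test in model_tests:
            for downstream in downstream_models:
                # A test that depends on several models must not point at
                # one of its own ancestors, or the graph would get a cycle.
                if test in descendants(children, downstream):
                    continue
                augmented.setdefault(test, set()).add(downstream)
    return augmented


# Assumed example graph: model1 -> model2 -> model3, with test1 on model1.
children = {
    "model1": {"model2", "test1"},
    "model2": {"model3"},
    "model3": set(),
}
tests = {"test1"}

augmented = add_test_edges_sketch(children, tests)
print(count_edges(children))   # 3
print(count_edges(augmented))  # 5
```

Even in this three-model chain, one test adds two edges. In a project with many tests per model and deep lineage, the added edges scale roughly with the number of tests times the number of downstream models, which is consistent with the growth in edge count and memory use reported above.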
Acceptance criteria
Suggested Tests
Existing tests should suffice, but we should add additional tests to reduce the risks associated with the new implementation.
Impact to Other Teams
None.
Will backports be required?
No.
Context
No response