Thanks @cajubelt! Is this only for the `build` command (i.e. not `run`)?
I suspect this might be due to the large number of tests in your project (per our conversation), and the additional time that dbt spends "linking" the DAG (adding edges between a test on an upstream model -> downstream models, so that they skip on test failure). While it's not immediately clear to me what change we made between v1.5 -> v1.8 to that logic, if you only see this slowdown on `build`, it would be a strong indication that therein lies the problem.
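To make the "linking" concrete, here's a rough sketch in networkx terms of what adding those edges amounts to; this is not dbt's actual implementation, and `tests_by_model` is an assumed mapping from each model to its attached tests:

```python
import networkx as nx

def add_test_edges(graph: nx.DiGraph, tests_by_model: dict[str, list[str]]) -> None:
    # For each test attached to a model, add an edge from the test to every
    # node downstream of that model, so a test failure skips those nodes.
    for model, tests in tests_by_model.items():
        for downstream_node in nx.descendants(graph, model):
            for test in tests:
                graph.add_edge(test, downstream_node, edge_type="parent_test")
```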
@jtcohen6 yes, I just confirmed that `dbt run` doesn't have any noticeable slowdown on 1.8 with the same selector.
@cajubelt Okay! So I think the hypothesis is: the slowdown is specific to `dbt build` in v1.8 compared to v1.5, and the likely culprit is the `add_test_edges` method.
If you're up for it, there are two ways to try confirming that hypothesis by profiling dbt's performance. One of them is `py-spy` (snazzier), in the way @peterallenwebb recommends here:
```
pip install py-spy
sudo py-spy record -s -f speedscope -- dbt build
```
We have the ability to exclude tests from the build resource types, but that doesn't affect whether or not we pass `add_test_edges` to compile. It would be possible to check the resource types and, if tests are excluded, pass `add_test_edges = False`.
Of course, if you do want tests to run, that doesn't help.
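A minimal sketch of that check, with an illustrative `resource_types` set rather than the actual dbt-core call sites:

```python
def should_add_test_edges(selected_resource_types: set[str]) -> bool:
    # If tests are excluded from the build, the test -> downstream edges added
    # during compile would never be used, so they could be skipped entirely.
    return "test" in selected_resource_types

# A build that excludes tests from its resource types:
add_test_edges = should_add_test_edges({"model", "seed", "snapshot"})  # -> False
```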
We do want to run tests if possible.
@jtcohen6 here's a screenshot of a search through py-spy's output for the longest segments of the flame graph. Looks like `generic_bfs_edges` is adding 12:41; not sure if that's particularly revealing, however.
@jtcohen6 Not super sure if related to this issue, but we're seeing an extreme slowdown when `build`ing a large chunk of a fairly big graph with `--indirect-selection buildable` in CI/CD. The project has 2573 models, 13 snapshots, 7 analyses, 2254 data tests, 20 seeds, 1 operation, 359 sources, 44 exposures, 1408 macros, 1 group, 13 unit tests.
`cautious` and `eager` end up being fine, but with `buildable`, the process "hangs" for up to 20 minutes after it prints the project stats ("Found 2573 models ..." etc.) and before the "Concurrency: 4 threads (target='test')" message. No other output, even on `debug` log level.
I tried to figure out what's going on using the `trace` module, but it was a bit too much spam :). Will try to add some trace output when I find a minute.
Here's me waiting for about 5 minutes for the `generic_bfs_edges` traversal.
In a "good" run with `eager` selection this takes about two seconds with the exact same selector (the rest is dbt actually starting to execute, which I didn't wait for in the other run).
What percentage of your graph (including tests) is selected by the tag, and what % is selected by the selector?
I had someone on my team try using the `@` operator on a selector (which I don't normally do) and we both had dbt hang. I kind of pushed it out of my head; will be thinking on this now.
I don't really understand the `py-spy` output (I worked with SnakeViz instead)... if I had to guess what might be happening:
- `generic_bfs_edges` is either really slow or being called a lot
- `collect_specified_neighbors` in selector.py calls `self.graph.select_children` when passed certain selector patterns
- `self.graph.select_children` applies the `self.descendants` method for each node, which is perhaps a lot of times, or maybe it uses up a lot of memory
- `filter_edges_by_type`, which is calling `get_edge_data` in networkx

If you can share the SnakeViz output, that would help; also maybe check your memory usage? The last bug I worked on for this graph optimization stuff was having performance issues from running out of memory when the DAG got too big.
If my stab in the dark here is in the right direction, maybe there is a pruning that could be done to reduce calling `descendants` so much. Like, if you have a ton of nodes in your selector, many of which are already descendants of each other, you don't need to re-tread and evaluate each node. You could first get a list of all descendants of each node with a depth of 1, then take the difference of that and the list of nodes in the selector to restrict the search to descendants of unsearched nodes, find descendants of depth 1 from that list, and iterate.
For a graph with 20k nodes: if 1k have the tag, 10k are descendants, and each descendant is on average a descendant of 70% of the tagged nodes, then each descendant has to be evaluated ~700 times, and 700 * 10,000 is a lot of calls.
After writing this out, this actually seems pretty likely XD. I think an update to `select_children` with this logic would be the place to implement it (see the sketch below). This seems similar to https://github.com/dbt-labs/dbt-core/pull/7194.
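Here's the sketch mentioned above, over a plain networkx `DiGraph` (the real `Graph.select_children` in dbt-core differs): expand children one level at a time and never re-visit a node that is already known.

```python
import networkx as nx

def select_children(graph: nx.DiGraph, selected: set[str], max_depth: int | None = None) -> set[str]:
    descendants: set[str] = set()
    frontier = set(selected)
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier: set[str] = set()
        for node in frontier:
            for child in graph.successors(node):
                # Skip nodes already expanded, so heavily shared descendants
                # are only visited once instead of once per selected ancestor.
                if child not in descendants and child not in selected:
                    next_frontier.add(child)
        descendants |= next_frontier
        frontier = next_frontier
        depth += 1
    return descendants
```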
Also, if the slow part is `filtered_graph = self.exclude_edge_type("parent_test")`, maybe that could be cached or passed as a parameter into `descendants`?
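And a sketch of the caching idea (class and method names mirror the snippet above, not dbt-core's actual `Graph` class):

```python
import networkx as nx

class Graph:
    def __init__(self, graph: nx.DiGraph):
        self.graph = graph
        self._filtered = None  # cached copy without "parent_test" edges

    def exclude_edge_type(self, edge_type: str) -> nx.DiGraph:
        # Build the filtered copy once and reuse it, instead of re-filtering
        # millions of edges on every descendants() call.
        # (For simplicity this caches only the one filter used here.)
        if self._filtered is None:
            filtered = nx.DiGraph()
            filtered.add_nodes_from(self.graph.nodes(data=True))
            filtered.add_edges_from(
                (u, v, d) for u, v, d in self.graph.edges(data=True)
                if d.get("edge_type") != edge_type
            )
            self._filtered = filtered
        return self._filtered

    def descendants(self, node: str) -> set[str]:
        return nx.descendants(self.exclude_edge_type("parent_test"), node)
```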
will try tests on my own project with ~500 models and ~5000 tests next week.
I added a PR! https://github.com/dbt-labs/dbt-core/pull/10526
I believe I was able to recreate the issue in my project, which has: Found 442 models, 43 snapshots, 5112 data tests, 32 seeds, 2 operations, 134 sources, 23 exposures, 1075 macros.
Running `dbt build --select tag:silver+` on my DAG on 1.8.4, it takes 132 seconds to start building. `dbt build` just takes 11 seconds.
My PR applies the pruning strategy I described. I saw the runtime for `dbt build --select tag:silver+` improve to 66 seconds. Using SnakeViz, I identified that the number of calls to `get_edge_data` dropped from 14,332,569 to 383,916.
I am unsure how much of this the dbt 1.9 / main branch already contains in its own performance improvements... the total runtime improvement was 66 seconds, but `select_children` only improved from 80 seconds to 45 seconds (a 35-second improvement). Going to run the `main` branch for a performance check.
Feedback welcome, especially if you can test/review the change!
I've reviewed and suggested some changes to @ttusing's PR but we should be close to a fix on this.
This is now resolved in 1.9+, but I'm re-opening this issue so we can consider backports.
I'm also interested in seeing performance reports from @stefankeidel and @cajubelt. This should help, but checking the edge type seems to be inherently slow. I wonder whether the runtime will improve from 20 minutes to something like 15, 10, or 5 minutes, and whether there are further improvements that can be made to the edge-type lookup speed.
I could imagine avoiding the edge-type lookup altogether with the knowledge that tests cannot depend on tests. So, don't check the edge type when searching children if the node is a test (or if the parent is a test, for ancestors), or something like that. Or, build the subgraph with all non-test nodes first; a rough sketch of that is below.
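A rough sketch of that last idea, assuming node data carries a `resource_type` attribute as in dbt's manifest (whether it is actually faster than the edge-type check would need profiling):

```python
import networkx as nx

def descendants_via_non_test_subgraph(graph: nx.DiGraph, node: str) -> set[str]:
    # Tests can't have downstream dependents, so dropping test nodes up front
    # removes all "parent_test" edges without any per-edge type lookups.
    # (Assumes `node` itself is not a test.)
    non_test_nodes = [
        n for n, data in graph.nodes(data=True) if data.get("resource_type") != "test"
    ]
    return nx.descendants(graph.subgraph(non_test_nodes), node)
```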
Just saw the great activity on this after it fell off my radar for a bit, thanks all. Will follow up on Monday with a perf report, I have personal commitments this weekend.
We'd love to see a backport to 1.8 since our project is fairly large and we are unlikely to update to 1.9 before some key deadlines this Fall.
Thanks for the update and amazing work! I am currently traveling on a sabbatical leave so I won't be able to test this before I'm back in November, but will make sure to do so then!
So I'm looking at this section:
This adds a huge number of edges. For probably all but pathological DAGs, this means an edge between every test and all nodes downstream of what it's testing. For the example of 5k models and 15k data tests, this might add approximately (5k * 15k / 2) = 37.5 million edges.
This was added in https://github.com/dbt-labs/dbt-core/pull/7276; I don't really understand the purpose of this code. @gshank can you explain why it is necessary to add all of these edges? It seems like a huge factor limiting dbt's scaling. If the goal is to get `dbt build` to run tests on parents before children, isn't it enough to just do this with a depth of 1?
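For illustration, a hypothetical "depth of 1" variant of the earlier sketch would look something like this (just to make the question concrete; `tests_by_model` is the same assumed mapping as above):

```python
import networkx as nx

def add_test_edges_depth_one(graph: nx.DiGraph, tests_by_model: dict[str, list[str]]) -> None:
    # Only link each test to the direct children of the model it tests,
    # instead of to every downstream node.
    for model, tests in tests_by_model.items():
        for child in graph.successors(model):
            for test in tests:
                graph.add_edge(test, child, edge_type="parent_test")
```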
I'm guessing the get-edge-type function is slow because of the millions of edges. There are far fewer nodes (thousands), so getting the node type should be much faster, and I am nearly certain nothing can directly depend on tests (only the `parent_test` edge type), so I think checking node types is another potentially huge optimization.
Somewhat separately, I wonder about a feature enhancement to print simple DAG stats with something like `--dag-debug` to help with cases like these. It might show the total number of edges, the highest-degree node (or distribution stats on degree), the highest number of tests on a single model (or a distribution), etc.
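A sketch of what such a stats dump could print (`--dag-debug` is a hypothetical flag, and `resource_type` node data is assumed):

```python
import networkx as nx

def print_dag_stats(graph: nx.DiGraph) -> None:
    print(f"nodes: {graph.number_of_nodes()}")
    print(f"edges: {graph.number_of_edges()}")
    degrees = dict(graph.degree())
    if degrees:
        node, degree = max(degrees.items(), key=lambda kv: kv[1])
        print(f"highest-degree node: {node} ({degree} edges)")
    # Count tests per tested node via each test's parent edges.
    tests_per_node: dict[str, int] = {}
    for n, data in graph.nodes(data=True):
        if data.get("resource_type") == "test":
            for parent in graph.predecessors(n):
                tests_per_node[parent] = tests_per_node.get(parent, 0) + 1
    if tests_per_node:
        node, count = max(tests_per_node.items(), key=lambda kv: kv[1])
        print(f"most tests on a single node: {node} ({count} tests)")
```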
Ok, I did some profiling last night on `dbt build --select tag:large_tag+`. Looks like `get_edge_data` accounted for about 36 minutes and was called 710,951,628 times. Total time to build was 81 minutes. (I used `--empty` and it errored out immediately when writing to tables, so none of this was time waiting for the db writes.) I used dbt 1.8.2. Will try again tonight on 1.9 if that's helpful.
Ok, ran it again with `1.9.0-a1` and the improvement was dramatic. The build time was just 10 minutes. `get_edge_data` was called 11,778,492 times, a 98% reduction, and accounted for just 21 seconds.
Some stats about my project: 5799 models, 20 snapshots, 91 analyses, 18763 data tests, 186 seeds, 1 operation, 1086 sources, 69 exposures, 1899 macros
Resolved for dbt-core 1.9+ in https://github.com/dbt-labs/dbt-core/pull/10526 and backported to dbt-core 1.8 in https://github.com/dbt-labs/dbt-core/pull/10567.
Is this a regression in a recent version of dbt-core?
Current Behavior
`dbt build -s tag:my_tag+` takes about 20 minutes longer to start on dbt 1.8.2 than it does on 1.5.9 with the same tag. The tag used has a lot of downstream nodes in our project, about 11k. Generally we're seeing better performance on 1.8, so we were surprised to see this big regression in performance.
Expected/Previous Behavior
Previously, building everything downstream of a tag with lots of nodes would take a couple of minutes of startup time and then begin running queries against our db. Now it takes 20+ minutes.
Steps To Reproduce
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
bigquery
Additional Context
The reason we need this is that we have a selector used in CI/CD that excludes everything downstream of a tag that is upstream of many nodes. The example given is a simpler version of the original issue we found with that selector. (We tested the simpler version and found it to also have the same performance issue.) The selector was something like