Open peterallenwebb opened 11 months ago
Is there any progress on this issue? Our dbt docs are about 1M and a full project parse (dbt parse --no-partial-parse) takes about 2-3 minutes on an M1 Mac.
@aranke here is the issue mentioned during the dbt meetup today on slow documentation parsing. There is also a closed PR that proposed a fix to this.
Hope you will be able to prioritize this 🙏🤩
Here is a flame graph of doing a full parse of our dbt project (~2300 models). Our documentation markdown file is just shy of 1MB.
As you can see, extract_toplevel_blocks() takes about 75% of the time of dbt parse:
If we empty out our Markdown docs file and remove all doc references from our config files, dbt parse runs about 4x faster.
I have replicated the changes from https://github.com/dbt-labs/dbt-core/pull/9045 in a new PR for dbt-common: https://github.com/dbt-labs/dbt-common/pull/189
This change reduces the dbt parse time for our dbt project from 2m20s to 41s on my M1 Mac.
We've received a complaint that dbt-core's parsing performance is surprisingly slow for large docs files. On an M1 Mac, files of around 500K can take over a minute to parse, and parse time appears to increase super-linearly with file size. The critically slow step is the call to extract_toplevel_blocks() on the file contents. The extraction of top-level jinja blocks could likely be made much faster, but this is extremely critical code and we need to preserve existing behavior.
This does not appear to be a regression, but current performance is embarrassingly bad.
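One way to check the super-linear growth described above is to time the parser on inputs of doubling size and compare consecutive timings. The sketch below is only an illustration: `time_at_sizes` and `growth_ratios` are hypothetical helper names, and the parser is passed in as a plain callable so that no particular dbt import path is assumed.

```python
import time
from typing import Callable, List, Sequence, Tuple


def time_at_sizes(
    parse: Callable[[str], object],
    make_input: Callable[[int], str],
    sizes: Sequence[int],
) -> List[Tuple[int, float]]:
    """Time one call of `parse` for each input size; return (size, seconds) pairs."""
    results = []
    for n in sizes:
        text = make_input(n)  # build the input outside the timed region
        start = time.perf_counter()
        parse(text)
        results.append((n, time.perf_counter() - start))
    return results


def growth_ratios(results: Sequence[Tuple[int, float]]) -> List[float]:
    """Ratio of each timing to the previous one (pairs with a zero baseline are skipped)."""
    return [t2 / t1 for (_, t1), (_, t2) in zip(results, results[1:]) if t1 > 0]
```

When the sizes double at each step, ratios near 2 suggest roughly linear scaling and ratios near 4 suggest roughly quadratic scaling. To measure the real code path, pass extract_toplevel_blocks (imported from wherever your installed dbt version exposes it) as `parse`.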
To generate a file which reproduces the performance problem, repeat the following snippet a few thousand times in a text file with the .md (markdown) extension, and add it to a dbt project, or call extract_toplevel_blocks() on it directly.
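The snippet itself is not shown in this copy of the issue. As a stand-in, here is a minimal sketch that generates a large docs file from a placeholder docs block. The block contents, the `example_doc_{i}` names, and the helper names are all assumptions for illustration, not the snippet from the original report. Each block gets a distinct name on the assumption that dbt rejects duplicate docs block names.

```python
from pathlib import Path

# Hypothetical docs block (NOT the original snippet from the report).
# Doubled braces are literal braces for str.format().
BLOCK_TEMPLATE = (
    "{{% docs example_doc_{i} %}}\n"
    "Placeholder description for block {i}.\n"
    "{{% enddocs %}}\n\n"
)


def build_docs_text(n_blocks: int) -> str:
    """Return markdown text containing n_blocks uniquely named docs blocks."""
    return "".join(BLOCK_TEMPLATE.format(i=i) for i in range(n_blocks))


def write_docs_file(path: str, n_blocks: int) -> int:
    """Write the generated text to `path` and return its size in bytes."""
    text = build_docs_text(n_blocks)
    Path(path).write_text(text, encoding="utf-8")
    return len(text.encode("utf-8"))
```

For example, `write_docs_file("models/big_docs.md", 5000)` (a hypothetical path) writes a few thousand blocks into a project's models directory; adjust the count until the file reaches the size you want to test.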
Impact on other teams
None
Needs backport?
Unsure