dbt-labs / dbt-adapters


Improve memory efficiency of process_results by iterating. #217

Closed: peterallenwebb closed this pull request 4 months ago

peterallenwebb commented 4 months ago

resolves #218

Problem

Returning query results from execute() is memory-inefficient: multiple intermediate copies of the result data are held in memory simultaneously.

In the case of docs generate, we are sometimes querying for information about every column in a schema. In extreme cases this can mean a million or more records are returned, resulting in gigabytes of memory allocation. In that scenario, maintaining multiple copies of the results, even temporarily, is untenable.
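To make the cost concrete, here is a minimal sketch of the list-building pattern described above. The function name process_results_list, the column_names parameter, and the raw row tuples are illustrative assumptions, not dbt-adapters' actual internals:

```python
from typing import Any, Dict, List, Sequence, Tuple

def process_results_list(
    column_names: Sequence[str], raw_rows: List[Tuple[Any, ...]]
) -> List[Dict[str, Any]]:
    # Hypothetical stand-in for the old behavior: the entire processed result
    # is built in one pass, so the raw driver rows and the processed copies
    # are both resident in memory at the same time. With a million-plus rows,
    # that doubling alone can cost gigabytes.
    return [dict(zip(column_names, row)) for row in raw_rows]
```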

Solution

Yield data rows one by one from process_results() rather than returning them all at once as a list, eliminating one full copy of the result table. There is more work we could do in this direction, but with this approach alone I measured a 33% reduction in the memory associated with the get_catalog query.
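As a rough illustration of the change, the same transformation can be written as a generator, so each processed row is handed to the caller before the next one is built. Again, the names here are hypothetical stand-ins for the adapter's real code, not its API:

```python
from typing import Any, Dict, Iterator, List, Sequence, Tuple

def process_results_iter(
    column_names: Sequence[str], raw_rows: List[Tuple[Any, ...]]
) -> Iterator[Dict[str, Any]]:
    # Yield one processed row at a time. A caller that consumes the iterator
    # lazily (e.g. while streaming rows into a table) never holds a second
    # full copy of the result set.
    for row in raw_rows:
        yield dict(zip(column_names, row))

# Example consumer: rows are handled one by one instead of being
# materialized up front as a list.
for processed in process_results_iter(("name", "type"), [("id", "int"), ("email", "text")]):
    print(processed)
```

The trade-off is that a generator can only be consumed once, so any caller that needs to traverse the results twice must materialize them itself; the memory win comes from the common case where one streaming pass is enough.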


github-actions[bot] commented 4 months ago

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.