fix: Optimize subscript and array/map filter in favor of memory

bikramSingh91 commented 1 week ago

Summary: Currently, subscript and the array/map filter functions wrap the underlying elements vector of the map/array in a dictionary layer whose indices point to the selected/remaining elements. If the elements vector is large and only a small subset of elements is selected, the result is a counterproductive dictionary whose base is much larger than the dictionary itself. The large base is then passed up the execution pipeline, holding onto memory. Furthermore, expression evaluation can peel these vectors and operate on the large bases, creating intermediate vectors of similarly large size. This adds memory pressure to queries, causing them to hit their memory limits and either fail or spill.

This change makes the aforementioned functions flatten the elements vector instead of wrapping it when the size of the dictionary is less than 1/8th the size of the base (elements vector).
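The threshold check described above can be sketched roughly as follows. This is a minimal, self-contained illustration using `std::vector`, not the actual Velox code: `DictionaryView`, `shouldFlatten`, and `flatten` are hypothetical names made up for this example, and the element type is assumed to be `int64_t` for simplicity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a dictionary-wrapped elements vector:
// `indices` select rows out of a (possibly much larger) `base`.
struct DictionaryView {
  std::vector<int64_t> base;
  std::vector<std::size_t> indices;
};

// Returns true when the dictionary is smaller than 1/8th of the base.
// Multiplying instead of dividing avoids integer-division rounding.
inline bool shouldFlatten(std::size_t dictionarySize, std::size_t baseSize) {
  return dictionarySize * 8 < baseSize;
}

// Copies only the selected rows into a new flat vector, so the result
// no longer references the large base.
inline std::vector<int64_t> flatten(const DictionaryView& view) {
  std::vector<int64_t> flat;
  flat.reserve(view.indices.size());
  for (std::size_t index : view.indices) {
    flat.push_back(view.base[index]);
  }
  return flat;
}
```

In this sketch, a caller would check `shouldFlatten(view.indices.size(), view.base.size())` and return `flatten(view)` only when it holds, keeping the dictionary wrapping otherwise so that mostly-dense selections still avoid a copy.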

This helped reduce memory usage in the filterProject operator in a particular query from 2.6GB to 380MB.

Differential Revision: D66253163

facebook-github-bot commented 1 week ago

This pull request was exported from Phabricator. Differential Revision: D66253163

bikramSingh91 commented 6 days ago

The Linux build failed due to an unrelated test. Filed an issue for it: #11619

facebook-github-bot commented 6 days ago

This pull request has been merged in facebookincubator/velox@8d91c1ccec33d1c63bd3334d2ad5a25c20c39c67.

conbench-facebook[bot] commented 6 days ago

Conbench analyzed the 1 benchmark run on commit 8d91c1cc.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.
