Closed JkSelf closed 2 days ago
@xiaoxmeng @tanjialiang @aditi-pandit is this the expected behavior and just means the query ran out of memory? Alternatively, don't we already support spilling for OrderBys?
@pedroerp : OrderBy does have spilling support. My understanding is that Gluten doesn't use Velox memory arbitration or spilling and has its own custom implementation.
We have been able to run this query on Prestissimo clusters successfully. Though we didn't do 2TB specifically, we did 1, 10 and 100TB.
@pedroerp @aditi-pandit We do support order-by spill, and Gluten also supports memory arbitration, which connects Velox memory management with the Spark memory manager (Gluten has its own arbitrator implementation from @zhztheplayer that follows the Velox memory arbitrator interface). I discussed this issue offline with @JkSelf and @zhztheplayer: the failure occurs because we don't reserve memory before getting output from the sort buffer, so memory arbitration is triggered while output is being produced. We need to reserve memory before the get-output processing, as we do for hash aggregation and hash probe. @JkSelf will help with this. @zhztheplayer will help add the native call stack on failure to ease debugging of similar issues.
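To illustrate the reserve-before-output idea in isolation, here is a minimal toy sketch. Everything in it is hypothetical: `ToyPool`, `ToySortBuffer`, `prepareOutput` and the byte counts are made up for illustration and are not the Velox or Gluten APIs. It also simplifies the failure mode: in this model an unreserved allocation that does not fit simply throws, whereas the real behavior during arbitration may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical toy memory pool with a fixed capacity (not velox::memory::MemoryPool).
class ToyPool {
 public:
  explicit ToyPool(uint64_t capacity) : capacity_(capacity) {}

  // Sets aside `bytes` for a later allocation; returns false if it does not fit.
  bool reserve(uint64_t bytes) {
    if (used_ + bytes > capacity_) {
      return false;
    }
    used_ += bytes;
    reserved_ += bytes;
    return true;
  }

  // Allocates from an existing reservation if possible, else from free space.
  void allocate(uint64_t bytes) {
    if (reserved_ >= bytes) {
      reserved_ -= bytes; // Already counted in used_ when it was reserved.
      return;
    }
    if (used_ + bytes > capacity_) {
      throw std::runtime_error("toy OOM");
    }
    used_ += bytes;
  }

  void free(uint64_t bytes) { used_ -= bytes; }
  uint64_t used() const { return used_; }

 private:
  uint64_t capacity_;
  uint64_t used_{0};
  uint64_t reserved_{0};
};

// Hypothetical sort buffer that can spill its buffered rows to free memory.
class ToySortBuffer {
 public:
  explicit ToySortBuffer(ToyPool& pool) : pool_(pool) {}

  void addInput(uint64_t bytes) {
    pool_.allocate(bytes);
    buffered_ += bytes;
  }

  // Safe point: reserve memory for the output batch, spilling first if needed.
  void prepareOutput(uint64_t batchBytes) {
    if (pool_.reserve(batchBytes)) {
      return;
    }
    spill();
    if (!pool_.reserve(batchBytes)) {
      throw std::runtime_error("toy OOM after spill");
    }
  }

  // Produces one output batch; the allocation happens in the middle of output.
  uint64_t getOutput(uint64_t batchBytes) {
    pool_.allocate(batchBytes);
    return batchBytes;
  }

  bool spilled() const { return spilled_; }

 private:
  void spill() {
    pool_.free(buffered_);
    buffered_ = 0;
    spilled_ = true;
  }

  ToyPool& pool_;
  uint64_t buffered_{0};
  bool spilled_{false};
};
```

With a 100-byte pool and 90 bytes buffered, calling `getOutput(20)` directly fails in this model, while calling `prepareOutput(20)` first spills the buffer at a safe point so the subsequent `getOutput(20)` is served from the reservation.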
Great, thanks @xiaoxmeng for diagnosing the issue.
@JkSelf I assigned this Issue to you based on the comment above.
Thank you @JkSelf
Bug description
When we ran 2 TB TPC-DS, we found that Q72 failed with the following OOM exceptions in the getOutput stage.
System information
Velox System Info v0.0.2
Commit: 24e13a142d18e0c78c827052f56a84f09c2ecaa8
CMake Version: 3.28.3
System: Linux-5.4.0-189-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.8/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs
No response