Closed FelixYBW closed 3 months ago
Can PR6480 fix some spill issues, or did only the error change?
This seems to be the same as https://github.com/apache/incubator-gluten/issues/4275
It failed here:
final UnsafeSorterSpillWriter spillWriter =
new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics,
inMemSorter.numRecords());
spillWriters.add(spillWriter);
spillIterator(inMemSorter.getSortedIterator(), spillWriter);
numRecords is inMemSorter.numRecords() (pos / 2), and numRecordToWrite is inMemSorter.getSortedIterator().length(), which is determined in the function UnsafeInMemorySorter.getSortedIterator():
With nulls:
queue.add(new SortedIterator(nullBoundaryPos / 2, 0));
queue.add(new SortedIterator((pos - nullBoundaryPos) / 2, offset));
return new UnsafeExternalSorter.ChainedIterator(queue);
Without nulls:
return new SortedIterator(pos / 2, offset);
nullBoundaryPos and pos are both even, so the two iterators in the null case together cover exactly pos / 2 records.
So it should always spill all the records in inMemSorter; I don't know why the error is thrown. Did the customer change the Spark logic?
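The counting argument above can be checked with a small standalone sketch. It does not use any Spark classes: the class name SpillCountCheck and both helper methods are hypothetical, and they only mirror the arithmetic quoted in this comment (pos / 2 on one side, the sizes passed to SortedIterator on the other).

```java
// Standalone sketch, no Spark dependency. All names here are hypothetical
// and only mirror the arithmetic quoted from the sorter code above.
class SpillCountCheck {

    // Count passed to the spill writer: inMemSorter.numRecords() == pos / 2.
    static int numRecords(int pos) {
        return pos / 2;
    }

    // Total records covered by the sorted iterator(s), following the two
    // branches quoted above (with nulls: two iterators; without: one).
    static int recordsFromIterators(int pos, int nullBoundaryPos) {
        if (nullBoundaryPos > 0) {
            return nullBoundaryPos / 2 + (pos - nullBoundaryPos) / 2;
        }
        return pos / 2;
    }

    public static void main(String[] args) {
        // Both pos and nullBoundaryPos are assumed even, as stated above.
        for (int pos = 0; pos <= 1000; pos += 2) {
            for (int nullBoundaryPos = 0; nullBoundaryPos <= pos; nullBoundaryPos += 2) {
                if (numRecords(pos) != recordsFromIterators(pos, nullBoundaryPos)) {
                    throw new IllegalStateException(
                        "mismatch at pos=" + pos + ", nullBoundaryPos=" + nullBoundaryPos);
                }
            }
        }
        System.out.println("iterator record counts match numRecords for all even inputs");
    }
}
```

Under these assumptions the two counts never diverge, which is why a mismatch at spill time would point to something outside this arithmetic, such as the sorter state changing between the two calls.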
Can PR6480 fix some spill issues, or did only the error change?
PR6480 does solve the parquet write spill issues. In this query, though, the external sort isn't called by the parquet writer but, it seems, by ObjectHashAggregate. The error message differs with and without PR6480.
Without PR6480, could we set spark.shuffle.spill.numElementsForceSpillThreshold to force a spill and bypass this issue?
Yes, this is a common solution. For a specific query, you can set this config to a reasonable value to bypass the issue. @boneanxs
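As a rough illustration of why lowering spark.shuffle.spill.numElementsForceSpillThreshold helps, here is a toy model of a sorter that spills once the in-memory record count reaches a threshold. It is only a sketch: ForceSpillSketch and its methods are hypothetical, the "spill" just clears a list, and the assumption that the check runs before each insert is a simplification rather than a copy of Spark's code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; ForceSpillSketch is a hypothetical class, not Spark code.
class ForceSpillSketch {
    private final int forceSpillThreshold;
    private final List<Long> inMemory = new ArrayList<>();
    private int spillCount = 0;

    ForceSpillSketch(int forceSpillThreshold) {
        this.forceSpillThreshold = forceSpillThreshold;
    }

    // Assumption: the threshold is checked before the new record is buffered.
    void insertRecord(long record) {
        if (inMemory.size() >= forceSpillThreshold) {
            spill();
        }
        inMemory.add(record);
    }

    // Stand-in for writing the buffered records to disk and freeing memory.
    private void spill() {
        inMemory.clear();
        spillCount++;
    }

    int spillCount() {
        return spillCount;
    }

    int inMemoryRecords() {
        return inMemory.size();
    }

    public static void main(String[] args) {
        ForceSpillSketch sorter = new ForceSpillSketch(1000);
        for (long i = 0; i < 10_000; i++) {
            sorter.insertRecord(i);
        }
        System.out.println("spills=" + sorter.spillCount()
            + ", records still in memory=" + sorter.inMemoryRecords());
    }
}
```

The point of the workaround is that the sorter's in-memory footprint is bounded by the record count, regardless of whether another memory consumer manages to ask it to spill.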
But why doesn't pure Spark have such a memory issue? Since two different ExternalSorter instances couldn't spill each other in Spark before this PR, it should be the same as the Gluten sorter not being able to call another Spark sorter to spill.
Sorry, I must be missing something here; can you please elaborate? @FelixYBW @jinchengchenghh
Backend
VL (Velox)
Bug description
The query plan:
It looks like there is an external sort in the ObjectHashAggregate operator.
Without PR6480, it reports an OOM error:
With PR6480, the error becomes:
It looks like PR6480 also triggers the sort in the external sort, but that fails with an error.
@jinchengchenghh @zhztheplayer
Spark version
Spark-3.2.x
Spark configurations
No response
System information
Velox System Info v0.0.2
Commit: 34dbec25d204fcb302893429350d37081feb5edf
CMake Version: 3.29.4
System: Linux-5.4.0-1063-aws
Arch: x86_64
CPU Name: Model name: Intel(R) Xeon(R) Platinum 8488C
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs
No response