SpillableGroupBy pre-allocates a configurable number of SpillFiles, each limited to Integer.MAX_VALUE bytes. Spilled elements are hash-distributed among these SpillFiles. Under the covers, each SpillFile uses extendible hashing to grow its addressable spill space dynamically. To do so, it keeps an in-memory directory of pointers to up to Integer.MAX_VALUE 4K pages, so the directory can typically address more pages than a single SpillFile can store.
This pull request dynamically allocates additional temp files for a SpillFile when directory doubling results in a request for a page index beyond the limits of the current SpillFile.
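
To illustrate the idea, here is a minimal sketch of how a page index could be mapped onto a growing list of temp files. It is not the code in this pull request: the class `SpillPageLocator`, its methods, and the `spill-` file prefix are hypothetical names, and the sketch assumes each temp file is capped at Integer.MAX_VALUE bytes and holds whole 4K pages only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the actual SpillFile implementation. */
final class SpillPageLocator {
    static final int PAGE_SIZE = 4096;
    // Whole 4K pages that fit into a file capped at Integer.MAX_VALUE bytes (assumption).
    static final int PAGES_PER_FILE = Integer.MAX_VALUE / PAGE_SIZE;

    // Temp files backing one logical SpillFile; grows on demand.
    private final List<Path> tempFiles = new ArrayList<>();

    /** Index of the temp file that holds the given page. */
    static int fileIndex(int pageIndex) {
        return pageIndex / PAGES_PER_FILE;
    }

    /** Byte offset of the given page within its temp file. */
    static long offsetInFile(int pageIndex) {
        return (long) (pageIndex % PAGES_PER_FILE) * PAGE_SIZE;
    }

    /** Returns the temp file for the page, allocating any missing files first. */
    Path fileFor(int pageIndex) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("negative page index: " + pageIndex);
        }
        int target = fileIndex(pageIndex);
        while (tempFiles.size() <= target) {
            try {
                Path file = Files.createTempFile("spill-", ".tmp");
                file.toFile().deleteOnExit();
                tempFiles.add(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return tempFiles.get(target);
    }

    /** Number of temp files allocated so far. */
    int allocatedFiles() {
        return tempFiles.size();
    }
}
```

In this sketch, a page index below `PAGES_PER_FILE` stays in the first temp file. The first index at or above that limit, for example after a directory doubling, triggers allocation of a second file. Note that this simple version also creates every intermediate file up to the target index. A sparse map from file index to path would avoid that, at the cost of a lookup.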