forcedotcom / phoenix

BSD 3-Clause "New" or "Revised" License

SpillableGroupBy - Dynamic SpillFile creation #699

Closed kutschm closed 10 years ago

kutschm commented 10 years ago

SpillableGroupBy pre-allocates a configurable number of SpillFiles, each of size Integer.MAX_VALUE bytes. Spilled elements are hash-distributed among these SpillFiles. Under the covers, each file uses extendible hashing to grow its addressable spill space dynamically. To do so, each SpillFile keeps an in-memory directory that can point to up to Integer.MAX_VALUE 4K pages, so the directory can address more pages than a single SpillFile of that size can actually store.
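To make the mismatch concrete, here is a rough arithmetic sketch. It assumes the SpillFile size is measured in bytes and that pages are exactly 4096 bytes; the class and method names are hypothetical and are not taken from the Phoenix code.

```java
// Hypothetical helper, for illustration only; not part of Phoenix.
class SpillCapacity {
    // Assumption: "4K pages" means 4096-byte pages.
    static final int PAGE_SIZE = 4096;
    // Assumption: each pre-allocated SpillFile holds Integer.MAX_VALUE bytes.
    static final long FILE_SIZE_BYTES = Integer.MAX_VALUE;

    // Number of whole pages that physically fit into one SpillFile.
    static long pagesPerFile() {
        return FILE_SIZE_BYTES / PAGE_SIZE;
    }

    // Number of pages the in-memory directory can point to.
    static long addressablePages() {
        return Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("pages that fit in one file: " + pagesPerFile());
        System.out.println("pages the directory can address: " + addressablePages());
    }
}
```

Under these assumptions one file holds 524287 pages, while the directory can reference roughly 4096 times as many, which is why a page index can run past the end of the file.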

This pull request dynamically allocates additional temp files for a SpillFile when directory doubling produces a page index that exceeds the capacity of the current SpillFile.
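The idea can be sketched roughly as follows. This is not the code from the pull request: the class `GrowableSpillFile`, its methods, and the use of `RandomAccessFile` are assumptions made for illustration. A page index is split into a temp file number and an offset inside that file, and missing temp files are created on first use.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of on-demand temp file allocation; not Phoenix code.
class GrowableSpillFile implements AutoCloseable {
    static final int PAGE_SIZE = 4096; // assumption: 4K pages

    private final long pagesPerFile;
    private final List<RandomAccessFile> files = new ArrayList<>();

    GrowableSpillFile(long pagesPerFile) {
        this.pagesPerFile = pagesPerFile;
    }

    // Returns the temp file that backs pageIndex, creating files as needed.
    private RandomAccessFile fileFor(long pageIndex) throws IOException {
        int fileIdx = (int) (pageIndex / pagesPerFile);
        while (files.size() <= fileIdx) {
            File tmp = File.createTempFile("spill", ".bin");
            tmp.deleteOnExit();
            files.add(new RandomAccessFile(tmp, "rw"));
        }
        return files.get(fileIdx);
    }

    // Byte offset of the page inside its backing temp file.
    private long offsetOf(long pageIndex) {
        return (pageIndex % pagesPerFile) * PAGE_SIZE;
    }

    void writePage(long pageIndex, byte[] page) throws IOException {
        if (page.length != PAGE_SIZE) {
            throw new IllegalArgumentException("page must be " + PAGE_SIZE + " bytes");
        }
        RandomAccessFile raf = fileFor(pageIndex);
        raf.seek(offsetOf(pageIndex));
        raf.write(page);
    }

    // Reads a previously written page; throws EOFException if it was never written.
    byte[] readPage(long pageIndex) throws IOException {
        RandomAccessFile raf = fileFor(pageIndex);
        byte[] page = new byte[PAGE_SIZE];
        raf.seek(offsetOf(pageIndex));
        raf.readFully(page);
        return page;
    }

    int fileCount() {
        return files.size();
    }

    @Override
    public void close() throws IOException {
        for (RandomAccessFile raf : files) {
            raf.close();
        }
    }
}
```

With `pagesPerFile` set to 4, writing page 9 creates three backing files (indices 0 to 2), so callers keep using a flat page index regardless of how many temp files exist.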

jtaylor-sfdc commented 10 years ago

Nice! Thanks, @kutschm!