Closed Morwenn closed 7 years ago
The merging scheme used by NeatSort and described in NeatSort - A practical adaptive algorithm by Marcello La Rocca and Domenico Cantone might be interesting: save the beginning of each run, then merge the runs two by two, preferring to merge the two smaller runs out of every group of three when possible, with a stable ratio.
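A rough, language-agnostic sketch of that idea (hypothetical code, not taken from NeatSort or vergesort): given the saved run boundaries, merge adjacent runs pairwise, but within each triple of runs prefer the pair with the smaller combined size.

```python
import heapq

# Hypothetical sketch of the NeatSort-style merge step: run boundaries are
# saved, then runs are merged two by two; out of each triple of adjacent
# runs, the two smaller ones are merged first.

def merge_runs_neatsort_style(seq, bounds):
    """seq: a list; bounds: run-start indices, ending with len(seq)."""
    runs = [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
    while len(runs) > 1:
        merged = []
        i = 0
        while i < len(runs):
            if i + 1 >= len(runs):           # lone trailing run, carry over
                merged.append(runs[i])
                i += 1
                continue
            a, b = runs[i], runs[i + 1]
            if i + 2 < len(runs) and \
               (runs[i + 2][1] - runs[i + 2][0]) < (a[1] - a[0]):
                # Of the triple (a, b, c), the pair (b, c) is smaller:
                # keep a for the next round and merge (b, c) first.
                merged.append(a)
                a, b = b, runs[i + 2]
                i += 1
            lo, mid, hi = a[0], a[1], b[1]   # adjacent runs share a bound
            seq[lo:hi] = list(heapq.merge(seq[lo:mid], seq[mid:hi]))
            merged.append((lo, hi))
            i += 2
        runs = merged
    return seq
```

The "stable ratio" part of the paper's heuristic is left out here; the sketch only shows the prefer-the-two-smaller-runs rule.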
The ping-pong merge algorithm used by the improved patience sorting algorithm described in Patience is a Virtue: Revisiting Merge and Sort on Modern Processors by Badrish Chandramouli and Jonathan Goldstein could also be an interesting candidate: sort the runs by size, then merge them pairwise into a first buffer, then into a second one, then back into the first, and so on.
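A minimal sketch of the ping-pong part (hypothetical code; the sort-runs-by-size preprocessing from the paper is omitted for brevity):

```python
import heapq

# Hypothetical sketch of a ping-pong merge: runs are merged pairwise from
# one buffer into a second, then back again, until one run remains.

def ping_pong_merge(runs):
    """runs: list of already-sorted lists; returns one fully merged list."""
    src = [list(r) for r in runs]
    while len(src) > 1:
        dst = []
        # Merge pairs of runs from src into dst; an odd run is carried over.
        for i in range(0, len(src) - 1, 2):
            dst.append(list(heapq.merge(src[i], src[i + 1])))
        if len(src) % 2:
            dst.append(src[-1])
        src = dst        # the roles of the two buffers swap ("ping-pong")
    return src[0] if src else []
```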
I replaced the "merge-as-you-go" merging strategy with a simple k-way merge algorithm in the random-access vergesort: the algorithm now remembers the runs, then merges them pairwise at the very end until only one big run is left. It improved the performance of the ascending sawtooth and descending sawtooth patterns by ~20% (only tested with integers, though).
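The end-of-algorithm pairwise merging described above could look roughly like this (a hypothetical Python sketch; in the C++ implementation std::inplace_merge would play the role of the merge call):

```python
import heapq

# Hypothetical sketch: run boundaries are remembered during the scan, then
# adjacent runs are merged pairwise, round after round, until a single run
# spans the whole range.

def merge_remembered_runs(seq, bounds):
    """bounds: run-start indices, ending with len(seq)."""
    while len(bounds) > 2:
        new_bounds = [bounds[0]]
        for i in range(0, len(bounds) - 1, 2):
            if i + 2 < len(bounds):
                lo, mid, hi = bounds[i], bounds[i + 1], bounds[i + 2]
                seq[lo:hi] = list(heapq.merge(seq[lo:mid], seq[mid:hi]))
                new_bounds.append(hi)
            else:
                new_bounds.append(bounds[i + 1])   # odd trailing run
        bounds = new_bounds
    return seq
```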
I may try to implement the merge strategies described above at some point, but the current one is probably good enough for most practical uses, so I'm closing this issue.
To ensure that less memory is allocated during the merge operations, we could binary-search the greatest element of the first partition in the second, then binary-search the smallest element of the second partition in the first. This would yield two merging points defining smaller partitions to merge; and binary search should be the right tool since vergesort guarantees that the partitions to merge are big enough.
To reduce the cost of the binary searches, the first search would be performed in the smaller partition, then the second in the greater partition, replacing its original bound with the newly found merging point. Given how the current inplace_merge implementation works, this would surely reduce the amount of allocated memory in some scenarios.
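The trimming idea can be sketched as follows (hypothetical Python code; `bisect_left`/`bisect_right` stand in for `std::lower_bound`/`std::upper_bound`): elements of the first partition not greater than the smallest element of the second, and elements of the second partition not smaller than the greatest element of the first, are already in their final positions, so only the middle range needs to be merged.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sketch: before merging seq[lo:mid] with seq[mid:hi], shrink
# the range with two binary searches so that inplace_merge only has to
# touch (and allocate a buffer for) the part that actually needs merging.

def trimmed_merge_bounds(seq, lo, mid, hi):
    # Elements of the first partition <= the smallest of the second
    # are already in place (bisect_right keeps the merge stable).
    new_lo = bisect_right(seq, seq[mid], lo, mid)
    # Elements of the second partition >= the greatest of the first
    # are also already in place.
    new_hi = bisect_left(seq, seq[mid - 1], mid, hi)
    return new_lo, new_hi
```

For example, merging `[0, 1, 5, 7]` with `[2, 3, 6, 9]` only needs to touch the subrange `[5, 7, 2, 3, 6]`; the leading `0, 1` and trailing `9` stay where they are.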
Right now, the merging scheme used by vergesort is pretty dumb: sorted subsequences are merged on the fly as they are found or generated. The only subtlety occurs when the algorithm has three successive subsequences to merge, in which case it merges the middle subsequence with the smaller of the two others first, then merges the result with the remaining subsequence.
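The three-subsequence case boils down to this (a hypothetical sketch, not vergesort's actual code): the expensive merge involving the larger neighbour happens only once, on the already-combined result.

```python
import heapq

# Hypothetical sketch of the three-run case: merge the middle run with the
# smaller of its two neighbours first, then merge the result with the
# remaining, larger run.

def merge_three_runs(a, mid, b):
    if len(a) <= len(b):
        left = list(heapq.merge(a, mid))
        return list(heapq.merge(left, b))
    right = list(heapq.merge(mid, b))
    return list(heapq.merge(a, right))
```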
Instead, the algorithm could sort and reverse subsequences on the fly and only merge everything at the end. To do so, it could store iterators denoting the beginning of the different sorted subsequences in a list (at most log n iterators, so the additional memory footprint shouldn't be too high), then compute the lengths of the subsequences and merge them all, starting with the smallest ones. This merging algorithm sounds like an O(n²) algorithm; I hope its cost will be dwarfed by the actual cost of merging.
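A hypothetical sketch of that deferred, smallest-first scheme: run-start positions are recorded during the scan, and at the end the smallest remaining run is repeatedly merged with its smaller neighbour. The linear scan for the minimum each round is what makes the scheduling look quadratic in the number of runs, but with at most log n runs that overhead should indeed be dwarfed by the merges themselves.

```python
import heapq

# Hypothetical sketch: merge the recorded runs starting with the smallest
# ones, pairing each smallest run with its smaller adjacent neighbour.

def merge_smallest_first(seq, bounds):
    """bounds: run-start indices, ending with len(seq)."""
    runs = [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
    while len(runs) > 1:
        # Find the smallest remaining run (linear scan each round).
        i = min(range(len(runs)), key=lambda k: runs[k][1] - runs[k][0])
        # Pick the smaller of its adjacent neighbours as the merge partner.
        if i == 0:
            j = 1
        elif i == len(runs) - 1:
            j = i - 1
        else:
            left, right = runs[i - 1], runs[i + 1]
            j = i - 1 if (left[1] - left[0]) <= (right[1] - right[0]) else i + 1
        lo = min(runs[i][0], runs[j][0])
        mid = max(runs[i][0], runs[j][0])   # shared boundary of the pair
        hi = max(runs[i][1], runs[j][1])
        seq[lo:hi] = list(heapq.merge(seq[lo:mid], seq[mid:hi]))
        runs[min(i, j)] = (lo, hi)
        del runs[max(i, j)]
    return seq
```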