use in-place parallel super scalar samplesort

https://github.com/SaschaWitt/ips4o

If this could be run on a memory-mapped buffer, then we might be able to use a fully parallel sort in the dmultimap.
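A minimal sketch of the idea, assuming the records are packed 128-bit keys stored in a plain file. `Key128` and `sort_mmapped_file` are hypothetical names used only for illustration, not the dmultimap's actual layout or API. It uses `std::sort` so that it builds without extra dependencies. Since a mapped region is just a random-access range, the ips4o README's `ips4o::parallel::sort(begin, end)` should be a drop-in replacement for that one call, though that is untested here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical 128-bit record: two 64-bit halves, ordered by hi, then lo.
struct Key128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

inline bool operator<(const Key128& a, const Key128& b) {
    return a.hi != b.hi ? a.hi < b.hi : a.lo < b.lo;
}

// Map a file of packed Key128 records and sort it in place.
// Returns the number of records sorted.
std::size_t sort_mmapped_file(const std::string& path) {
    int fd = open(path.c_str(), O_RDWR);
    if (fd < 0) throw std::runtime_error("open failed: " + path);

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        throw std::runtime_error("fstat failed: " + path);
    }
    const std::size_t bytes = static_cast<std::size_t>(st.st_size);
    const std::size_t n = bytes / sizeof(Key128);
    if (n == 0) {
        close(fd);
        return 0;
    }

    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);

    Key128* begin = static_cast<Key128*>(p);
    // Assumption: a parallel in-place sort could replace this call,
    // because the mapping behaves like an ordinary array.
    std::sort(begin, begin + n);

    msync(p, bytes, MS_SYNC);  // flush the sorted pages back to the file
    munmap(p, bytes);
    return n;
}
```

With `MAP_SHARED` the kernel pages data in and out on demand, so the file can be larger than RAM, although random access patterns during the sort may then cause heavy paging.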

It's worth noting that the bounds on the bsort disk-backed radix sort aren't great: we're sorting 128-bit integers one byte at a time, so each sort takes 128/8 = 16 passes (~log(n)).

Only some coding and testing will show whether this is a good way to go. The sort is currently the main bottleneck in seqwish.
