bingmann / parallel-string-sorting

Collection of Parallel String Sorting Algorithms including Parallel Super Scalar String Sample Sort and Parallel Multiway LCP-Mergesort
http://panthema.net/2013/parallel-string-sorting/
GNU General Public License v3.0

Build GNU sort compatible "psort" #6

Open ole-tange opened 1 year ago

ole-tange commented 1 year ago

Thanks for updating the code so that it now compiles with modern compilers.

I think this project has huge potential.

I would love to be able to use this as a drop-in replacement for GNU Sort: This would mean all programs that used GNU Sort would be sped up.

Sorting is an extremely important part of many processes, and GNU Sort is currently a bottleneck in some of them. I have run it on my 48-core server, and it scales horribly. See: https://unix.stackexchange.com/questions/579251/how-to-use-parallel-to-speed-up-sort-for-big-files-fitting-in-ram and https://www.gnu.org/software/parallel/parsort.html

To make this project into a usable replacement for GNU Sort I see a few missing pieces:

So a user should be able to do:

cat bigfile | psort > sorted
psort bigfile1 bigfile2 > sorted

For a first version, I think it is fine if not all GNU Sort options are supported. If just the basic string sort works, I think it will be easier to attract developers to add the remaining GNU Sort options.

Had my C++ skills been even mediocre, I would have changed psstest to do this myself; it does not seem to be a huge change. But, alas, my C++ skills are way lower than that.

Could you please consider building psort based on psstest so it would work as a simple drop-in replacement for GNU Sort?

bingmann commented 1 year ago

Yes, great idea, but I don't have time for such a project.