cmuparlay / parlaylib

A Toolkit for Programming Parallel Algorithms on Shared-Memory Multicore Machines
MIT License
321 stars 60 forks source link

Possible improvements to grouping and sorting performance #56

Open DanielLiamAnderson opened 1 year ago

DanielLiamAnderson commented 1 year ago

group_by_key

The performance of group_by_key sometimes suffers on multi-socket machines, despite scaling fine on a single socket. Its implementation could possibly be improved. group_by_key_sorted does not seem to have this issue, so perhaps it could be used as a fallback to make group_by_key faster.

sample_sort

The example implementation of sample sort in the example code here is sometimes (but not always) faster than the actual sample sort implementation in the library, despite being substantially simpler. We could improve the library implementation by borrowing some of the example code.