The performance of group_by_key sometimes suffers on multi-socket machines, despite scaling fine on a single socket. Its implementation could possibly be improved. group_by_key_sorted does not seem to have this issue, so perhaps it could be used as a fallback to make group_by_key faster.
sample_sort
The example implementation of sample sort in the example code here is sometimes (but not always) faster than the actual sample sort implementation in the library, despite being substantially simpler. We could improve the library implementation by borrowing some of the example code.
group_by_key
The performance of
group_by_key
sometimes suffers on multi-socket machines, despite scaling fine on a single socket. Its implementation could possibly be improved.group_by_key_sorted
does not seem to have this issue, so perhaps it could be used as a fallback to makegroup_by_key
faster.sample_sort
The example implementation of sample sort in the example code here is sometimes (but not always) faster than the actual sample sort implementation in the library, despite being substantially simpler. We could improve the library implementation by borrowing some of the example code.