cmuparlay / parlaylib

A Toolkit for Programming Parallel Algorithms on Shared-Memory Multicore Machines
MIT License
321 stars 60 forks source link

Performance regression from #22 #40

Closed DanielLiamAnderson closed 1 year ago

DanielLiamAnderson commented 1 year ago

PR #23 fixed a potential deadlock in the allocator by making initialize_lists conservative (prevents the scheduler from stealing while waiting at the join point).

https://github.com/cmuparlay/parlaylib/blob/f53b6e30fbf0a7466017e2fadb28a2f9794369e9/include/parlay/internal/block_allocator.h#L88

For some reason, this causes the performance of remove_duplicates to worsen by up to 30% in the benchmarks. Most other functions are unaffected or only a few percent different.

Reintroducing the deadlock is not desirable, but this performance regression should be investigated.

I miscalculated, this regression occurred in PR #22. The hash function for integers, used in many of the grouping and deduping algorithms became more expensive.

DanielLiamAnderson commented 1 year ago

Fixed by #55.