The parlay::kth_smallest() function causes a segmentation fault when its input contains many duplicate keys. For example:
#include <iostream>
#include <parlay/primitives.h>

int main() {
  parlay::sequence<int> a(2000, 1);
  size_t k = 1000;
  auto less = [](int p, int q) { return p < q; };
  // auto less_or_equal = [](int p, int q) { return p <= q; };
  std::cout << *(parlay::kth_smallest(a, k, less)) << std::endl;
}
Using the less_or_equal comparison instead also causes a segmentation fault.
This happens because, when the input is highly duplicated, the sampling-based bucket method sieves all elements into a single bucket and recurses on it without triggering the granularity control, so the problem size never shrinks.
A naive fix is to special-case the situation where all elements fall into a single bucket, e.g.:
template<typename Range, typename Compare = std::less<>>
auto kth_smallest_copy(Range&& in, size_t k, Compare&& less = {}) {
  // ... unchanged ...
  // Determine which of the 32 buckets each key belongs in
  internal::heap_tree ss(pivots);
  auto ids = parlay::tabulate(n, [&](long i) -> unsigned char { return ss.rank(it[i], less); });
  // Count how many keys fall into each bucket
  auto sums = parlay::histogram_by_index(ids, sample_size + 1);
  // Special case: every key landed in one bucket, so recursing
  // would make no progress; fall back to sorting instead.
  if (std::count(sums.begin(), sums.end(), 0) == sample_size) {
    return parlay::sort(std::forward<Range>(in), std::forward<Compare>(less))[k];
  }
  // ... unchanged ...
}