ericniebler / range-v3

Range library for C++14/17/20, basis for C++20's std::ranges
Other
4.06k stars 437 forks source link

nth_element() appears to be broken #1723

Open mark-99 opened 1 year ago

mark-99 commented 1 year ago

So apologies in advance for not having a minimal repro for this, but I'm seeing consistent assert fails / segfaults in ranges::nth_element(). It's being applied to a std::vector<int64_t> of size 10,000. Many of the values are zero and it's already been partially sorted previously, so most of the zeros are already at the start (positions 0-1638 are all zero in the run I'm looking at now).

This assert is firing in MSVC in debug, or it'll segfault in release: _STL_VERIFY(_Mycont->_Myfirst < _Ptr, "can't decrement vector iterator before begin");

This is the ranges code, nth_element around ln 162:

                // j points beyond range to be tested, *lm1 is known to be <= *m
                // The search going up is known to be guarded but the search coming down
                // isn't. Prime the downward search with a guard.
                if(!invoke(pred, invoke(proj, *i), invoke(proj, *m))) // if *first == *m
                {
                    // *first == *m, *first doesn't go in first part
                    // manually guard downward moving j against i
                    while(true)
                    {
                        if(i == --j)  <--- <BANG>

and there are some values in the debugger (sorted):

    j: 0x0000022512224020
first: 0x0000022512227358
    i: 0x0000022512227358
lm1  : 0x0000022512229230
last : 0x0000022512229238

So it does indeed look like j has run backwards off the start of the vector and that i == first seems like it might be a factor also, given j is pre-decremented.

Here's my source, I'm copying the input range into a local vector (as nth_element is destructive):

void foo(ranges::random_access_range auto&& rng)
//...
    std::vector<element_type> c;
    c.reserve(ranges::size(rng));
    ranges::copy(rng, ranges::back_inserter(c));
//...
    ranges::nth_element(c, nth);

Changing ranges::nth_element() to std::ranges::nth_element() and it doesn't crash anymore.

Note I'm using an old-ish version (from vcpkg) "Version: 2021-11-02", however I don't see any significant diffs in nth_element.hpp since that date.

dmm-fb commented 1 year ago

I think this is a repro:

#include <fmt/ranges.h>
#include <vector>
#include <range/v3/algorithm/nth_element.hpp>

int main() {
    std::vector<int> latencies{29, 155, 3, 2, 2, 3, 91, 2, 2, 87, 3, 2, 2};
    auto medianIt = latencies.begin() + latencies.size() / 2;
    ranges::nth_element(latencies, medianIt);
    fmt::print("{}\n", latencies);
}
mark-99 commented 1 year ago

Oh, nice, mine was much larger (10k elements) and I decided I'd put the effort in to reduce it (or upload the original large dataset somewhere) if someone showed an interest in fixing this bug, but it seems not. I read on another ticket that Eric used an rather old STL implementation for simplicity, but apparently that also came with bugs and no doubt lower performance in some areas. Given the general level of activity on this repo, we are treating ranges-v3 as abandonware now and replacing it with C++20 ranges and Sy Brand's C++23 polyfill: https://github.com/TartanLlama/ranges (which after a bit of prodding he recently made compile with clang), and will then transition to C++23 proper as compiler support appears (some things like monadic optional are already there in c++2b / c++latest so hopefully ranges won't be too long).