ericniebler / range-v3

Range library for C++14/17/20, basis for C++20's std::ranges
Other
4.11k stars 440 forks source link

Unexpected behavior with adjacent_filter #1615

Open pparuzel opened 3 years ago

pparuzel commented 3 years ago

I was trying to implement a function one could call "sorted_duplicates" where it sorts the elements first, and then produces a vector/view of duplicated values (those that occur twice or more).

vector{1, 2, 2, 2, 3, 4, 5, 5, 5, 5} would result in vector{2, 5}

I thought adjacent_filter + unique should do the job:

  namespace views = ranges::views;
  std::vector<int> v = {1, 2, 2, 2, 3, 4, 5, 5, 5, 5, 5, 5, 6};
  auto view = v | views::adjacent_filter(std::equal_to{}) | views::unique;

To my surprise, the resulting view always contains the first element and so it looks like this: vector{1, 2, 5}.

Am I missing something or is this an unfixed bug?

JohelEGP commented 3 years ago

See https://ericniebler.github.io/range-v3/#autotoc_md8.

The first element in the source range is always included.

pparuzel commented 3 years ago

Alright, thank you for pointing that out. However, the documentation is one thing but the important thing is: how is that intuitive? It breaks the predicate rule, i.e. std::equal_to{}. Is there an alternative that does not include the first element?

What's the rationale behind this? It is a filter, so it considers only a handful of eligible elements. It is adjacent, so the filter predicate accepts values based on two adjacent elements. I don't understand.

Quuxplusone commented 1 year ago

What's the rationale behind this? It is adjacent, so the filter predicate accepts values based on two adjacent elements...

Right, the filter only accepts-or-rejects elements that are the second element of a pair of adjacent elements. The filter doesn't accept-or-reject the pair as a whole; it just uses the result of the predicate to decide whether to accept the second element. The preceding element was accepted-or-rejected based on the result of comparing it with its predecessor, and so on. This breaks when you get to the first element: it has no predecessor, so we can't apply the predicate to it. We must make an a priori decision (without consulting the predicate at all) — either "always accept the first element" or "always reject the first element."

Accepting is better than rejecting, since an accept can be turned into a reject with a simple | drop(1). To get the behavior you wanted from your example, I think this suffices:

  namespace views = ranges::views;
  std::vector<int> v = {1, 2, 2, 2, 3, 4, 5, 5, 5, 5, 5, 5, 6};
  auto view = v | views::adjacent_filter(std::equal_to{}) | views::drop(1) | views::unique;