ericniebler / range-v3

Range library for C++14/17/20, basis for C++20's std::ranges
Other
4.15k stars 439 forks source link

view::sorted is possible and could be useful #266

Open lichray opened 8 years ago

lichray commented 8 years ago

My suggestion is that rather than making it entirely lazy, we defer an appropriate amount of work; they are "optional" to the consumer of the view.

The input would be RandomAccessRange, outputting InputRanges.

The view maintains a sequence of iterators pointing to each element in the input range, the begin iterator of the view contains an iterator pointing to the end of the sequence.

The view makes the sequence into a min heap view to the input range upon construction. O(n)

Incrementing the view iterator pop_heap the min heap and decrement the contained iterator O(lg n). Dereferencing the view iterator dereferences the container iterator twice.

This requires O(n) storage for the iterator sequence. view::partially_sorted or view::sorted_until, if we like, can reduce the storage to practically constant, but still using min heap.

lichray commented 8 years ago

An alternative design could be, requiring the input range to be a heap and then forget about storing the iterators, and call the view view::sorted_heap.

grtlr commented 7 years ago

I like this idea very much, since I usually end up writing my own sorted view using a sequence of std::reference_wrapper<T>s. Many algorithms in computational geometry use sorting a sequence as a preprocessing step (convex hull algorithms for example).

gnzlbg commented 7 years ago

@lichray

Views are cheap to copy, but a data-structure with O(N) storage is not cheap to copy.

An alternative design could be, requiring the input range to be a heap and then forget about storing the iterators, and call the view view::sorted_heap.

In the Library Fundamentals TS there is a string search algorithm that uses "searchers" as input state. Maybe having an accompanying data-structure that the user needs to constructs and can get passed by reference to a view::sorted could help here. This data-structure would not be a Container nor a View, but I have no idea how the "searchers" mesh with ranges and views, if at all. @CaseyCarter @ericniebler ?

@grtlr

As you mention, in many computational geometry applications the sequences are sorted as a pre-processing step to accelerate some other algorithm. In which application are you needing to do that sorting lazily during the algorithm execution instead of eagerly in a pre-processing step? (AFAIK this is not needed to compute the convex hull nor any other common problem of computational geometry that I know of).

ldionne commented 6 years ago

Lazily sorting a sequence is useful when you don't know how many things you'll want to sort at the beginning, for example because that depends on the contents of the range. In my case, for example, I have a sequence of elements that I want to deduplicate (these elements are not strictly equal, but they are equivalent with respect to some property), and I want to return only so many of them, and in sorted order. I can't just partial_sort everything, because I don't know how many results I need upfront (I need to deduplicate on the fly).

This functionality is very useful, and it's okay if it needs some state on the side to operate (I understand the complexity requirements of being a view), but it would be lovely if there was a way of doing it.

Today, we either do the heap trick ourselves (but it breaks the nice pipeline), or we basically guess a number of elements to partial sort, and then just pull out more stuff out of the pipeline and partial sort some more if we don't have enough results to fulfill our needs.

gnzlbg commented 6 years ago

This functionality is very useful, and it's okay if it needs some state on the side to operate (I understand the complexity requirements of being a view), but it would be lovely if there was a way of doing it.

Wouldn't storing a shared_ptr<vector<T>> solve this issue? (in a similar spirit to view::shared?)

That would allow for copy to remain O(1).

CaseyCarter commented 6 years ago

Wouldn't storing a shared_ptr<vector> solve this issue? (in a similar spirit to view::shared?)

That would allow for copy to remain O(1).

I mocked up a sorted_view view adaptor the last time this came up (https://github.com/CaseyCarter/range-v3/commit/de85c10821a695fc207a7135ea2554ec2a487c94), which side-steps the issue of storage. Yes, if you have a temporary container you could put it in a shared_view, if you have a non-temporary container you could operate on it directly, the sorted_view adaptor only cares that it can permute the contents of the underlying view. sorted_view accepts a projection, so it can just as easily sort a range of iterators by projecting them through a dereference function. Achieving O(1) copy is not the only issue.

Even ignoring the storage for the underlying view, this approach still has some complexity concerns:

  1. make_heap over the underlying view on the first begin call is worst case O(N), which amortizes across a full iteration pass to O(1). This achieves the proper mathematical amortized O(1) but in practice blows up your caches (some ones are bigger than others).

  2. The cost of sorting on-demand on the first pass through the range is O(lg N) per element. To achieve O(1) for iterator operations as required, this cost needs to be amortized across O(lg N) iterations.

This range may be useful, but I think it's a bit of a stretch to call it a view.