Open lichray opened 8 years ago
An alternative design could be, requiring the input range to be a heap and then forget about storing the iterators, and call the view view::sorted_heap
.
I like this idea very much, since I usually end up writing my own sorted view using a sequence of std::reference_wrapper<T>
s. Many algorithms in computational geometry use sorting a sequence as a preprocessing step (convex hull algorithms for example).
@lichray
View
s are cheap to copy, but a data-structure with O(N) storage is not cheap to copy.
An alternative design could be, requiring the input range to be a heap and then forget about storing the iterators, and call the view view::sorted_heap.
In the Library Fundamentals TS there is a string search algorithm that uses "searchers" as input state. Maybe having an accompanying data-structure that the user needs to constructs and can get passed by reference to a view::sorted
could help here. This data-structure would not be a Container
nor a View
, but I have no idea how the "searchers" mesh with ranges and views, if at all. @CaseyCarter @ericniebler ?
@grtlr
As you mention, in many computational geometry applications the sequences are sorted as a pre-processing step to accelerate some other algorithm. In which application are you needing to do that sorting lazily during the algorithm execution instead of eagerly in a pre-processing step? (AFAIK this is not needed to compute the convex hull nor any other common problem of computational geometry that I know of).
Lazily sorting a sequence is useful when you don't know how many things you'll want to sort at the beginning, for example because that depends on the contents of the range. In my case, for example, I have a sequence of elements that I want to deduplicate (these elements are not strictly equal, but they are equivalent with respect to some property), and I want to return only so many of them, and in sorted order. I can't just partial_sort
everything, because I don't know how many results I need upfront (I need to deduplicate on the fly).
This functionality is very useful, and it's okay if it needs some state on the side to operate (I understand the complexity requirements of being a view), but it would be lovely if there was a way of doing it.
Today, we either do the heap trick ourselves (but it breaks the nice pipeline), or we basically guess a number of elements to partial sort, and then just pull out more stuff out of the pipeline and partial sort some more if we don't have enough results to fulfill our needs.
This functionality is very useful, and it's okay if it needs some state on the side to operate (I understand the complexity requirements of being a view), but it would be lovely if there was a way of doing it.
Wouldn't storing a shared_ptr<vector<T>>
solve this issue? (in a similar spirit to view::shared
?)
That would allow for copy to remain O(1)
.
Wouldn't storing a shared_ptr<vector
> solve this issue? (in a similar spirit to view::shared?) That would allow for copy to remain O(1).
I mocked up a sorted_view
view adaptor the last time this came up (https://github.com/CaseyCarter/range-v3/commit/de85c10821a695fc207a7135ea2554ec2a487c94), which side-steps the issue of storage. Yes, if you have a temporary container you could put it in a shared_view
, if you have a non-temporary container you could operate on it directly, the sorted_view
adaptor only cares that it can permute the contents of the underlying view. sorted_view
accepts a projection, so it can just as easily sort a range of iterators by projecting them through a dereference function. Achieving O(1) copy is not the only issue.
Even ignoring the storage for the underlying view, this approach still has some complexity concerns:
make_heap
over the underlying view on the first begin
call is worst case O(N), which amortizes across a full iteration pass to O(1). This achieves the proper mathematical amortized O(1) but in practice blows up your caches (some ones are bigger than others).
The cost of sorting on-demand on the first pass through the range is O(lg N) per element. To achieve O(1) for iterator operations as required, this cost needs to be amortized across O(lg N) iterations.
This range may be useful, but I think it's a bit of a stretch to call it a view.
My suggestion is that rather than making it entirely lazy, we defer an appropriate amount of work; they are "optional" to the consumer of the view.
The input would be
RandomAccessRange
, outputtingInputRange
s.The view maintains a sequence of iterators pointing to each element in the input range, the
begin
iterator of the view contains an iterator pointing to theend
of the sequence.The view makes the sequence into a min heap view to the input range upon construction. O(n)
Incrementing the view iterator
pop_heap
the min heap and decrement the contained iterator O(lg n). Dereferencing the view iterator dereferences the container iterator twice.This requires O(n) storage for the iterator sequence.
view::partially_sorted
orview::sorted_until
, if we like, can reduce the storage to practically constant, but still using min heap.