ericniebler / stl2

LaTeX and Markdown source for the Ranges TS/STL2 and associated proposals
88 stars 8 forks source link

algorithm return types should maybe drop pair and tuple in favor of structs with more meaningful field names #23

Closed ericniebler closed 9 years ago

tomaszkam commented 9 years ago

I would prefer structures will more appropriate field names to be returned, but not if the cost of this change is elimination of ability use std::tie to capture result, i.e. write:

std::tie(f1, f2, of) = std::merge(f1, l1, f2, l2, of);

or

std::tie(f, of) = std::copy_n(f, n, of);
ericniebler commented 9 years ago

I was thinking of adding tagged_pair and tagged_tuple types, like this:

template<class...Tags>
struct tagged_tuple;

template<class Indices, class...Tags>
struct tagged_tuple_accessors;

template<std::size_t...Is, class...Tags, class...Types>
struct tagged_tuple_accessors<index_sequence<Is...>, Tags(Types)...>
  : Tags::template accessor<tagged_tuple<Tags(Types)...>, Types, Is>... { };

template<class...Tags, class...Types>
struct tagged_tuple<Tags(Types)...>
  : std::tuple<Types...>
  , tagged_tuple_accessors<make_index_sequence<sizeof...(Tags)>, Tags(Types)...> {
    using std::tuple<Types...>::tuple;
};

Then accessor structs can be defined like this:

struct input_tag {
    template<class Derived, class Type, std::size_t I>
    struct accessor {
        constexpr Type input() const {
            return std::get<I>(static_cast<const Derived&>(*this));
        }
    };
};

struct output_tag {
    template<class Derived, class Type, std::size_t I>
    struct accessor {
        constexpr Type output() const {
            return std::get<I>(static_cast<const Derived&>(*this));
        }
    };
};

You can use the type like this:

tagged_tuple<input_tag(int), output_tag(std::string)> s{42, "hello world"};
std::cout << s.input() << '\n';
std::cout << s.output() << '\n';
std::cout << std::get<0>(s) << '\n';
std::cout << std::get<1>(s) << '\n';

That prints:

42
hello world
42
hello world

It's still a tuple, so you should be able to tie, assign to tuples, etc. But you have friendly named getters.

I'm undecided, though. It might go too far.

tomaszkam commented 9 years ago

The one of the main goals of introducing concepts and STL2 was to make the error messages from instantiation readable. I am a bit concerned that using such tagged_tuple as a return type, will reintroduce the problem. Just think of error message generated when someone will accidentally invoke input() instead of input1() on result of merge. And then trying to check what methods are available.

From other side, very nice implementation of tagged tuples.

ericniebler commented 9 years ago

The error they'll get is "no such member". Code completion in their IDE will suggest the correct members. If they check the return type, they see tagged_tuple<input1(It1), input2(It2), output(Out)> or something. That's pretty good, IMO.

tomaszkam commented 9 years ago

I must admit that your solution will not make use of the algorithms less user-friendly that just naked tuple and will rather improve it.

Wouldn't use of tagged tuple resolve the problem with differentiating situation when returned pair of iterators is range (like lower_bound) or points to different ranges (mismatch). In the first case something like tagged_tuple<range(It, It)> (would require some magic to support multi element accessors, but should be doable) would be returned and models range. While for the second tagged_tuple<input1(It1), input2(It2)> would be returned.

ericniebler commented 9 years ago

tagged_tuple<range(It, It)>

More like tagged_tuple<begin(It), end(It)>

tomaszkam commented 9 years ago

tagged_tuple<begin(It), end(It)>

I proposed tagged_tuple<range(It, It)> to avoid tagged_tuple<begin(It), output(It)> etc.

ericniebler commented 9 years ago

tagged_tuple<begin(It), output(It)>

I don't get it. None of the algorithms would have that return type.

tomaszkam commented 9 years ago

I don't get it. None of the algorithms would have that return type.

I tough that tagged_tuple will be a general use class, that can be used for other purposes. In general case range(It, It) is less error prone than begin(It), end(It).

ericniebler commented 9 years ago

I tough that tagged_tuple will be a general use class

It is.

In general case range(It, It) is less error prone than begin(It), end(It)

A struct with begin() and end() member functions is error prone? You're using the wrong library if you feel that way. ;-) The nice thing about tagged_tuple<begin(It), end(It)> is that it's a range. I don't have to do anything further.

tomaszkam commented 9 years ago

A struct with begin() and end() member functions is error prone? You're using the wrong library if you feel that way. ;-) The nice thing about tagged_tuple<begin(It), end(It)> is that it's a range. I don't have to do anything further.

Providing separate begin or only end is more error prone that range that declares them in bundle, as the first one allow to declare half backed range (only end used). This has nothing to do with range concepts. In general if we require multiple functions, then I would prefer have single mixing that adds them all, not multiple ones that adds them separately.

tomaszkam commented 9 years ago

Just consider example of counted version of find_if_n that accepts Iterator, DifferenceType and Predicate and returns pair<Iterator, DifferenceType>. Would it be able to treat tagged_tuple<begin(It), count(DifferenceType)> as range? Would it be easier to do so for tagged_tuple<counted_range(It, DifferenceType)>?

ericniebler commented 9 years ago

If there were a find_if_n (there isn't), it should return a pair<Iterator, Difference> -- possibly with input and count accessors. I don't know why you think it should return a counted range.

tomaszkam commented 9 years ago

If there were a find_if_n (there isn't), it should return a pair<Iterator, Difference> -- possibly with input and count accessors. I don't know why you think it should return a counted range.

Because the returned iterator and the count defines a valid range starting at the first element meeting the predicate and ending at the end of the original range. In the same way like the iterator argument and count passed to the function defines a input range.

It the same situation as for rotate that now returns pair of iterators the new split position and end of the range. We know that this two iterators always denotes a valid range and this fact should be expressed in semantics of returned object.

ericniebler commented 9 years ago

The find family of algorithms locates a position within a range. They all return positions -- iterators. As a courtesy, find_n, if such an algorithm existed, would probably also return the distance to the end, since it needs to compute it anyway, but that doesn't change the fact that it fundamentally returns a position. Returning a range would be inconsistent with the other find algorithms.

But there is no find_n algorithm and with the existence of view::counted and range-based algorithm overloads, there isn't likely to be. Instead, find with a counted view would return a counted iterator, which is both position and count. That's the model of STL2.

The algorithms don't compose. Trying to make them compose leads to an inconsistent mess. I've tried; it doesn't work.

tomaszkam commented 9 years ago

Returning a range would be inconsistent with the other find algorithms.

The explicitly counted algorithm already have inconsistent signatures. Their accept different arguments and have different result types.

The algorithms don't compose. Trying to make them compose leads to an inconsistent mess. I've tried; it doesn't work.

My point is not about the composability, but about having the return type that conveys as much of semantics information as possible. If pair of the iterators happens to models range, the return type should model range concept (mismatch vs rotate). If the iterator and count denotes valid range, then returned type should also model a range (count_if* versus find_if_n).

*Why does count_if return only DifferenceType and drops the information about the end iterator that's need to be computed anyway?

But there is no find_n algorithm and with the existence of view::counted and range-based algorithm overloads, there isn't likely to be. Instead, find with a counted view would return a counted iterator, which is both position and count. That's the model of STL2.

For the original STL design the user was able to define their own STL-like algorithm. Writing an implementation of the algorithm that would work effectively for both bounded and counted ranges is not easy as explained in one of your blog post (at least that was my reception). So I think we should support the explicit counted version (_n) of the algorithm if they are not able to create efficient generic version. By this support I mean returning a tuples that includes additional semantics, if we provide such functionality for standard algorithms.

asutton commented 9 years ago

My point is not about the composability, but about having the return type that conveys as much of semantics information as possible. If pair of the iterators happens to models range, the return type should model range concept (mismatch vs rotate). If the iterator and count denotes valid range, then returned type should also model a range (count_if* versus find_if).

*Why does count_if return only DifferenceType and drops the information about the end iterator that's need to be computed anyway?

Because an algorithm does only what it is designed to do and no more. We should not try to alter the design of existing algorithms. Also, in this case, returning and end iterator from count_if would make the algorithm require forward iterators.

Andrew

tomaszkam commented 9 years ago

Also, in this case, returning and end iterator from count_if would make the algorithm require forward iterators.

This is probably wrong place to ask such question. But why returning an iterator from the count would change the requirement to ForwardIterator. The for_each algorithm accept InputIterators and returns the end iterator. count/count_if may be implemented as invocation of for_each with functor that will conditionally increase an internal counter. As the functor is returned from iterator, I can extract the counter value after iteration.

Because an algorithm does only what it is designed to do and no more. We should not try to alter the design of existing algorithms.

I think that this equally applies to for_each and count_if, but only one of them was changed.

ericniebler commented 9 years ago

You can find the rationale in my first range paper. Basically, if the whole point if the algorithm is to compute one value, I opted not to change the return type. My thinking is that it would be too inconvenient and make migration difficult. Nobody uses the return value of for_each, but everybody necessarily uses the return value of count_if.

On Thu, May 28, 2015, 1:02 PM tomaszkam notifications@github.com wrote:

Also, in this case, returning and end iterator from count_if would make the algorithm require forward iterators.

This is probably wrong place to ask such question. But why returning an iterator from the count would change the requirement to ForwardIterator. The for_each algorithm accept InputIterators and returns the end iterator and count_if may be implemented as invocation of for_each with function that will conditionally increase a internal counter.

Because an algorithm does only what it is designed to do and no more. We should not try to alter the design of existing algorithms.

I think that equally applies to for_each and count, but one of them was changed.

— Reply to this email directly or view it on GitHub https://github.com/ericniebler/stl2/issues/23#issuecomment-106581602.

tomaszkam commented 9 years ago

I think I would be better should stick for every case to The Law of Useful Return as defined in From Mathematics to Generic Programming:

If you've already done the work to get some useful result, don't throw it away. Return it to the caller.

A procedure should return all the potentially useful information it computed.

This would avoid making arbitrary decisions, like deciding that changing copy will be less breaking than count.

Would it be possible to move this comments to separate issue. We are moving a bit off topic from original issue.

CaseyCarter commented 9 years ago

The traditional hack to recover the end iterator from an STL algorithm is to pass reference_wrapper<I> instead of the actual iterator(s). It would be nice if STL2 would officially endorse this technique for cases like this. reference_wrapper<I> falls afoul of the exact type constraints for ++WeaklyIncrementable and Incrementable++, so it's not as simple as writing specializations for the associated type templates. It would not be challenging to write an iterator_reference_wrapper or add constrained increment operators to reference_wrapper so that it could serve this role.

tomaszkam commented 9 years ago

It would be not challenging to write an iterator_reference_wrapper or add constrained increment operators to reference_wrapper so that it could serve this role.

I think that use of iterator_reference_wrapper that has pointer copy semantics, will break all algorithms that assume that copies of the iterators are independent, i.e. if not touched does not change their positions.

Are the requirements that copies are independent (i.e. modification of the copy does not change source) stated anywhere? From the non normative note from SemiRegular:

The Semiregular concept is modeled by types that have behave similarly to built-in types like int, except that they may not be comparable with ==.

It looks like is independence of copies is required for the Semiregular (as built-in types provides such semantics), but I am unable to find anything in the wording that will explicitly state that requirement.

ericniebler commented 9 years ago

Are the requirements that copies are independent (i.e. modification of the copy does not change source) stated anywhere?

It's the difference between WeaklyIncrementable and Incrementable that matters in this case. A iterator_reference_wrapper would necessarily be Input, I think.

ericniebler commented 9 years ago

This would avoid making arbitrary decisions, like deciding that changing copy will be less breaking than count.

You're right, it's arbitrary, and it makes me uncomfortable. But making people do more work to get the result they asked for also makes me uncomfortable. We don't have the luxury of designing STL2 in a bubble ... there is a huge installed base, and ease of migration is a concern.

It's a question that should be put to the committee, certainly.

ericniebler commented 9 years ago

My point is not about the composability, but about having the return type that conveys as much of semantics information as possible.

OK, this is a fair point. I'll think about this. My gut is telling me that tagged_tuple<counted_range(It, Diff)> is not the right interface, though.

CaseyCarter commented 9 years ago

We don't have copy independence specified anywhere, and we probably should: input/output iterators do not have copy independence, stronger iterators do. I'm sure the algorithms require it, so it needs to be specified at least in ForwardIterator. I imagine most of the permuting algorithms probably require copy independence for the ValueType as well...

Adding copy independence to Regular or Semiregular would break input/output iterators be a good idea.

tomaszkam commented 9 years ago

We don't have copy independence specified anywhere, and we probably should: input/output iterators do not have copy independence, stronger iterators do.

As Eric pointed out, the Incrementable types requires that increment is equality-preserving. This cannot be achieved if the iterator does not provide copy-independence. If a is modifies state of b then ++a would not be equal ++b (shared state incremented twice).

asutton commented 9 years ago

Because count* iterates over the entire range. Returning an iterator into the middle of the range requires forward iterators. On May 28, 2015 4:02 PM, "tomaszkam" notifications@github.com wrote:

Also, in this case, returning and end iterator from count_if would make the algorithm require forward iterators.

This is probably wrong place to ask such question. But why returning an iterator from the count would change the requirement to ForwardIterator. The for_each algorithm accept InputIterators and returns the end iterator and count_if may be implemented as invocation of for_each with function that will conditionally increase a internal counter.

Because an algorithm does only what it is designed to do and no more. We should not try to alter the design of existing algorithms.

I think that equally applies to for_each and count, but one of them was changed.

— Reply to this email directly or view it on GitHub https://github.com/ericniebler/stl2/issues/23#issuecomment-106581602.

tomaszkam commented 9 years ago

Because count* iterates over the entire range. Returning an iterator into the middle of the range requires forward iterators.

The count* would necessarily return the end iterator, just as for_each. And I am asking why it is does not returned, because after addition of Sentinel concept the value of the end iterator may not be know at the caller side, so it counts as potentially useful information. I do not see how returning iterator to the middle of the range get involved.

Adding copy independence to Regular or Semiregular would break input/output iterators.

If we are not able to iterate multiple times over the range, why do we even require a copy operations from input/output iterator. Eliminating ability to use copy of iterator to store a position and perform second iteration, effectively excludes multiple passes. And we can still move this iterators around using move-operations. Maybe WeaklyIncrementable types should not be required to be copy constructible and SemiRegular.

The copy constructor of std::istream_iterator behaves similarly to std::auto_ptr - after copy only one of the objects may be used. Maybe similar shift like the one in case of std::unique_ptr should be also applied here?

tomaszkam commented 9 years ago

You're right, it's arbitrary, and it makes me uncomfortable. But making people do more work to get the result they asked for also makes me uncomfortable. We don't have the luxury of designing STL2 in a bubble ... there is a huge installed base, and ease of migration is a concern.

There is also other side of the problem. Returning the end iterator from every algorithm that needs to compute it anyways, simplifies the invoking STL1 functions in the code that uses STL2. In case when standard algorithm was invoked before, because it eliminates need to use common_iterator to pass a range defined by Iterator and Sentinel, as the user already has an end iterator returned from STL2 algorithm.

asutton commented 9 years ago

Because count* iterates over the entire range. Returning an iterator into the middle of the range requires forward iterators.

The count* would necessarily return the end iterator, just as for_each. And I am asking why it is does not returned, because after addition of Sentinel concept the value of the end iterator may not be know at the caller side, so it counts as potentially. I do not know how returning iterator to the middle of the range get involved.

Oh right... this is generalized for sentinels. I was thinking of the current versions of count*.

I suppose we should aim for consistency here.

That said, I don't like that count doesn't return just a number, since that seems to be the common case. I wonder if we can overload count for iterator pairs to recover the original case (without getting ambiguities).

tomaszkam commented 9 years ago

That said, I don't like that count doesn't return just a number, since that seems to be the common case. I wonder if we can overload count for iterator pairs to recover the original case (without getting ambiguities).

But then the same should be done for copy and possibly all other algorithms that has changed return type.

I believe that return type should not be affected by category of input type, this both covers traversal (RandomAccess vs Bidirectional) and the type of end position (Iterator versus Sentinel). Otherwise writing more general algorithms (binary_search) using basic ones (advance) would be a harder, because such special cases would need to be detected and addressed.

Maybe an options would be to go with more meaningful return types (tagged_tuple<count(DT), input(It)> instead of pair<DT, It>) and provide implicit conversion to the main result (ie. number of occurrences for count, output iterator for copy). However I am not sure if this will cause problems for new STL2 code.

ericniebler commented 9 years ago

provide implicit conversion to the main result (ie. number of occurrences for count, output iterator for copy)

All my experience as a library developer tells me to avoid this. Implicit conversions are trouble. This subterfuge will not go unnoticed. People will simply assign the result of count to an auto variable, or pass it to a (constrained?) function template where the type is deduced as a pair or tuple and hell breaks loose.

ericniebler commented 9 years ago

I wonder if we can overload count for iterator pairs to recover the original case (without getting ambiguities).

You mean, return a plain count when count is called with iterator/iterator, and a pair<count, end> when called with iterator/sentinel. This complicates generic code calling count on a range that was passed in. Now I need to do metaprogramming just to use the return value of an algorithm!