ericniebler / range-v3

Range library for C++14/17/20, basis for C++20's std::ranges
Other
4.12k stars 440 forks source link

Proposal: enumerated range #785

Closed seertaak closed 5 years ago

seertaak commented 6 years ago

I find myself doing the following quite often:

for (const auto& [i, e]: rsv::zip(rsv::iota(0, es.size()), es)) { /* use i, e */ }

This to my mind is more explicit, and leaves less room for mistakes, than:

for (auto i = 0; i < es.size(); ++i) { /* use i, es[i] */ }

If this idiom is a bad idea, I would appreciate being told why. If not, I think this would be a good candidate for addition to the library, because it is frequently the case that one wants to know the index of the item one is iterating over.

In my code, I call this enumerated(es), but I'm not particularly attached to this name.

gnzlbg commented 6 years ago

http://www.boost.org/doc/libs/1_63_0/libs/range/doc/html/range/reference/adaptors/reference/indexed.html

KindDragon commented 6 years ago

It's like Python enumerate https://stackoverflow.com/a/522578/61505

CaseyCarter commented 6 years ago

I find myself doing the following quite often:

for (const auto& [i, e]: rsv::zip(rsv::iota(0, es.size()), es)) { /* use i, e */ }

FWIW, you can abbreviate rsv::zip(rsv::iota(0, es.size()), es) as rsv::zip(rsv::indices, es) (https://wandbox.org/permlink/BACcwgpX6T36wJIA), at which point it's so concise I'm not sure there's anything to gain from giving it a name.

CaseyCarter commented 6 years ago

Compare (https://wandbox.org/permlink/1Jjruv24e6CRPPbI):

for (const auto& [i, e] : rsv::zip(rsv::indices, es))

with (https://wandbox.org/permlink/4jhh70CUy1yEX7RR):

for (const auto& [i, e] : rsv::enumerate(es))

The second saves a few keystrokes, but imposes cognitive load since I have to learn and remember the special-purpose view adaptor. The first takes longer to type, but is a simple composition of general-purpose view adaptors that I use for all sorts of things.

I suppose it's a design question of "do we want a specific name for every use case" vs. "do we want a vocabulary of basis operations that are composed for various use cases". (Apologies for doing a poor job of being objective - my bias is clear in this description.)

gnzlbg commented 6 years ago

at which point it's so concise I'm not sure there's anything to gain from giving it a name.

I think is not about the characters, but the added mental overhead. One has to know a bunch of stuff to write and read zip(indices, es), while es | enumerate can be understood by those who don't know how to use zip (or C++) yet.

I don't know how useful a single data-point can be, but in Rust everybody writes a.enumerate() even though one could write (0..).zip(a) which is shorter.

I don't think I've ever seen a single line of code doing (0..).zip(a) in the wild, but it is something that would have definitely jumped at me except for places in which one is doing a lot of range-zipping anyways - and even there I would probably just think "why doesn't this just use enumerate? Is there something that I am missing?".

The second saves a few keystrokes, but imposes cognitive load since I have to learn and remember the special-purpose view adaptor.

You are talking from the POV of a range-v3 master. A range-v3 / C++ / programming newbie could learn enumerate quickly, but one has to know quite a bit about C++ and programming to understand zip and tuples even with structured bindings.

KindDragon commented 6 years ago

I think rsv::enumerate(es) is easier to understand for newcomers

seertaak commented 6 years ago

I suppose it's a design question of "do we want a specific name for every use case" vs. "do we want a vocabulary of basis operations that are composed for various use cases".

I sympathize with this goal, but I think it's possible to apply it too narrowly with deleterious results. After all, if an orthogonal computational basis is desired, why stop at non-inclusion of enumerated? ranges::find can be written (at no extra cost in performance) in terms of the more general ranges::find_if -- should we then exclude the former? To wit: as far as functional algorithms go, reduce is all that's needed; whence find, transform, etc. are trivial to define...

Also, my feeling is that this could be a friendly motivating example for range-v3. Why use range-v3? Because you can now write for (auto [i, x]: xs | enumerate) { ... } It's so literate that no one will fail to understand or see the benefit.

I hope this doesn't come across as too forceful. One thing Bjarne's recent papers have made me aware of is that sometimes the initial creator or key contributor has a more complete and coherent vision for how a language/library/idea should be used. Perhaps I'm missing something just off the horizon.

By the way, thanks for pointing me towards indices!

ericniebler commented 6 years ago

For the record, I think the case for a dedicated enumerate view is compelling.

EDIT: Also, view::enumerate could support the pipe syntax, but view::zip cannot.

CaseyCarter commented 6 years ago

Somebody write some tests for https://wandbox.org/permlink/4jhh70CUy1yEX7RR and submit a PR.

ericniebler commented 6 years ago

I guess now the question is: should the index be signed or unsigned? :-P

gnzlbg commented 6 years ago

It should be signed, but it is going to end up being size_t.

CaseyCarter commented 6 years ago

The type of the index should be the difference type of the range being enumerated, just as I wrote it =P

EDIT: Not least because ranges may have difference types that can represent values not representable by size_t.

gnzlbg commented 6 years ago

I see that index(rng, idx) takes a difference type already, so it would be great if it that ends up being the case in the view proposal as well :)

ericniebler commented 6 years ago

So, the two choices are: difference_type_t<I> or make_signed_t<difference_type_t<I>>. From a design purity POV, yes it should be signed. But we see the blowback the committee is getting over the signed-ness of span<>::size(). There are good legacy reasons to go with unsigned. I haven't heard anyone argue why legacy doesn't matter here.

gnzlbg commented 6 years ago

I haven't heard anyone argue why legacy doesn't matter here.

For anybody compiling with reasonable warnings enabled, turning a signed integer into an unsigned one and vice-versa does not happen silently unless it's ok (e.g. turning a std::uint8_t into a long long).

This is more about 1) consistency throughout the standard, and 2) ergonomics when interfacing with legacy code.

I expect people to use the index of enumerate to index into containers all the time, e.g. things like these:

for [idx, val] : vec | ... skip front and last element... | enumerate() {
    other_vec[idx] =  0.5 * (vec[idx + 1] + 2*val - vec[idx - 1]);
}

That can be very noisy if operator[] takes an unsigned integer, and the alternatives are noisy as well AFAIK (e.g. using ranges::index or casts).

If the stl2 is allowed to make this "breaking" change, then the call would be to, at some point, offer stl2 versions of the containers that use a difference type appropriately everywhere, so that those going full stl2 don't suffer this (ideally ending up with the deprecation of the stl1 at some point).

However, when I wrote: "It should be signed, but it is going to end up being size_t." I was thinking that, if the choice is between having this as part of the views proposal with size_t, or not having it at all because of lack of consensus with respect to the type, I'll pick up using size_t and calling it a day. But that's not up to me, that's up to those presenting the paper to weight how much are they willing to fight for it. If you are determined to make this change go for it. How much does this matter to you?

ericniebler commented 6 years ago

Forget stl2. It's dead. We are stuck with the containers as they are for the foreseeable future. Knowing that unsigned indices are not going away anytime soon, does it change your answer?

gnzlbg commented 6 years ago

does it change your answer?

Not really. The pitfalls of size_t are well known and if we use size_t, then std stays consistent. If we use a difference type, then interfacing with legacy code becomes slightly annoying, and that might introduce new pitfalls, I don't know.

I am also still not 100% convinced that difference types are the types one should use for indexing. The best indexing type i've ever used represents the natural numbers and zero but traps on overflow (instead of wrapping around). Neither difference type nor size_t are that type, but I can turn size_t into that type by making unsigned integer overflow opt-in via UBSan. This is something that most people don't care about though.

Since the plan is not to progressively move towards difference type as the index type everywhere, then I don't think it would be worth the trouble to use it here, but I don't have a strong opinion either way. IMO whoever writes the proposal should just choose what they think is right and see what happens.

Yelnats321 commented 6 years ago

Not sure if this should be a separate issue, but I think it makes sense for indices to take a range as an argument and return the indices of that range.

gbrand-salesforce commented 5 years ago

The boost indexed adaptor supports an offset (i.e. first element can be whatever value - e.g. 1 instead of 0 - or crazier schemes)

Can we have that option as well, with a default at 0?

MikeGitb commented 5 years ago

Somebody write some tests for https://wandbox.org/permlink/4jhh70CUy1yEX7RR and submit a PR.

Is this still current?

CaseyCarter commented 5 years ago

Is this still current?

No one has yet written tests and submitted such a PR. (Which is what I assume you mean by "Is this still current.")

MikeGitb commented 5 years ago

Sorry, probably misused that ideom (not a native speaker). Essentially, what I meant was if the code snipped you posted is still copy/paste ready or if something inside of ranges-v3 has changed that it needs adaption.

I don't feel qualified to really work on this library, but I can do copy past and write a few lines executing the code.

CaseyCarter commented 5 years ago

Sorry, probably misused that ideom (not a native speaker). Essentially, what I meant was if the code snipped you posted is still copy/paste ready or if something inside of ranges-v3 has changed that it needs adaption.

range_difference_t should be updated to range_difference_type_t, but otherwise that snippet is still how I would write it today.

MikeGitb commented 5 years ago

Tanks, I'll make a PR

MikeGitb commented 5 years ago

Quick off-topic question: I'm not up to date with the standardization process for ranges, is there any plan to add enumerate to the standard library? At least in my code the missing "enumerate" is the number two reason why index based for loops are still used sometimes (number one is a missing zip) - and yes, in many projects we actually have our own hand-roled enumerate wrapper.

CaseyCarter commented 5 years ago

range-v3 is largely an enormous list of "things we'd like to add to the standard library sometime", but that doesn't quite qualify as a "plan". We'll probably sneak zip in just under the C++20 feature freeze, but enumerate won't make it.

KindDragon commented 5 years ago

but enumerate won't make it.

:disappointed:

MikeGitb commented 5 years ago

Sorry to hear that. I really hope we will at least get zip.

Just an Idea here: Once ranges are in the standard, it would be nice to have a lightweight version of ranges-v3 (or https://github.com/CaseyCarter/cmcstl2) that is rebased on c++20, uses the actual language concepts and provides the missing views and actions.

In any case: Thanks a lot to both of you for all your hard work (and anyone else who contributed to the ranges library and the TS).

I think this issue can be closed now - right?

CaseyCarter commented 5 years ago

I really hope we will at least get zip.

zip is in p1035 and will hopefully make 20. It's a bit messy due to needing changes to tuple to make tuple-of-references a little more referency, but will hopefully get in under the wire.

One of my remaining tasks for C++20 is to expose enough of the pipe operator machinery - which is currently specified in [range.adaptors] as magic for Standard View adaptors that "does the right thing" - so that user-defined View adaptors - such as those in range-v3 - can integrate well with compositions of Standard View adaptors.