ericniebler / range-v3

Range library for C++14/17/20, basis for C++20's std::ranges
Other
4.15k stars 438 forks source link

`const`-ness of view operations #385

Closed CaseyCarter closed 5 years ago

CaseyCarter commented 8 years ago

Containers

Containers in C++ may have both const and non-const overloads of begin. They differ only in their return types: non-const begin returns a mutable iterator, and const begin a constant iterator. Neither overload changes the bits of the container representation or modifies the semantic value of the container object. Two overloads exist so that it is necessary to establish a non-const access path to the container object in order to obtain a mutable iterator. end is the same. size is always const because it provides no way to modify the container's content.

Views

Views in range-v3 may have both const and non-const overloads of begin/end/size (herein termed "operations"). Views have pointer semantics - a view is essentially a pointer to a sequence of elements - so mutability of the elements viewed is orthogonal to mutability of the view object itself. The const distinction here has no relation to that of containers. Non-const operations do not modify the semantic objects being viewed, nor do they "swing the pointer" so that the same view designates different semantic objects. Non-const operations mutate internal state that does not contribute to the semantic value of the view; the const-ness here is purely bitwise.

The const-ness model used by views makes view composition painful. You can always provide the non-const overloads, but const overloads are preferred when achievable. So a composer, e.g.:

template<typename View>
class wrap_view : view_facade<wrap_view<View>>
{
    View base_;
public:
    wrap_view() = default;
    wrap_view(View v)
    : base_(std::move(v))
    {}
    // ...
};

ends up providing two definitions of each operation: one const that's constrained to require const operations over the underlying view(s):

    // We need the dependent type V so that range_iterator_t<V const> doesn't hard
    // error when View has no const begin operation (or: use C++14 deduced return type).
    template <typename V = View, CONCEPT_REQUIRES_(Range<View const>())>
    range_iterator_t<V const> begin() const
    { return ranges::begin(base_); }

and one mutable that's constrained to only be available when the const version isn't:

    CONCEPT_REQUIRES(!Range<View const>())
    range_iterator_t<View> begin()
    { return ranges::begin(base_); }

Ranges

I'm concerned that the differing const distinctions for containers and views don't mesh well into a consistent notion of what const means for operations on general ranges. I see a potential for latent bugs where a programmer accustomed to the fact that calling begin/end on mutable containers is threadsafe calls begin/end on mutable ranges without realizing there are sharp corners here.

The only mutating operation on pointers is assignment. If views are supposed to be range-pointers, perhaps assignment should be the only mutating operation? We (I) need to investigate an alternative model where view operations are always const and perform internal synchronization if needed.

gnzlbg commented 8 years ago

I see a potential for latent bugs where a programmer accustomed to the fact that calling begin/end on mutable containers is threadsafe calls begin/end on mutable ranges without realizing there are sharp corners here.

I agree with the tone of this RFC, these are real issues.

We (I) need to investigate an alternative model where view operations are always const and perform internal synchronization if needed.

I think that views are already expensive, and that the serial case is an important one, so my feelings are currently against a model that requires internal synchronization to implement a conforming view. Still, this hasn't been investigated yet and maybe (hopefully?) isn't as bad as I expect it to be.

There is another issue. While views are not thread-safe in general, they are (or should be) cheap to copy, so the thread-safety issues that might arise are from users inadvertently sharing views between threads instead of copying them (which is what users should be doing) [*] . This is another point in which views differ from containers (which are expensive to copy and might be worth sharing). Views should almost never be worth sharing and by making them thread-safe we might be encouraging the wrong usage patterns.

Sometimes one might really want to share a view, but in those cases one wants to perform sharing explicitly as opposed to by mistake, so I think thread-safe versions of some of the views (something that you are not proposing) could be really useful even though having the world split might impact re-usability (e.g. can't reuse view::a | view::b | view::c because view::b is/isn't thread-safe).

Is there a list somewhere of which views are not thread safe? (e.g. probably all the SinglePass views but maybe others as well).

[*] Maybe the cppguidelines could in the future improve the safety issues by allowing marking some views as not-thread safe in the spirit of rust Send<T> and Sync<T> traits but that's a long shot.

ilyapopov commented 8 years ago

I quickly searched the code and I discovered that box.hpp and stride.hpp already use atomics for some synchronization. I find that surprising. Could anyone comment what is rationale behind that? Is not we paying for what we don't normally use?

PS seems like this is for some mutable members which, where the user would assume thread-safety because of const interface.

ericniebler commented 8 years ago

I'm concerned that the differing const distinctions for containers and views don't mesh well into a consistent notion of what const means for operations on general ranges. I see a potential for latent bugs where a programmer accustomed to the fact that calling begin/end on mutable containers is threadsafe calls begin/end on mutable ranges without realizing there are sharp corners here.

The current model is: either the range has visible mutable state and does not have const begin()/end(), or else it guards any internal mutation so as to be thread-safe. I agree it's not ideal, but I haven't come up with anything better.

I quickly searched the code and I discovered that box.hpp and stride.hpp already use atomics for some synchronization.

It's so that view::stride can have an amortized O(1) begin()/end() by caching some data rather than recomputing it every time. The cache variables are guarded by atomics. The alternative would be to make begin()/end() non-const. I considered the atomic to be the lesser evil, but I am open to other views (har har).

The only mutating operation on pointers is assignment. If views are supposed to be range-pointers, perhaps assignment should be the only mutating operation? We (I) need to investigate an alternative model where view operations are always const and perform internal synchronization if needed.

Ack. Baby, meet bathwater. Some views simply cannot be const, logical or physical; an istream view, for instance. Unless you want to disallow such a view (I don't), it seems to me that the problem must be addressed.

ericniebler commented 8 years ago

One simplification is that no views have const begin()/end() members. That way nobody has to think about it. Seems like a fool's consistency to me, though.

CaseyCarter commented 8 years ago

This implementation of istream_range:

template<typename Val>
struct istream_range
  : view_facade<istream_range<Val>, unknown>
{
private:
    friend range_access;
    std::istream *sin_;
    mutable semiregular_t<Val> obj_;
    struct cursor
    {
    private:
        istream_range const *rng_;
    public:
        cursor() = default;
        explicit cursor(istream_range const &rng)
          : rng_(&rng)
        {}
        void next()
        {
            rng_->next();
        }
        Val &get() const noexcept
        {
            return rng_->cached();
        }
        bool done() const
        {
            return !*rng_->sin_;
        }
        Val && move() const noexcept
        {
            return detail::move(rng_->cached());
        }
    };
    void next() const
    {
        *sin_ >> cached();
    }
    cursor begin_cursor() const
    {
        return cursor{*this};
    }
    Val & cached() const noexcept
    {
        return obj_;
    }
    istream_range(std::istream &sin, Val *)
      : sin_(&sin), obj_{}
    {}
    istream_range(std::istream &sin, semiregular<Val> *)
      : sin_(&sin), obj_{in_place}
    {}
public:
    istream_range() = default;
    istream_range(std::istream &sin)
      : istream_range(sin, _nullptr_v<semiregular_t<Val>>())
    {
        next(); // prime the pump
    }
};

has const begin & end, and meets the thread-safety requirements of [res.on.data.races]. Note that cursor::next cannot race with get/done/move: input iterators are not required to be dereferenceable after increment, so a program in which two threads simultaneously increment and dereference input iterators with the same value has undefined behavior.

ericniebler commented 8 years ago

That's interesting, but it feels a bit dirty. I don't like const members that mutate the object in a visible way, especially if it's unsynchronized. I'm not particularly swayed by the argument that we've defined things such as to permit it.

CaseyCarter commented 8 years ago

I disagree with the claim that there are " const members that mutate the object in a visible way." No such mutations are observable in a program with defined behavior. If that guarantee is strong enough for the C++ memory model, surely it suffices for istream_range.

Do you disagree with the semantics we've defined for views and input iterators that make this implementation valid, or do you disagree with the results this implementation produces for programs with undefined behavior?

ericniebler commented 8 years ago

Do you disagree with the semantics we've defined for views and input iterators that make this implementation valid, or do you disagree with the results this implementation produces for programs with undefined behavior?

Neither. I'm expressing my distaste for using up all the wiggle room we've given ourselves wrt input ranges. I don't think I disagree that this is allowed. And it probably should be. But man does it feel wrong.

EDIT: I think it feels wrong because not everybody will be cognizant of the fact that input iterators are not required to be dereferenceable after increment, and the type system is giving them no help. A good interface makes it easy to write correct code and hard to write incorrect code. This is not that.

ericniebler commented 8 years ago

For the record, this issue interests me greatly. The interaction of the type system with views is very important to get right. I would love to hear a better suggestion, but so far I favor the current design as warty and surprising as it can be. Perhaps we look for a library solution that guides people away from the warts and toward doing the right thing, like providing the const-correct begin/end overloads for them, depending on whether their range has to cache mutable state. ??? I'm spitballing here. Open to other suggestions.

EDIT: One of the reasons view_adaptor exists, as crummy as it is, is because it makes it easier to get this const business right.

gnzlbg commented 8 years ago

I think it feels wrong because not everybody will be cognizant of the fact that input iterators are not required to be dereferenceable after increment, and the type system is giving them no help. A good interface makes it easy to write correct code and hard to write incorrect code.

What exactly do you mean here? Iterators are only dereferenceable after increment if it < end() right? So all iterators kind of have this problem. Maybe that if I have two copies of InputIterators and increment one the other is invalidated?.

gnzlbg commented 8 years ago

If that is the case the problem is IMO that dereferencing doesn't mean the same thing for InputIterator than for other iterators, and it looks to me that we got ourselves into this hole by calling it the same thing. A solution could then be to call it something else .get() and for example return an optional.

This is far from perfect, breaks the iterator hierarchy, and optional is not free, but it could make things safe by forcing the users to check if they got a value from an InputIterator or not.

EDIT: going this far is IMO not worth the trouble, and making InputIterators return a range is too weird.

ericniebler commented 8 years ago

Maybe that if I have two copies of InputIterators and increment one the other is invalidated?).

This.

If that is the case the problem is IMO that dereferencing doesn't mean the same thing for InputIterator than for other iterators, and it looks to me that we got ourselves into this hole by calling it the same thing.

If by "we" you mean the original designers of the STL then I agree. It's too late to change it now. The question is, what hand can we make of the cards we've been dealt? Casey thinks he can pull an inside straight (stretching the metaphor), I think we should settle for a std pair.

CaseyCarter commented 8 years ago

Dereference is fine, really, it's increment that causes the insanity. Increment has spooky action-at-a-distance in that it renders copies invalid. Such action-at-a-distance is anathema to our ability to reason locally about program behavior.

EDIT: I think it feels wrong because not everybody will be cognizant of the fact that input iterators are not required to be dereferenceable after increment, and the type system is giving them no help. A good interface makes it easy to write correct code and hard to write incorrect code. This is not that.

Yes, and that ship got on the bus and left the train station long ago. We can't bolt safety onto single-pass iterators now. What we can do is suggest implementation techniques that exploit the undefined behavior to provide at least some error detection. An input range that detects repeated calls to begin, or late calls to size, or use of invalid iterators adds more realistic safety than an input range that has mutable size or begin.

And yes, I admit I'm doing a lot more complaining about problems in this thread than discussing solutions. My goal was not to belittle the current use of const/mutable for views, so much as to point out some things I think need improvement and see of others agree.

mgaunard commented 8 years ago

I don't think const should affect the behaviour of views. If you want to cache state, using mutable is fine.

This issue took me unexpectedly and made me lose a lot of time. Had I known this beforehand, I might have stuck with boost range v2 instead.

ericniebler commented 8 years ago

If you want to cache state, using mutable is fine.

Possibly for Input ranges, but how could it be fine to mutate a const object that is possibly being read concurrently from different parts of your program? Guarding against that kinds of misuse is exactly what const is for.

Had I known this beforehand, I might have stuck with boost range v2 instead.

Seriously? Be my guest.

mgaunard commented 8 years ago

Boost.Iterator/Boost.Range are fairly simple both in concept and implementation, are well documented, well understood, mature, and I happen to have a lot of experience with it. I chose to use Range v3 for a toy project since it was a chance to evaluate it, but given the minimal documentation, the amount of complexity I ran into, and the open major design issues like this one, it might appear that it's not quite ready. Now that is but an opinion, but it's based on somewhat unquestionable facts, so I don't think being so dismissive was justly deserved.

Anyway, I'm not quite used to the view/cursor dichotomy as opposed to the range/iterator one yet, but couldn't all state go into the iterator rather than the view, removing all needs for synchronization?

gnzlbg commented 8 years ago

I think maybe the library needs a FAQ because @mgaunard questions are fair for new users to have, and come up very often. New users seem to have two issues with the library:

So because they don't know this, and some of them come with a Boost.Range v2 background, they first try to make everything const, and "clash" against the library. Then they ask themselves why can't I make a view const? It has no state! And because they don't know the second point, they end up very confused.

@mgaunard

views are stateful algorithms by definition

This is how range-v3 is designed. While some views don't store state, most do. Don't make them const. This might be hard if you are used to "make everything const", but ask yourself, does this stateful algorithm really need to be const here?

In range-v3, begin / end are amortized O(1)

That is, if you do:

auto v = range | view::remove_if(pred);
for (auto&& i : v) { ... }  // begin/end are amortized O(1) here
for (auto&& i : v) { ... }  // begin/end are amortized O(1) here as well

The way this is implemented is that some views cache the begin/end on construction, so that all the calls to begin/end from that point on are O(1). This requires state, so vies that do this cannot be const (@ericniebler: I don't find this in the design paper, have you written about this somewhere besides in the issues? I guess this also would belong to a FAQ).

@mgaunard This is just a guarantee that range-v3 views offer you. If you don't need this/want this, range-v3 offers you the tools to build your own view::remove_if_mgaunard that does not cache any internal state, that offers O(N) begin and end, and can be made const.

But think about it thrice before going down that route. What do you win? The ability to make a view const? For what? Is that worth it? What do you loose? (O(1) begin/end)

ericniebler commented 8 years ago

From @CaseyCarter in #428 :

remove_if caches the value of the first "non-removed" iterator in the range so as to amortize the cost of multiple calls to begin since begin is supposed to be O(1). This is odd given that next and prev are not O(1) and more likely to be called frequently and repeatedly

The difference between begin and next is that begin is assumed to O(1) is so many places. Consider this code from reverse_view's next member:

if(0 != ranges::advance(it, -1, ranges::begin(rng_->mutable_base())))
    it = rng_->get_end_(BoundedRange<Rng>());

If begin is O(M), then next becomes O(M). That's bad.

For ranges like filter_view, next is permitted to be O(M) so long as traversing the entire range is O(N). next is more expensive, but it must be called fewer times to traverse the entire range.

begin and end get called frequently in the implementation of next and prev of many other range adaptors. The alternative is to have the adaptor's iterators cache the underlying range's begin and end iterator. That makes iterators fat -- a problem that compounds as adaptor chains grow -- and spreads the cost out as begin and end get recomputed over and over.

The design compromise that is currently implemented makes begin/end always amortized O(1) so it is safe to be called from a loop where it is used in expressions that test for begin- or end-of-range. The win is that iterators stay small and the beginning and end of the range, if it is computed, gets computed only once, and that result can be reused over and over. The downside, as you've noticed, is that ranges become stateful.

I'm sure you can find examples in the other views where next and prev get called in loops in a way that make the algorithms O(N*M). Caching all the positions returned by next and prev is obviously unfeasible. I made begin and end different (a) because they so frequently appear in end-of-range checking expressions, and (b) there are only two such positions, so caching them is feasible.

So, here are the alternatives I see:

  1. Status quo: Ranges may lack const begin/end members and are permitted to cache data from begin/end without synchronization overhead.
  2. Any cached state must be computed on construction/assignment. View move/copy/assign becomes O(N).
  3. Caching from begin/end in non-Input ranges is verboten. Ranges like filter_view can either satisfy Forward or have O(1) begin/end members, but not both. Caching Input ranges have const begin/end members, increasing the likelihood they will be used unsafely.
  4. Ranges that want to be Forward, cache state from begin/end, and have const begin/end must guard the mutation with synchronization.
  5. All views and cursors must be implemented with the assumption that begin/end are O(N). Cursors that compare an iterator with the beginning or end of a range must store those positions internally. Iterator bloat becomes a problem that compounds as adaptor chains grow.

Any other options I missed?

<spitballing> I can imagine a separate range adaptor that caches the underlying range's begin/end on construction. With C++17's guarantees about copy elision, we can investigate the possibility of using this either internally or letting users use it directly. That requires that let filter_view have O(N) begin. That doesn't give our users much guidance to know when such a caching adaptor would be needed.

pfultz2 commented 8 years ago

The difference between begin and next is that begin is assumed to O(1) is so many places. Consider this code from reverse_view's next member:

Why wouldn't you just do --it? In the deference oprerator, it would require one iteration back but that seems better than all that branching.

ericniebler commented 8 years ago

(Drifting off-topic):

Why wouldn't you just do --it? In the deference oprerator, it would require one iteration back but that seems better than all that branching.

This implementation of reverse_view avoids LWG#198.

mgaunard commented 8 years ago

Was this design aspect (constness drops range status of certain views) discussed in WG21 yet?

ericniebler commented 8 years ago

constness drops range status of certain view

With the current design, const does not effect the strength of the concept that a range satisfies. So no, it has not been discussed in WG21.

ericniebler commented 8 years ago

So, here are the alternatives I see: [...] Any other options I missed?

@CaseyCarter Thoughts? I'm going to start work on Range TS2 "Views Edition" soon/eventually, and as of right now, I haven't heard a better alternative than the current design. I can play with improved designs for range_adaptor that do more of this work for the user, but the fundamental model would be unchanged.

gnzlbg commented 8 years ago

I ran into this when trying to improve a bit the views part of the range-v3 documentation. Describing them as "lazy range combinators" doesn't give users enough information to avoid common questions and mistakes, so I was trying to come up with a slightly longer description that explains the semantics a bit more and covers some pitfalls.

Needless to say, I gave up since the best I could come up with was this:

Views are:

  • range algorithms: from some input (which might be a range), they produce a range of values,
  • composable: views producing and accepting ranges can be chained to construct more complex algorithms,
  • lazy: the sequence of values produced by a view is computed on demand as the view is iterated over, every time the view is iterated over
    • views present the trade-off "compute every time, no extra memory", while the opposite tradeoff, "compute once, store in memory" is provided by actions. Both views and actions can be composed.
  • non-owning: in general, views do not own their input, which makes them "cheap" to copy,
    • cheap means that the complexity of copying a view is independent of the number of elements in the range taken/produced by the view
    • some views do own their input when this condition can be satisfied to make them more convenient to use (e.g. view::single(value) owns by default a copy of value which can be avoided by using view::single(ref(value))),
  • stateful: views often store some state that might be mutated during iteration,
    • that is, making a view const might prevent iteration,
    • all views provide O(1) amortized begin and end. Some views must pre-compute begin and end on construction to provide this guarantee. To benefit from amortized O(1) begin and end, prefer storing these views and keeping them around instead of constructing them anew over and over.
    • However! Iterating more than once over a SinglePass view is undefined behavior. Use the !SinglePass<range> concept to constrain algorithms that iterate more than once over a range.
  • thread-safe: in the standard library sense, that is, const operations are thread safe:
    • However! often views do not have const iteration methods. One can:
    • (if possible) use view::const to ensure that the input of the range algorithm is not mutated and copy the view between threads,
    • use external synchronization (e.g. std::mutex) to share a mutable view between threads.

Given the current situation, maybe it is possible to implement a "view::synchronized" that either uses view::const (when that is sufficient), or protects the mutable state with a std::mutex to provide thread-safe const methods to whatever view it is composed with. That, or something like that, would make teaching the thread-safety part of the view story a bit easier (want to share a view between threads? Just use | view::synchronized).

It would also improve teachability if we had a more solid story for the undefined behavior behind SinglePass ranges (e.g. by requiring a run-time diagnostic in case one is iterated multiple times). I've run into them a couple of times and they were always painful to debug.

h-2 commented 7 years ago

Chiming in on this discussion after #617 and #618:

I expect more people to run into this since const iteration is really standard, especially since day-1 teaching in C++ tells everyone to pass around const & of their objects and there are reasons to maintain this for generic algorithms that just don't know (or care) if what they are getting owns data or just presents a view onto the data (so long as it's input-iterable for example).

Documenting which views aren't const-iterable would probably be an important first step, but maybe having something like ranges::any_const_view and ranges::any_const_random_access_view that just support type erasure from const-iterable views would also help get things right for people. What do you think?

The current model is: either the range has visible mutable state and does not have const begin()/end(), or else it guards any internal mutation so as to be thread-safe. I agree it's not ideal, but I haven't come up with anything better.

Some views simply cannot be const, logical or physical; an istream view, for instance. Unless you want to disallow such a view (I don't), it seems to me that the problem must be addressed.

INHO it's a very good policy to offer const-iterators whenever possible and using mutable state with atomic or other protection :+1: It's what most people would expect, I think. The other ones would just need some form of documentation...

PS: Are you interested in PRs on documentation things like that or do you consider them more work than benefit right now?

gnzlbg commented 7 years ago

The root of the problem is that newcomers try to make views consts or try to pass them by const lvalue reference because they have an incomplete mental model of how views work.

If you tell them that a view is like a state machine (or a coroutine, or a lazy stateful algorithm, ...) that generates values on the fly by mutating its internal state, the problem becomes obvious: if you make it const, it cannot mutate its internal state, so it cannot "generate" new values, which is what the compilation error is trying to tell them in its own peculiar way. What they probably want is to either not make the view const, or make a view with a const reference type to some data.

I think that once they understand this, other FAQs that newcomers have like why can't they call begin on a const view also become easier to explain.

I think that the docs should aim to convey a correct mental model when views are introduced. Doing so without drowning them in detail is hard, but I think this would deliver the largest win, documentation-wise.

Documenting which views aren't const-iterable would probably be an important first step,

While it would be interesting to know which views are const-iterable, I don't think we should expose this API wise. I'd rather leave this as an implementation detail that can change at any time.

I also don't think that this would help beginners, because relying on this is brittle. Suppose that you update some "const-iterable" view by piping it an adaptor that makes it "non-const-iterable". The error message one gets in every place that tries to use the view as const-iterable, would only make sense for those users that already know that they shouldn't be doing that in the first place.

INHO it's a very good policy to offer const-iterators whenever possible and using mutable state with atomic or other protection

Why? I've yet to see a case (and I'm sure @CaseyCarter will point one out) in which you actually want a const_iterator because an iterator with a const reference type does not suffice. This is also nothing new: most/all of the ordered/unordered STL containers have an iterator with a const reference type, and define using const_iterator = iterator; (at least libc++ does this).

Also, atomics and locks are not zero-cost, so they aren't really an option. This doesn't mean that we shouldn't provide a way to make views thread safe, but rather, that doing so should be opt-in (and in general, just putting the view behind a mutex somewhere solved the problem, for all views).

h-2 commented 7 years ago

The root of the problem is that newcomers try to make views consts or try to pass them by const lvalue reference because they have an incomplete mental model of how views work.

Well, that is entirely a matter of perspective I would say :smile: One could also ask, why views should behave so differently. For some views this might be a requirement, but for many it isn't.

if you make it const, it cannot mutate its internal state, so it cannot "generate" new values,

Why not? The internal state can be mutable, often it doesn't even effect the generated elements. IMHO abstracting this state away is a major selling point for views. "It might have some funky stuff going on internally, but all I see is a bunch of elements I can iterate over." ← this breaks if I can't do the latter in situations where I expect to be able to.

While it would be interesting to know which views are const-iterable, I don't think we should expose this API wise. I'd rather leave this as an implementation detail that can change at any time.

please don't :pray:

I also don't think that this would help beginners, because relying on this is brittle. Suppose that you update some "const-iterable" view by piping it an adaptor that makes it "non-const-iterable". The error message one gets in every place that tries to use the view as const-iterable, would only make sense for those users that already know that they shouldn't be doing that in the first place.

That depends on the application design, if you work with concepts and these require that it's const iterable, you will immediately get a (more or less) readable notification.

Why? I've yet to see a case (and I'm sure @CaseyCarter will point one out) in which you actually want a const_iterator because an iterator with a const reference type does not suffice.

Generic programming. Take this (silly) example:

template <typename type>
    requires (bool)ranges::InputRange<type const>()
auto generic_max(type const & input)
{
    auto ret = *begin(input);

    for (auto const & v : input)
        if (v > ret)
            ret = v;

    return ret;
}

int main()
{
    std::vector<int> foo{1, 4, 5, 7567546, 234, 56, 34534};
    auto bar = foo | ranges::view::slice(5, 7);
    auto bax = bar | ranges::view::reverse;

    std::cout << generic_max(foo) << '\n';
    std::cout << generic_max(bar) << '\n'; // it all works just the same
    std::cout << generic_max(bax) << '\n';
}

Sure, it is possible to work around the const &, use overloading or SFINAE et cetera, but it really takes a way the beauty of generic programming. If containers satisfy the range concepts, I should be able to work on that interface and not worry about ownership or "shallow copies VS const &"... I know it isn't always possible to avoid this, but for many cases it is, like the case above, and in those cases I think the implementation should offer this as a documented and stable feature.

gnzlbg commented 7 years ago

Generic programming. Take this (silly) example [...] Sure, it is possible to work around the const &, use overloading or SFINAE et cetera,

It is not a silly example. It is exactly the kind of code that some newcomers write. The docs should explicitly address it.

Sure, it is possible to work around the const &,

The correct solution is to just use Rng&&.

Can you explain why Rng const& is better/more generic than Rng&&? (Or why didn't you wrote Rng&& in the first place?)

pfultz2 commented 7 years ago

Can you explain why Rng const& is better/more generic than Rng&&?

const Rng& clearly expresses in the interface that it doesn't modify the range/container nor its elements.

pfultz2 commented 7 years ago

Of course, it doesn't make sense to use const on single pass ranges since there can be observable state changes(whether in the range itself or in some global or system state).

h-2 commented 7 years ago

The correct solution is to just use Rng&&.

const Rng& clearly expresses in the interface that it doesn't modify the range/container nor its elements.

I agree. [1] I know I can use the forwarding reference to bind to const types, but I now need to add extra unit tests that explicitly test for const container input and non-const container input to trigger this. Also the interface suggests to most readers / other developers that I explicitly chose not to use const & because the function changes it's arguments (we have policies for distinguishing in-parameters and in-out parameters based on this). It also promotes other notions, like the function not being safe to call from multiple threads on the same data. Which indeed it wouldn't be now, since said data could now be a range whose thread-safety measures would not kick in, because it is accessed through a non-const-iterator interface.

I know these are things that can be worked around with different programming styles and lots of documentation, but I would still like to make a point for having const iterator interfaces where possible.

[1] Off-topic: Although, as @ericniebler pointed out, const Rng& wouldn't actually provide any guarantees on protecting the elements, because the const-ness doesn't propagate. I think there are valid reasons for modelling ranges like this, but I don't think that it is self-explanatory. There are other cases in C++ where constness does propagate, e.g. a const vector of vector doesn't only prevent adding more members, it also protects its members. This of course also has (good) reasons, I am just saying it is not self-explanatory, especially since views are considered a "non-owning collection" and by inference akin to containers by many.

CaseyCarter commented 7 years ago

There's a lot of wrong-thinking about views here: people need to stop thinking of views as being like references to containers and start thinking of views as being like iterators that denote zero or more elements instead of zero or one. Accepting a view by const& has the same semantics as accepting an iterator by const&: it has no effect on the mutability of the denoted elements, it only forbids modifying the iterator/view such that it denotes different elements. Algorithms that take views and/or iterators typically take them by value.

Range algorithms accept ranges by forwarding reference because they can operate on both views and containers. const& would have no significance in the view case, so it would at best create a false sense of security in users if mutating algorithms were to accept ranges by const&. If you are paranoid and want to guarantee that a range algorithm doesn't mutate elements you pass it view::const_(foo) instead of foo.

Documenting which views aren't const-iterable would probably be an important first step, While it would be interesting to know which views are const-iterable, I don't think we should expose this API wise. I'd rather leave this as an implementation detail that can change at any time.

I also don't think that this would help beginners, because relying on this is brittle. Suppose that you update some "const-iterable" view by piping it an adaptor that makes it "non-const-iterable". The error message one gets in every place that tries to use the view as const-iterable, would only make sense for those users that already know that they shouldn't be doing that in the first place.

We could theoretically document which view adaptors are never const iteratable and which views are const iteratable when they adapt const iterable underlying views, but I don't think it would be helpful to do so. Either you write generic code that works with both kinds of views, or you don't. Suggesting that users should memorize which views fall into which subset would be pointless. Although he originally said it in jest, I think that @ericniebler's suggestion upthread that maybe NO views should be const iterable may be the best solution to user confusion about which views are const iterable and why.

It would certainly be nice to abandon the complicated view adaptor implementation style that maintains const-ness of the underlying view:

CONCEPT_REQUIRES(SizedRange<Base const>())
range_size_t<Base> size() const
{ return ranges::size(base_); }

CONCEPT_REQUIRES(SizedRange<Base>() && !SizedRange<Base const>())
range_size_t<Base> size()
{ return ranges::size(base_); }

for simply:

CONCEPT_REQUIRES(SizedRange<Base>())
range_size_t<Base> size()
{ return ranges::size(base_); }

(Note that few of the view adaptors have all the cruft to "properly" maintain constness because it's a PITA.)

having something like ranges::any_const_view and ranges::any_const_random_access_view that just support type erasure from const-iterable views would also help get things right for people

I think this would only create another complicated choice point that makes it harder to use type-erased views.

I know I can use the forwarding reference to bind to const types, but I now need to add extra unit tests that explicitly test for const container input and non-const container input to trigger this. Also the interface suggests to most readers / other developers that I explicitly chose not to use const & because the function changes it's arguments (we have policies for distinguishing in-parameters and in-out parameters based on this).

Writing generic functions that accept ranges as single objects is not like writing generic functions that take containers by reference, it is by design exactly like writing generic functions that accept ranges as iterator pairs.

pfultz2 commented 7 years ago

start thinking of views as being like iterators

If views are to work like iterators then they should extend iterators and not ranges.

We could theoretically document which view adaptors are never const iteratable and which views are const iteratable

Any single pass range(ie InputRange only) should only be non-const iterable. However, views which are at least ForwardRange should have the ability to restrict observable state(ie be taken as a const Range&), if that is not possible due to internal mutation then it should be declared as an InputRange, to make it clear that it is a single pass range.

ericniebler commented 7 years ago

Although he originally said it in jest, I think that @ericniebler's suggestion upthread that maybe NO views should be const iterable may be the best solution to user confusion about which views are const iterable and why.

I wasn't joking when I said that. It's still an attractive option.

CaseyCarter commented 7 years ago

If views are to work like iterators then they should extend iterators and not ranges.

Views get their similarity to iterators from ranges. C++98 [lib.iterator.requirements]/7:

Most of the library's algorithmic templates that operate on data structures have interfaces that use ranges. A range is a pair of iterators that designate the beginning and end of the computation.

Don't be fooled by the fact that containers implicitly denote a range: ranges are not like containers; ranges are like iterator/sentinel pairs.

I wasn't joking when I said that. It's still an attractive option.

The discussion in this thread has moved me completely from the "all views should be const iterable and internally synchronize mutating state" camp into the "all views should NOT be const iterable and threads should share copies of views instead of references to ranges" camp. Congratulations to you and @gnzlbg for beating me there.

gnzlbg commented 7 years ago

@pfultz2

const Rng& clearly expresses in the interface that it doesn't modify the range/container nor its elements.

No it does not. The only thing Rng const& rng denotes, is that rng is not being changed, but it doesn't denote anything about the elements of rng.

For example:

template <typename R> void foo(R& rng) { /*....*/ }
std::set<int> s = /*...*/;
foo(s); 

will fail to compile if foo tries to modify the range when foo(s) is instantiated, even though foo does not take a const reference, and the set is not const.

Also, the following is perfectly valid C++:

template <typename R>
void foo(R const& r) {
    *(r.begin()) = 3;
 }
std::vector<int> v = {1,2,3};
struct View { 
    std::vector<int>* v; 
    int* begin() const { return v->data(); } 
    int* end() const { return v->data() + v->size(); } 
};
foo(View{&v});
assert(v[0] == 3);

Here foo takes a const reference, yet it still can modify the elements of the underlying range. Why? Because making a view const does not mean that the elements the view points to are immutable, in the same sense as making a view non-const does not mean that you can mutate elements through it. @ericniebler showed an example about pointers above, this is exactly the same behavior that C++ pointers do have, nothing more, nothing less (and yes, I think this is confusing).

IMO the root of the issue, is that newcomers seem to confuse these concepts, but this is the way C++ works. When working with views, making them const does not mean that you are preventing accidental mutability of the elements, but rather that you are preventing the view from doing its work.

One cannot blame @h-2 for being confused, C++ is indeed confusing here, hence why we need to address this explicitly in the docs. Take a look at what @h-2 writes:

Also the interface suggests to most readers / other developers that I explicitly chose not to use const & because the function changes it's arguments (we have policies for distinguishing in-parameters and in-out parameters based on this).

This is what the function signature looks like to a newcomer but this is not what the function signature actually does. The same applies to the thread safety issue.

gnzlbg commented 7 years ago

@ericniebler

I wasn't joking when I said that. It's still an attractive option.

One of the main advantages of views is that they are easily composable. Some of these compositions will result in "non-const-iterable views".

While I think "its nice" that some views can be "const-iterable", code that relies on this is brittle, and prevents composition and refactorings (or make them unnecessarily complicated just for the sake of marking some function argument const...).

I also don't think that I've ever written code that intentionally relies on a view being const iterable, but have done so by accident and have had to fix it later.

Maybe removing "const iterable" views is actually the right thing to do. Every user is going to clash against this at some point, for the same reasons @h-2 is having trouble right now. I think it would simplify things if we would put this cost upfront.

pfultz2 commented 7 years ago

Here foo takes a const reference, yet it still can modify the elements of the underlying range. Why?

Yes, of course, you can also create the variable as mutable. It doesn't mean its a good idea. In general, one expects to be able to write:

template<class T>
void foo(const T&);

std::set<int> s = /*...*/;
foo(s);

And s should be essentially unchanged. However, we can't guarantee this with single pass ranges such as view::join or istream_range, but for any range that is a ForwardRange this should be possible.

Having views which have internal mutability can happen, but they should be declared as InputRange. Then there is no need to write things like a mutable size() function, which makes it look like we are building ranges around Heisenberg's uncertainty principle.

ericniebler commented 7 years ago

Folks who think that there should be a way to add const to a generic Range function parameter and have it mean deep const (pertaining to the elements) should read this.

gnzlbg commented 7 years ago

@pfultz2

Yes, of course, you can also create the variable as mutable. It doesn't mean its a good idea.

I did not need to write mutable or use const_cast or anything to run into this "issue" because this is intrinsic to how the C++ type-system works (some would call this a "feature").

but for any range that is a ForwardRange this should be possible.

Having views which have internal mutability can happen, but they should be declared as InputRange.

There are lots of views with internal mutability. Doing what you propose would require adding internal synchronization to all of these views, because passing them as a const reference between threads allows modifying shared state (under the current model, you get a nice compilation error).

That is, something as simple as view::iota would need to use atomics (for integers), possibly a mutex for RandomAccessIncrementables in general, and also be an InputRange instead of a RandomAccessRange... I think this is a very high price to pay just so that one can pass iota by const reference...

Still anybody can use view_facade to write their own synchronized_iota view and send a PR if they so desire. Or even better, write a view::synchronized adaptor that stores the range behind a mutex and is const iterable. But given that nobody has bothered to even open an issue about this, it doesn't seem to be a problem that occurs often enough in practice to be worth solving.

pfultz2 commented 7 years ago

There are lots of views with internal mutability. Doing what you propose would require adding internal synchronization to all of these views

That is not what I am proposing. Rather the views should be declared as InputRange.

That is, something as simple as view::iota would need to use atomics (for integers), possibly a mutex for RandomAccessIncrementables in general

That is because you created view::iota to work as an iterator instead of a range. As such, it should be declared as an InputRange to make it clear that it is single pass, many of the views already do this, such as view::join.

CaseyCarter commented 7 years ago

I think that demoting views with internal cached state to input would produce a less useful library. Also, we need to stop using view::iota as an example: iota_view is always const-iterable. view::reverse is a better example: it is multipass but not const-iterable due to the need to cache the underlying range's end iterator to return (adapted) from begin. No, it cannot cache at construction time since construction - and more importantly copy - would then not be O(1).

pfultz2 commented 7 years ago

view::reverse is a better example: it is multipass but not const-iterable due to the need to cache the underlying range's end iterator to return (adapted) from begin

Since ranges support proxies, you could return a proxy to avoid LWG#198 instead.

FloopCZ commented 7 years ago

I expect more people to run into this since const iteration is really standard, especially since day-1 teaching in C++ tells everyone to pass around const & of their objects and there are reasons to maintain this for generic algorithms that just don't know (or care) if what they are getting owns data or just presents a view onto the data (so long as it's input-iterable for example).

@h-2 I don't think it will be a problem for C++ programmers. Everyone knows (or should know) that a const vector::iterator does not make much sense and that it is not the same as vector::const_iterator. The same stands for views. const view::foo does not make much sense and it is not the same as view::const_ | view:foo.

While it would be interesting to know which views are const-iterable, I don't think we should expose this API wise. I'd rather leave this as an implementation detail that can change at any time.

@ericniebler Please do! :-)

The discussion in this thread has moved me completely from the "all views should be const iterable and internally synchronize mutating state" camp into the "all views should NOT be const iterable and threads should share copies of views instead of references to ranges" camp.

@CaseyCarter I feel exactly the same.

That is not what I am proposing. Rather the views should be declared as InputRange.

@pfultz2 If all views were InputRanges, the library would loose much of its usability. For instance, now, you can easily shuffle elements on even indices:

vector<int> nums{1, 2, 3, 4, 5};
nums | view::stride(2) | action::shuffle(gen);

If view::stride was an input range, this would not be possible, since shuffle requires random access range (for obvious reasons).

h-2 commented 7 years ago

I have to admit, I am a little biased, because most of our use-cases deal with views that are random-access (especially transform and slice) / on random-access containers. For this class of views cheap const iteration is easily available. I suspect that there is huge demand just for this sub-group of views that would warrent giving them extra treatment. But I also understand the value of a consistent library design.

Although he originally said it in jest, I think that @ericniebler's suggestion upthread that maybe NO views should be const iterable may be the best solution to user confusion about which views are const iterable and why.

I still think this is not the best way to go, because it will make life harder for people who mainly work with views that could cheaply support const iteration as mentioned above, and I think it will also reduce the number of (early) adopters of the library, because it forces people to make changes throughout their code-base. If, however, there is no plan to make const iteration a real feature (as in documented and usable without hacks¹), then I agree, the best solution is probably to get rid of it altogether and just setting up a big warning sign so that people know it's not plug'n'play with existing interfaces and best-practices. And probably adding a bunch of static_asserts that redirect people to this warning sign when they try to use interfaces they shouldn't.

¹ I personally think that usability-wise it wouldn't be more complicated than documenting/memorizing which views are random-access, and which aren't, but that's just my opinion [and I know it is more work for the people coding the library!].

gnzlbg commented 7 years ago

@ericniebler

The cache variables are guarded by atomics. The alternative would be to make begin()/end() non-const. I considered the atomic to be the lesser evil, but I am open to other views (har har).

Three thoughts, the atomics:

I really don't think that making begin/end const is worth it. Users need to learn sooner rather than later that making lazy stateful algorithms const does not allow them to do anything useful with them.

ericniebler commented 7 years ago

Users need to learn sooner rather than later that making lazy stateful algorithms const does not allow them to do anything useful with them.

Two things:

1) I wholeheartedly agree. It looks like the stride view is the only remaining view that makes any use of atomics. Pull requests welcome.

2) I've been kicking around an idea where views with mutable state are const-iterable but their category degrades to Input. I am at once horrified and fascinated by the prospect.

gnzlbg commented 7 years ago

but their category degrades to Input

Why? To avoid having amortized O(1) begin / end?

ericniebler commented 7 years ago

Given the semantics of input iterators, no correct program can detect that the underlying range is mutating in a non-const-correct way.

gnzlbg commented 7 years ago

So I assume that incorrect programs can detect it, do you have an example of "how incorrect" would a program need to be to detect it? Is it easy or hard? I'd like incorrect programs to either fail to compile or produce a run-time error in those cases.