Closed CaseyCarter closed 5 years ago
I see a potential for latent bugs where a programmer accustomed to the fact that calling begin/end on mutable containers is threadsafe calls begin/end on mutable ranges without realizing there are sharp corners here.
I agree with the tone of this RFC, these are real issues.
We (I) need to investigate an alternative model where view operations are always const and perform internal synchronization if needed.
I think that views are already expensive, and that the serial case is an important one, so my feelings are currently against a model that requires internal synchronization to implement a conforming view. Still, this hasn't been investigated yet and maybe (hopefully?) isn't as bad as I expect it to be.
There is another issue. While views are not thread-safe in general, they are (or should be) cheap to copy, so the thread-safety issues that might arise are from users inadvertently sharing views between threads instead of copying them (which is what users should be doing) [*] . This is another point in which views differ from containers (which are expensive to copy and might be worth sharing). Views should almost never be worth sharing and by making them thread-safe we might be encouraging the wrong usage patterns.
Sometimes one might really want to share a view, but in those cases one wants to perform sharing explicitly as opposed to by mistake, so I think thread-safe versions of some of the views (something that you are not proposing) could be really useful even though having the world split might impact re-usability (e.g. can't reuse view::a | view::b | view::c
because view::b
is/isn't thread-safe).
Is there a list somewhere of which views are not thread safe? (e.g. probably all the SinglePass
views but maybe others as well).
[*] Maybe the cppguidelines could in the future improve the safety issues by allowing marking some views as not-thread safe in the spirit of rust Send<T>
and Sync<T>
traits but that's a long shot.
I quickly searched the code and I discovered that box.hpp and stride.hpp already use atomics for some synchronization. I find that surprising. Could anyone comment what is rationale behind that? Is not we paying for what we don't normally use?
PS seems like this is for some mutable members which, where the user would assume thread-safety because of const interface.
I'm concerned that the differing const distinctions for containers and views don't mesh well into a consistent notion of what const means for operations on general ranges. I see a potential for latent bugs where a programmer accustomed to the fact that calling begin/end on mutable containers is threadsafe calls begin/end on mutable ranges without realizing there are sharp corners here.
The current model is: either the range has visible mutable state and does not have const
begin()
/end()
, or else it guards any internal mutation so as to be thread-safe. I agree it's not ideal, but I haven't come up with anything better.
I quickly searched the code and I discovered that box.hpp and stride.hpp already use atomics for some synchronization.
It's so that view::stride
can have an amortized O(1) begin()
/end()
by caching some data rather than recomputing it every time. The cache variables are guarded by atomics. The alternative would be to make begin()
/end()
non-const
. I considered the atomic to be the lesser evil, but I am open to other views (har har).
The only mutating operation on pointers is assignment. If views are supposed to be range-pointers, perhaps assignment should be the only mutating operation? We (I) need to investigate an alternative model where view operations are always const and perform internal synchronization if needed.
Ack. Baby, meet bathwater. Some views simply cannot be const, logical or physical; an istream
view, for instance. Unless you want to disallow such a view (I don't), it seems to me that the problem must be addressed.
One simplification is that no views have const
begin()
/end()
members. That way nobody has to think about it. Seems like a fool's consistency to me, though.
This implementation of istream_range
:
template<typename Val>
struct istream_range
: view_facade<istream_range<Val>, unknown>
{
private:
friend range_access;
std::istream *sin_;
mutable semiregular_t<Val> obj_;
struct cursor
{
private:
istream_range const *rng_;
public:
cursor() = default;
explicit cursor(istream_range const &rng)
: rng_(&rng)
{}
void next()
{
rng_->next();
}
Val &get() const noexcept
{
return rng_->cached();
}
bool done() const
{
return !*rng_->sin_;
}
Val && move() const noexcept
{
return detail::move(rng_->cached());
}
};
void next() const
{
*sin_ >> cached();
}
cursor begin_cursor() const
{
return cursor{*this};
}
Val & cached() const noexcept
{
return obj_;
}
istream_range(std::istream &sin, Val *)
: sin_(&sin), obj_{}
{}
istream_range(std::istream &sin, semiregular<Val> *)
: sin_(&sin), obj_{in_place}
{}
public:
istream_range() = default;
istream_range(std::istream &sin)
: istream_range(sin, _nullptr_v<semiregular_t<Val>>())
{
next(); // prime the pump
}
};
has const
begin
& end
, and meets the thread-safety requirements of [res.on.data.races]. Note that cursor::next
cannot race with get
/done
/move
: input iterators are not required to be dereferenceable after increment, so a program in which two threads simultaneously increment and dereference input iterators with the same value has undefined behavior.
That's interesting, but it feels a bit dirty. I don't like const
members that mutate the object in a visible way, especially if it's unsynchronized. I'm not particularly swayed by the argument that we've defined things such as to permit it.
I disagree with the claim that there are "
const
members that mutate the object in a visible way." No such mutations are observable in a program with defined behavior. If that guarantee is strong enough for the C++ memory model, surely it suffices for istream_range
.
Do you disagree with the semantics we've defined for views and input iterators that make this implementation valid, or do you disagree with the results this implementation produces for programs with undefined behavior?
Do you disagree with the semantics we've defined for views and input iterators that make this implementation valid, or do you disagree with the results this implementation produces for programs with undefined behavior?
Neither. I'm expressing my distaste for using up all the wiggle room we've given ourselves wrt input ranges. I don't think I disagree that this is allowed. And it probably should be. But man does it feel wrong.
EDIT: I think it feels wrong because not everybody will be cognizant of the fact that input iterators are not required to be dereferenceable after increment, and the type system is giving them no help. A good interface makes it easy to write correct code and hard to write incorrect code. This is not that.
For the record, this issue interests me greatly. The interaction of the type system with views is very important to get right. I would love to hear a better suggestion, but so far I favor the current design as warty and surprising as it can be. Perhaps we look for a library solution that guides people away from the warts and toward doing the right thing, like providing the const-correct begin
/end
overloads for them, depending on whether their range has to cache mutable state. ??? I'm spitballing here. Open to other suggestions.
EDIT: One of the reasons view_adaptor
exists, as crummy as it is, is because it makes it easier to get this const
business right.
I think it feels wrong because not everybody will be cognizant of the fact that input iterators are not required to be dereferenceable after increment, and the type system is giving them no help. A good interface makes it easy to write correct code and hard to write incorrect code.
What exactly do you mean here? Iterators are only dereferenceable after increment if it < end()
right? So all iterators kind of have this problem. Maybe that if I have two copies of InputIterators and increment one the other is invalidated?.
If that is the case the problem is IMO that dereferencing doesn't mean the same thing for InputIterator than for other iterators, and it looks to me that we got ourselves into this hole by calling it the same thing. A solution could then be to call it something else .get()
and for example return an optional
.
This is far from perfect, breaks the iterator hierarchy, and optional
is not free, but it could make things safe by forcing the users to check if they got a value from an InputIterator or not.
EDIT: going this far is IMO not worth the trouble, and making InputIterators return a range is too weird.
Maybe that if I have two copies of InputIterators and increment one the other is invalidated?).
This.
If that is the case the problem is IMO that dereferencing doesn't mean the same thing for InputIterator than for other iterators, and it looks to me that we got ourselves into this hole by calling it the same thing.
If by "we" you mean the original designers of the STL then I agree. It's too late to change it now. The question is, what hand can we make of the cards we've been dealt? Casey thinks he can pull an inside straight (stretching the metaphor), I think we should settle for a std pair.
Dereference is fine, really, it's increment that causes the insanity. Increment has spooky action-at-a-distance in that it renders copies invalid. Such action-at-a-distance is anathema to our ability to reason locally about program behavior.
EDIT: I think it feels wrong because not everybody will be cognizant of the fact that input iterators are not required to be dereferenceable after increment, and the type system is giving them no help. A good interface makes it easy to write correct code and hard to write incorrect code. This is not that.
Yes, and that ship got on the bus and left the train station long ago. We can't bolt safety onto single-pass iterators now. What we can do is suggest implementation techniques that exploit the undefined behavior to provide at least some error detection. An input range that detects repeated calls to begin
, or late calls to size
, or use of invalid iterators adds more realistic safety than an input range that has mutable size
or begin
.
And yes, I admit I'm doing a lot more complaining about problems in this thread than discussing solutions. My goal was not to belittle the current use of const/mutable for views, so much as to point out some things I think need improvement and see of others agree.
I don't think const should affect the behaviour of views. If you want to cache state, using mutable is fine.
This issue took me unexpectedly and made me lose a lot of time. Had I known this beforehand, I might have stuck with boost range v2 instead.
If you want to cache state, using mutable is fine.
Possibly for Input ranges, but how could it be fine to mutate a const object that is possibly being read concurrently from different parts of your program? Guarding against that kinds of misuse is exactly what const
is for.
Had I known this beforehand, I might have stuck with boost range v2 instead.
Seriously? Be my guest.
Boost.Iterator/Boost.Range are fairly simple both in concept and implementation, are well documented, well understood, mature, and I happen to have a lot of experience with it. I chose to use Range v3 for a toy project since it was a chance to evaluate it, but given the minimal documentation, the amount of complexity I ran into, and the open major design issues like this one, it might appear that it's not quite ready. Now that is but an opinion, but it's based on somewhat unquestionable facts, so I don't think being so dismissive was justly deserved.
Anyway, I'm not quite used to the view/cursor dichotomy as opposed to the range/iterator one yet, but couldn't all state go into the iterator rather than the view, removing all needs for synchronization?
I think maybe the library needs a FAQ because @mgaunard questions are fair for new users to have, and come up very often. New users seem to have two issues with the library:
So because they don't know this, and some of them come with a Boost.Range v2 background, they first try to make everything const, and "clash" against the library. Then they ask themselves why can't I make a view const? It has no state! And because they don't know the second point, they end up very confused.
@mgaunard
view
s are stateful algorithms by definition
This is how range-v3 is designed. While some views don't store state, most do. Don't make them const. This might be hard if you are used to "make everything const", but ask yourself, does this stateful algorithm really need to be const here?
In range-v3,
begin
/end
are amortized O(1)
That is, if you do:
auto v = range | view::remove_if(pred);
for (auto&& i : v) { ... } // begin/end are amortized O(1) here
for (auto&& i : v) { ... } // begin/end are amortized O(1) here as well
The way this is implemented is that some views cache the begin/end on construction, so that all the calls to begin/end from that point on are O(1). This requires state, so vies that do this cannot be const (@ericniebler: I don't find this in the design paper, have you written about this somewhere besides in the issues? I guess this also would belong to a FAQ).
@mgaunard This is just a guarantee that range-v3 views offer you. If you don't need this/want this, range-v3 offers you the tools to build your own view::remove_if_mgaunard
that does not cache any internal state, that offers O(N)
begin and end, and can be made const.
But think about it thrice before going down that route. What do you win? The ability to make a view const? For what? Is that worth it? What do you loose? (O(1) begin/end)
From @CaseyCarter in #428 :
remove_if caches the value of the first "non-removed" iterator in the range so as to amortize the cost of multiple calls to begin since begin is supposed to be O(1). This is odd given that next and prev are not O(1) and more likely to be called frequently and repeatedly
The difference between begin
and next
is that begin
is assumed to O(1) is so many places. Consider this code from reverse_view
's next
member:
if(0 != ranges::advance(it, -1, ranges::begin(rng_->mutable_base())))
it = rng_->get_end_(BoundedRange<Rng>());
If begin
is O(M), then next
becomes O(M). That's bad.
For ranges like filter_view
, next
is permitted to be O(M) so long as traversing the entire range is O(N). next
is more expensive, but it must be called fewer times to traverse the entire range.
begin
and end
get called frequently in the implementation of next
and prev
of many other range adaptors. The alternative is to have the adaptor's iterators cache the underlying range's begin
and end
iterator. That makes iterators fat -- a problem that compounds as adaptor chains grow -- and spreads the cost out as begin
and end
get recomputed over and over.
The design compromise that is currently implemented makes begin
/end
always amortized O(1) so it is safe to be called from a loop where it is used in expressions that test for begin- or end-of-range. The win is that iterators stay small and the beginning and end of the range, if it is computed, gets computed only once, and that result can be reused over and over. The downside, as you've noticed, is that ranges become stateful.
I'm sure you can find examples in the other views where next
and prev
get called in loops in a way that make the algorithms O(N*M). Caching all the positions returned by next
and prev
is obviously unfeasible. I made begin
and end
different (a) because they so frequently appear in end-of-range checking expressions, and (b) there are only two such positions, so caching them is feasible.
So, here are the alternatives I see:
begin
/end
members and are permitted to cache data from begin
/end
without synchronization overhead.begin
/end
in non-Input ranges is verboten. Ranges like filter_view
can either satisfy Forward or have O(1) begin
/end
members, but not both. Caching Input ranges have const begin
/end
members, increasing the likelihood they will be used unsafely.begin
/end
, and have const begin
/end
must guard the mutation with synchronization.begin
/end
are O(N). Cursors that compare an iterator with the beginning or end of a range must store those positions internally. Iterator bloat becomes a problem that compounds as adaptor chains grow.Any other options I missed?
<spitballing> I can imagine a separate range adaptor that caches the underlying range's begin/end on construction. With C++17's guarantees about copy elision, we can investigate the possibility of using this either internally or letting users use it directly. That requires that let filter_view
have O(N) begin
. That doesn't give our users much guidance to know when such a caching adaptor would be needed.
The difference between begin and next is that begin is assumed to O(1) is so many places. Consider this code from reverse_view's next member:
Why wouldn't you just do --it
? In the deference oprerator, it would require one iteration back but that seems better than all that branching.
(Drifting off-topic):
Why wouldn't you just do
--it
? In the deference oprerator, it would require one iteration back but that seems better than all that branching.
This implementation of reverse_view avoids LWG#198.
Was this design aspect (constness drops range status of certain views) discussed in WG21 yet?
constness drops range status of certain view
With the current design, const does not effect the strength of the concept that a range satisfies. So no, it has not been discussed in WG21.
So, here are the alternatives I see: [...] Any other options I missed?
@CaseyCarter Thoughts? I'm going to start work on Range TS2 "Views Edition" soon/eventually, and as of right now, I haven't heard a better alternative than the current design. I can play with improved designs for range_adaptor
that do more of this work for the user, but the fundamental model would be unchanged.
I ran into this when trying to improve a bit the view
s part of the range-v3 documentation. Describing them as "lazy range combinators" doesn't give users enough information to avoid common questions and mistakes, so I was trying to come up with a slightly longer description that explains the semantics a bit more and covers some pitfalls.
Needless to say, I gave up since the best I could come up with was this:
Views are:
- range algorithms: from some input (which might be a range), they produce a range of values,
- composable: views producing and accepting ranges can be chained to construct more complex algorithms,
- lazy: the sequence of values produced by a view is computed on demand as the view is iterated over, every time the view is iterated over
view
s present the trade-off "compute every time, no extra memory", while the opposite tradeoff, "compute once, store in memory" is provided byaction
s. Both views and actions can be composed.- non-owning: in general,
view
s do not own their input, which makes them "cheap" to copy,
- cheap means that the complexity of copying a view is independent of the number of elements in the range taken/produced by the view
- some views do own their input when this condition can be satisfied to make them more convenient to use (e.g.
view::single(value)
owns by default a copy ofvalue
which can be avoided by usingview::single(ref(value))
),- stateful:
view
s often store some state that might be mutated during iteration,
- that is, making a
view
const
might prevent iteration,- all views provide O(1) amortized
begin
andend
. Some views must pre-computebegin
andend
on construction to provide this guarantee. To benefit from amortized O(1)begin
andend
, prefer storing these views and keeping them around instead of constructing them anew over and over.- However! Iterating more than once over a
SinglePass
view is undefined behavior. Use the!SinglePass<range>
concept to constrain algorithms that iterate more than once over a range.- thread-safe: in the standard library sense, that is,
const
operations are thread safe:
- However! often views do not have
const
iteration methods. One can:- (if possible) use
view::const
to ensure that the input of the range algorithm is not mutated and copy the view between threads,- use external synchronization (e.g.
std::mutex
) to share a mutableview
between threads.
Given the current situation, maybe it is possible to implement a "view::synchronized
" that either uses view::const
(when that is sufficient), or protects the mutable state with a std::mutex
to provide thread-safe const methods to whatever view it is composed with. That, or something like that, would make teaching the thread-safety part of the view story a bit easier (want to share a view between threads? Just use | view::synchronized
).
It would also improve teachability if we had a more solid story for the undefined behavior behind SinglePass
ranges (e.g. by requiring a run-time diagnostic in case one is iterated multiple times). I've run into them a couple of times and they were always painful to debug.
Chiming in on this discussion after #617 and #618:
I expect more people to run into this since const iteration is really standard, especially since day-1 teaching in C++ tells everyone to pass around const &
of their objects and there are reasons to maintain this for generic algorithms that just don't know (or care) if what they are getting owns data or just presents a view onto the data (so long as it's input-iterable for example).
Documenting which views aren't const-iterable would probably be an important first step, but maybe having something like ranges::any_const_view
and ranges::any_const_random_access_view
that just support type erasure from const-iterable views would also help get things right for people. What do you think?
The current model is: either the range has visible mutable state and does not have const begin()/end(), or else it guards any internal mutation so as to be thread-safe. I agree it's not ideal, but I haven't come up with anything better.
Some views simply cannot be const, logical or physical; an istream view, for instance. Unless you want to disallow such a view (I don't), it seems to me that the problem must be addressed.
INHO it's a very good policy to offer const-iterators whenever possible and using mutable state with atomic or other protection :+1: It's what most people would expect, I think. The other ones would just need some form of documentation...
PS: Are you interested in PRs on documentation things like that or do you consider them more work than benefit right now?
The root of the problem is that newcomers try to make views const
s or try to pass them by const lvalue reference because they have an incomplete mental model of how views work.
If you tell them that a view is like a state machine (or a coroutine, or a lazy stateful algorithm, ...) that generates values on the fly by mutating its internal state, the problem becomes obvious: if you make it const, it cannot mutate its internal state, so it cannot "generate" new values, which is what the compilation error is trying to tell them in its own peculiar way. What they probably want is to either not make the view const, or make a view with a const reference type to some data.
I think that once they understand this, other FAQs that newcomers have like why can't they call begin
on a const view
also become easier to explain.
I think that the docs should aim to convey a correct mental model when views are introduced. Doing so without drowning them in detail is hard, but I think this would deliver the largest win, documentation-wise.
Documenting which views aren't const-iterable would probably be an important first step,
While it would be interesting to know which views are const-iterable, I don't think we should expose this API wise. I'd rather leave this as an implementation detail that can change at any time.
I also don't think that this would help beginners, because relying on this is brittle. Suppose that you update some "const-iterable" view by piping it an adaptor that makes it "non-const-iterable". The error message one gets in every place that tries to use the view as const-iterable, would only make sense for those users that already know that they shouldn't be doing that in the first place.
INHO it's a very good policy to offer const-iterators whenever possible and using mutable state with atomic or other protection
Why? I've yet to see a case (and I'm sure @CaseyCarter will point one out) in which you actually want a const_iterator
because an iterator
with a const reference type does not suffice. This is also nothing new: most/all of the ordered/unordered STL containers have an iterator with a const reference type, and define using const_iterator = iterator;
(at least libc++ does this).
Also, atomics and locks are not zero-cost, so they aren't really an option. This doesn't mean that we shouldn't provide a way to make views thread safe, but rather, that doing so should be opt-in (and in general, just putting the view behind a mutex somewhere solved the problem, for all views).
The root of the problem is that newcomers try to make views consts or try to pass them by const lvalue reference because they have an incomplete mental model of how views work.
Well, that is entirely a matter of perspective I would say :smile: One could also ask, why views should behave so differently. For some views this might be a requirement, but for many it isn't.
if you make it const, it cannot mutate its internal state, so it cannot "generate" new values,
Why not? The internal state can be mutable, often it doesn't even effect the generated elements. IMHO abstracting this state away is a major selling point for views. "It might have some funky stuff going on internally, but all I see is a bunch of elements I can iterate over." ← this breaks if I can't do the latter in situations where I expect to be able to.
While it would be interesting to know which views are const-iterable, I don't think we should expose this API wise. I'd rather leave this as an implementation detail that can change at any time.
please don't :pray:
I also don't think that this would help beginners, because relying on this is brittle. Suppose that you update some "const-iterable" view by piping it an adaptor that makes it "non-const-iterable". The error message one gets in every place that tries to use the view as const-iterable, would only make sense for those users that already know that they shouldn't be doing that in the first place.
That depends on the application design, if you work with concepts and these require that it's const iterable, you will immediately get a (more or less) readable notification.
Why? I've yet to see a case (and I'm sure @CaseyCarter will point one out) in which you actually want a const_iterator because an iterator with a const reference type does not suffice.
Generic programming. Take this (silly) example:
template <typename type>
requires (bool)ranges::InputRange<type const>()
auto generic_max(type const & input)
{
auto ret = *begin(input);
for (auto const & v : input)
if (v > ret)
ret = v;
return ret;
}
int main()
{
std::vector<int> foo{1, 4, 5, 7567546, 234, 56, 34534};
auto bar = foo | ranges::view::slice(5, 7);
auto bax = bar | ranges::view::reverse;
std::cout << generic_max(foo) << '\n';
std::cout << generic_max(bar) << '\n'; // it all works just the same
std::cout << generic_max(bax) << '\n';
}
Sure, it is possible to work around the const &
, use overloading or SFINAE et cetera, but it really takes a way the beauty of generic programming. If containers satisfy the range concepts, I should be able to work on that interface and not worry about ownership or "shallow copies VS const &
"...
I know it isn't always possible to avoid this, but for many cases it is, like the case above, and in those cases I think the implementation should offer this as a documented and stable feature.
Generic programming. Take this (silly) example [...] Sure, it is possible to work around the const &, use overloading or SFINAE et cetera,
It is not a silly example. It is exactly the kind of code that some newcomers write. The docs should explicitly address it.
Sure, it is possible to work around the const &,
The correct solution is to just use Rng&&
.
Can you explain why Rng const&
is better/more generic than Rng&&
? (Or why didn't you wrote Rng&&
in the first place?)
Can you explain why Rng const& is better/more generic than Rng&&?
const Rng&
clearly expresses in the interface that it doesn't modify the range/container nor its elements.
Of course, it doesn't make sense to use const
on single pass ranges since there can be observable state changes(whether in the range itself or in some global or system state).
The correct solution is to just use Rng&&.
const Rng&
clearly expresses in the interface that it doesn't modify the range/container nor its elements.
I agree. [1] I know I can use the forwarding reference to bind to const types, but I now need to add extra unit tests that explicitly test for const container input and non-const container input to trigger this. Also the interface suggests to most readers / other developers that I explicitly chose not to use const &
because the function changes it's arguments (we have policies for distinguishing in-parameters and in-out parameters based on this).
It also promotes other notions, like the function not being safe to call from multiple threads on the same data. Which indeed it wouldn't be now, since said data could now be a range whose thread-safety measures would not kick in, because it is accessed through a non-const-iterator interface.
I know these are things that can be worked around with different programming styles and lots of documentation, but I would still like to make a point for having const iterator interfaces where possible.
[1] Off-topic: Although, as @ericniebler pointed out, const Rng&
wouldn't actually provide any guarantees on protecting the elements, because the const-ness doesn't propagate. I think there are valid reasons for modelling ranges like this, but I don't think that it is self-explanatory. There are other cases in C++ where constness does propagate, e.g. a const vector of vector doesn't only prevent adding more members, it also protects its members. This of course also has (good) reasons, I am just saying it is not self-explanatory, especially since views are considered a "non-owning collection" and by inference akin to containers by many.
There's a lot of wrong-thinking about views here: people need to stop thinking of views as being like references to containers and start thinking of views as being like iterators that denote zero or more elements instead of zero or one. Accepting a view by const&
has the same semantics as accepting an iterator by const&
: it has no effect on the mutability of the denoted elements, it only forbids modifying the iterator/view such that it denotes different elements. Algorithms that take views and/or iterators typically take them by value.
Range algorithms accept ranges by forwarding reference because they can operate on both views and containers. const&
would have no significance in the view case, so it would at best create a false sense of security in users if mutating algorithms were to accept ranges by const&
. If you are paranoid and want to guarantee that a range algorithm doesn't mutate elements you pass it view::const_(foo)
instead of foo
.
Documenting which views aren't const-iterable would probably be an important first step, While it would be interesting to know which views are const-iterable, I don't think we should expose this API wise. I'd rather leave this as an implementation detail that can change at any time.
I also don't think that this would help beginners, because relying on this is brittle. Suppose that you update some "const-iterable" view by piping it an adaptor that makes it "non-const-iterable". The error message one gets in every place that tries to use the view as const-iterable, would only make sense for those users that already know that they shouldn't be doing that in the first place.
We could theoretically document which view adaptors are never const iteratable and which views are const iteratable when they adapt const iterable underlying views, but I don't think it would be helpful to do so. Either you write generic code that works with both kinds of views, or you don't. Suggesting that users should memorize which views fall into which subset would be pointless. Although he originally said it in jest, I think that @ericniebler's suggestion upthread that maybe NO views should be const iterable may be the best solution to user confusion about which views are const iterable and why.
It would certainly be nice to abandon the complicated view adaptor implementation style that maintains const-ness of the underlying view:
CONCEPT_REQUIRES(SizedRange<Base const>())
range_size_t<Base> size() const
{ return ranges::size(base_); }
CONCEPT_REQUIRES(SizedRange<Base>() && !SizedRange<Base const>())
range_size_t<Base> size()
{ return ranges::size(base_); }
for simply:
CONCEPT_REQUIRES(SizedRange<Base>())
range_size_t<Base> size()
{ return ranges::size(base_); }
(Note that few of the view adaptors have all the cruft to "properly" maintain constness because it's a PITA.)
having something like
ranges::any_const_view
andranges::any_const_random_access_view
that just support type erasure from const-iterable views would also help get things right for people
I think this would only create another complicated choice point that makes it harder to use type-erased views.
I know I can use the forwarding reference to bind to const types, but I now need to add extra unit tests that explicitly test for const container input and non-const container input to trigger this. Also the interface suggests to most readers / other developers that I explicitly chose not to use const & because the function changes it's arguments (we have policies for distinguishing in-parameters and in-out parameters based on this).
Writing generic functions that accept ranges as single objects is not like writing generic functions that take containers by reference, it is by design exactly like writing generic functions that accept ranges as iterator pairs.
start thinking of views as being like iterators
If views are to work like iterators then they should extend iterators and not ranges.
We could theoretically document which view adaptors are never const iteratable and which views are const iteratable
Any single pass range(ie InputRange only) should only be non-const iterable. However, views which are at least ForwardRange
should have the ability to restrict observable state(ie be taken as a const Range&
), if that is not possible due to internal mutation then it should be declared as an InputRange
, to make it clear that it is a single pass range.
Although he originally said it in jest, I think that @ericniebler's suggestion upthread that maybe NO views should be const iterable may be the best solution to user confusion about which views are const iterable and why.
I wasn't joking when I said that. It's still an attractive option.
If views are to work like iterators then they should extend iterators and not ranges.
Views get their similarity to iterators from ranges. C++98 [lib.iterator.requirements]/7:
Most of the library's algorithmic templates that operate on data structures have interfaces that use ranges. A range is a pair of iterators that designate the beginning and end of the computation.
Don't be fooled by the fact that containers implicitly denote a range: ranges are not like containers; ranges are like iterator/sentinel pairs.
I wasn't joking when I said that. It's still an attractive option.
The discussion in this thread has moved me completely from the "all views should be const iterable and internally synchronize mutating state" camp into the "all views should NOT be const iterable and threads should share copies of views instead of references to ranges" camp. Congratulations to you and @gnzlbg for beating me there.
@pfultz2
const Rng& clearly expresses in the interface that it doesn't modify the range/container nor its elements.
No it does not. The only thing Rng const& rng
denotes, is that rng
is not being changed, but it doesn't denote anything about the elements of rng
.
For example:
template <typename R> void foo(R& rng) { /*....*/ }
std::set<int> s = /*...*/;
foo(s);
will fail to compile if foo
tries to modify the range when foo(s)
is instantiated, even though foo does not take a const reference, and the set is not const.
Also, the following is perfectly valid C++:
template <typename R>
void foo(R const& r) {
*(r.begin()) = 3;
}
std::vector<int> v = {1,2,3};
struct View {
std::vector<int>* v;
int* begin() const { return v->data(); }
int* end() const { return v->data() + v->size(); }
};
foo(View{&v});
assert(v[0] == 3);
Here foo
takes a const reference, yet it still can modify the elements of the underlying range. Why? Because making a view
const does not mean that the elements the view points to are immutable, in the same sense as making a view
non-const does not mean that you can mutate elements through it. @ericniebler showed an example about pointers above, this is exactly the same behavior that C++ pointers do have, nothing more, nothing less (and yes, I think this is confusing).
IMO the root of the issue, is that newcomers seem to confuse these concepts, but this is the way C++ works. When working with views, making them const does not mean that you are preventing accidental mutability of the elements, but rather that you are preventing the view from doing its work.
One cannot blame @h-2 for being confused, C++ is indeed confusing here, hence why we need to address this explicitly in the docs. Take a look at what @h-2 writes:
Also the interface suggests to most readers / other developers that I explicitly chose not to use const & because the function changes it's arguments (we have policies for distinguishing in-parameters and in-out parameters based on this).
This is what the function signature looks like to a newcomer but this is not what the function signature actually does. The same applies to the thread safety issue.
@ericniebler
I wasn't joking when I said that. It's still an attractive option.
One of the main advantages of views is that they are easily composable. Some of these compositions will result in "non-const-iterable views".
While I think "its nice" that some views can be "const-iterable", code that relies on this is brittle, and prevents composition and refactorings (or make them unnecessarily complicated just for the sake of marking some function argument const...).
I also don't think that I've ever written code that intentionally relies on a view being const iterable, but have done so by accident and have had to fix it later.
Maybe removing "const iterable" views is actually the right thing to do. Every user is going to clash against this at some point, for the same reasons @h-2 is having trouble right now. I think it would simplify things if we would put this cost upfront.
Here foo takes a const reference, yet it still can modify the elements of the underlying range. Why?
Yes, of course, you can also create the variable as mutable
. It doesn't mean its a good idea. In general, one expects to be able to write:
template<class T>
void foo(const T&);
std::set<int> s = /*...*/;
foo(s);
And s
should be essentially unchanged. However, we can't guarantee this with single pass ranges such as view::join
or istream_range
, but for any range that is a ForwardRange
this should be possible.
Having views which have internal mutability can happen, but they should be declared as InputRange
. Then there is no need to write things like a mutable size()
function, which makes it look like we are building ranges around Heisenberg's uncertainty principle.
Folks who think that there should be a way to add const
to a generic Range
function parameter and have it mean deep const
(pertaining to the elements) should read this.
@pfultz2
Yes, of course, you can also create the variable as mutable. It doesn't mean its a good idea.
I did not need to write mutable
or use const_cast
or anything to run into this "issue" because this is intrinsic to how the C++ type-system works (some would call this a "feature").
but for any range that is a ForwardRange this should be possible.
Having views which have internal mutability can happen, but they should be declared as InputRange.
There are lots of views with internal mutability. Doing what you propose would require adding internal synchronization to all of these views, because passing them as a const reference between threads allows modifying shared state (under the current model, you get a nice compilation error).
That is, something as simple as view::iota
would need to use atomics (for integers), possibly a mutex for RandomAccessIncrementable
s in general, and also be an InputRange
instead of a RandomAccessRange
... I think this is a very high price to pay just so that one can pass iota
by const reference...
Still anybody can use view_facade
to write their own synchronized_iota
view and send a PR if they so desire. Or even better, write a view::synchronized
adaptor that stores the range behind a mutex and is const iterable. But given that nobody has bothered to even open an issue about this, it doesn't seem to be a problem that occurs often enough in practice to be worth solving.
There are lots of views with internal mutability. Doing what you propose would require adding internal synchronization to all of these views
That is not what I am proposing. Rather the views should be declared as InputRange
.
That is, something as simple as view::iota would need to use atomics (for integers), possibly a mutex for RandomAccessIncrementables in general
That is because you created view::iota
to work as an iterator instead of a range. As such, it should be declared as an InputRange
to make it clear that it is single pass, many of the views already do this, such as view::join
.
I think that demoting views with internal cached state to input
would produce a less useful library. Also, we need to stop using view::iota
as an example: iota_view
is always const-iterable. view::reverse
is a better example: it is multipass but not const-iterable due to the need to cache the underlying range's end iterator to return (adapted) from begin
. No, it cannot cache at construction time since construction - and more importantly copy - would then not be O(1).
view::reverse is a better example: it is multipass but not const-iterable due to the need to cache the underlying range's end iterator to return (adapted) from begin
Since ranges support proxies, you could return a proxy to avoid LWG#198 instead.
I expect more people to run into this since const iteration is really standard, especially since day-1 teaching in C++ tells everyone to pass around const & of their objects and there are reasons to maintain this for generic algorithms that just don't know (or care) if what they are getting owns data or just presents a view onto the data (so long as it's input-iterable for example).
@h-2 I don't think it will be a problem for C++ programmers. Everyone knows (or should know) that a const vector::iterator
does not make much sense and that it is not the same as vector::const_iterator
. The same stands for views. const view::foo
does not make much sense and it is not the same as view::const_ | view:foo
.
While it would be interesting to know which views are const-iterable, I don't think we should expose this API wise. I'd rather leave this as an implementation detail that can change at any time.
@ericniebler Please do! :-)
The discussion in this thread has moved me completely from the "all views should be const iterable and internally synchronize mutating state" camp into the "all views should NOT be const iterable and threads should share copies of views instead of references to ranges" camp.
@CaseyCarter I feel exactly the same.
That is not what I am proposing. Rather the views should be declared as InputRange.
@pfultz2 If all views were InputRanges, the library would loose much of its usability. For instance, now, you can easily shuffle elements on even indices:
vector<int> nums{1, 2, 3, 4, 5};
nums | view::stride(2) | action::shuffle(gen);
If view::stride was an input range, this would not be possible, since shuffle requires random access range (for obvious reasons).
I have to admit, I am a little biased, because most of our use-cases deal with views that are random-access (especially transform and slice) / on random-access containers. For this class of views cheap const iteration is easily available. I suspect that there is huge demand just for this sub-group of views that would warrent giving them extra treatment. But I also understand the value of a consistent library design.
Although he originally said it in jest, I think that @ericniebler's suggestion upthread that maybe NO views should be const iterable may be the best solution to user confusion about which views are const iterable and why.
I still think this is not the best way to go, because it will make life harder for people who mainly work with views that could cheaply support const iteration as mentioned above, and I think it will also reduce the number of (early) adopters of the library, because it forces people to make changes throughout their code-base.
If, however, there is no plan to make const iteration a real feature (as in documented and usable without hacks¹), then I agree, the best solution is probably to get rid of it altogether and just setting up a big warning sign so that people know it's not plug'n'play with existing interfaces and best-practices.
And probably adding a bunch of static_assert
s that redirect people to this warning sign when they try to use interfaces they shouldn't.
¹ I personally think that usability-wise it wouldn't be more complicated than documenting/memorizing which views are random-access, and which aren't, but that's just my opinion [and I know it is more work for the people coding the library!].
@ericniebler
The cache variables are guarded by atomics. The alternative would be to make begin()/end() non-const. I considered the atomic to be the lesser evil, but I am open to other views (har har).
Three thoughts, the atomics:
views
from being used in constexpr contexts,rng | b | c | d | ...
where some of the adaptors are internally synchronized. Many users might be better off by protecting the view behind a mutex instead of relying on an undocumented chain of atomics for synchronization.I really don't think that making begin
/end
const
is worth it. Users need to learn sooner rather than later that making lazy stateful algorithms const
does not allow them to do anything useful with them.
Users need to learn sooner rather than later that making lazy stateful algorithms
const
does not allow them to do anything useful with them.
Two things:
1) I wholeheartedly agree. It looks like the stride
view is the only remaining view that makes any use of atomics. Pull requests welcome.
2) I've been kicking around an idea where views with mutable state are const
-iterable but their category degrades to Input. I am at once horrified and fascinated by the prospect.
but their category degrades to Input
Why? To avoid having amortized O(1) begin / end?
Given the semantics of input iterators, no correct program can detect that the underlying range is mutating in a non-const
-correct way.
So I assume that incorrect programs can detect it, do you have an example of "how incorrect" would a program need to be to detect it? Is it easy or hard? I'd like incorrect programs to either fail to compile or produce a run-time error in those cases.
Containers
Containers in C++ may have both
const
and non-const
overloads ofbegin
. They differ only in their return types: non-const
begin
returns a mutable iterator, andconst
begin
a constant iterator. Neither overload changes the bits of the container representation or modifies the semantic value of the container object. Two overloads exist so that it is necessary to establish a non-const
access path to the container object in order to obtain a mutable iterator.end
is the same.size
is alwaysconst
because it provides no way to modify the container's content.Views
Views in range-v3 may have both
const
and non-const
overloads ofbegin
/end
/size
(herein termed "operations"). Views have pointer semantics - a view is essentially a pointer to a sequence of elements - so mutability of the elements viewed is orthogonal to mutability of the view object itself. Theconst
distinction here has no relation to that of containers. Non-const
operations do not modify the semantic objects being viewed, nor do they "swing the pointer" so that the same view designates different semantic objects. Non-const
operations mutate internal state that does not contribute to the semantic value of the view; theconst
-ness here is purely bitwise.The
const
-ness model used by views makes view composition painful. You can always provide the non-const
overloads, butconst
overloads are preferred when achievable. So a composer, e.g.:ends up providing two definitions of each operation: one
const
that's constrained to requireconst
operations over the underlying view(s):and one mutable that's constrained to only be available when the
const
version isn't:Ranges
I'm concerned that the differing
const
distinctions for containers and views don't mesh well into a consistent notion of whatconst
means for operations on general ranges. I see a potential for latent bugs where a programmer accustomed to the fact that callingbegin
/end
on mutable containers is threadsafe callsbegin
/end
on mutable ranges without realizing there are sharp corners here.The only mutating operation on pointers is assignment. If views are supposed to be range-pointers, perhaps assignment should be the only mutating operation? We (I) need to investigate an alternative model where view operations are always
const
and perform internal synchronization if needed.