Open SidneyCogdill opened 1 month ago
To give a brief idea of how those mechanisms simplify the null-able/null-restricted properties of the type system:
| Code | Notes | C++23 | C++26 | Relocating prvalues |
|---|---|---|---|---|
| `int a; a = 42; f(a);` | `int` is a null-restricted type. 0 is a valid state of `int`, therefore it can't be used to represent "nothing" (which is conceptually the case if the variable is uninitialized, because the user hasn't given it a state). | Undefined behavior that works in practice | Well defined; it's not clear whether it is erroneous behavior | Well defined |
| `int a; f(a);` | Same as above | Undefined behavior | Well defined as erroneous behavior | Ill-formed, diagnostic required: Use of relocated-from object |
| `std::optional<int> a; f(a);` | `std::optional<T>` is a null-able type. | Well defined; `a` is guaranteed to be in null state | Well defined; `a` is guaranteed to be in null state | Well defined; `a` is guaranteed to be in null state |
| `int a = 10; std::reference_wrapper<int> b; b = a; f(b);` | `std::reference_wrapper<T>` is a null-restricted type. It can't be initialized without arguments. | Doesn't compile today | Doesn't compile today | Well defined |
| `std::reference_wrapper<int> a; f(a);` | Same as above | Doesn't compile today | Doesn't compile today | Ill-formed, diagnostic required: Use of relocated-from object |
| `std::vector<int> a; a = {1, 2, 3}; f(a);` | `std::vector<T>` is a null-able type. The null state can be represented by setting its size and capacity to 0. | Well defined | Well defined | Well defined |
| `std::vector<int> a; f(a);` | Same as above | Well defined; `a` is guaranteed to be in null state | Well defined; `a` is guaranteed to be in null state | Well defined; `a` is guaranteed to be in null state |
| `int *p; f(p);` | `T *` is semantically equivalent to `std::optional<std::reference_wrapper<T>>`, which is a null-able type. The null state can be represented by `nullptr`. | Undefined behavior | Well defined as erroneous behavior | Well defined; `p` is guaranteed to be in null state (`nullptr`) |
| `int a = 10; int *p; p = &a; f(p);` | Same as above | Undefined behavior that works in practice | Well defined; it's not clear whether it is erroneous behavior | Well defined |
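The null-able / null-restricted split in the table can be spot-checked with today's compilers. A minimal sketch, assuming C++17 or later; the helper function names are made up for illustration:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <type_traits>
#include <vector>

// Null-able types can be declared without an initializer...
static_assert(std::is_default_constructible_v<std::optional<int>>);
static_assert(std::is_default_constructible_v<std::vector<int>>);
// ...while a null-restricted type such as std::reference_wrapper cannot.
static_assert(!std::is_default_constructible_v<std::reference_wrapper<int>>);

// Hypothetical helper: a default-constructed optional holds no value.
bool optional_starts_null() {
    std::optional<int> a;
    return !a.has_value();
}

// Hypothetical helper: a default-constructed vector is empty.
bool vector_starts_null() {
    std::vector<int> a;
    return a.empty();
}

// Hypothetical helper: the wrapper has to be bound at its declaration today.
int write_through_wrapper() {
    int a = 10;
    std::reference_wrapper<int> b = a;
    b.get() = 11;  // writes through to `a`
    return a;
}
```

The `static_assert`s correspond to the "Doesn't compile today" cells: `std::reference_wrapper<int> b;` is rejected because there is no default constructor.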
With relocation semantics in mind, this has the chance to:

- […] `int`, `float`, ...) and user-defined types (`std::vector`, `QString`, ...) to work in a consistent manner.
- […] `T?` / `T!`) or break existing well-formed code (e.g. C#), and doesn't introduce runtime costs.

Note that destruction followed by an immediate re-initialization within the same lexical scope works with caller-destroy ABI whether the operand is owned or unowned, because you put something back into the storage before leaving the function scope.
Therefore, although I suggest that the conservative approach of silently turning `reloc` into a no-op should be reconsidered, the new mechanisms introduced above won't be blocked by ABI and already work with the current (R4) revision.

Re-initialization should not be allowed on `const` objects, otherwise it breaks the common-sense rule that "you can't assign new values to a constant".
| Code | C++23 | C++26 | Relocating prvalues |
|---|---|---|---|
| `int const a; f(a);` | Doesn't compile today; const variables must be initialized | Doesn't compile today; const variables must be initialized | Ill-formed, diagnostic required: Use of relocated-from object |
| `int const a; a = 10; f(a);` | Same as above | Same as above | Ill-formed, diagnostic required: const variables can't be re-initialized |
| `int *const a; f(a);` | Same as above | Same as above | Well defined; `a` is guaranteed to be in null state (`nullptr`) |
| `std::vector<int> const a; f(a);` | Well defined; `a` is guaranteed to be in null state | Well defined; `a` is guaranteed to be in null state | Well defined; `a` is guaranteed to be in null state |
| `std::reference_wrapper<int> const a; f(a);` | Doesn't compile today | Doesn't compile today | Ill-formed, diagnostic required: Use of relocated-from object |
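The "you can't assign new values to a constant" rule that the table relies on is already visible in current C++. A small sketch, assuming C++17; `const_vector_starts_null` is a hypothetical helper name:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Assignment to const objects is rejected today, for built-in and class types alike.
static_assert(!std::is_assignable_v<const int&, int>);
static_assert(!std::is_copy_assignable_v<const std::vector<int>>);
static_assert(std::is_copy_assignable_v<std::vector<int>>);

// A const vector may be declared without an initializer because the
// default constructor runs and leaves it empty.
bool const_vector_starts_null() {
    std::vector<int> const a;
    return a.empty();
}
```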
Hello and thank you for your feedback. Let's tackle this point by point.
Introduction
This is very well formulated. Note that, to strengthen your point, `A::ptr` can even be `const` in the relocation constructor design.
1. Allow the "re-initialization" of relocated-from object
I don't see why this is needed. Besides, I doubt it's easily feasible: `a(0);` to reinit `a` will be parsed as a function call, not a constructor call.
One thing that can be considered, is to allow the redeclaration of a relocated variable:
auto a = 0;
auto b = reloc a; // `a` is now in relocated-from state.
auto a = 0; // `a` is re-declared.
ideally to allow complex object initialization:
auto a = foo();
a.func();
a.func2();
auto const a = reloc a;
as a less verbose alternative to the IIFE idiom.
However I don't think that's needed for a first proposal. It's something that can still be added at a later stage in my opinion.
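For reference, the IIFE idiom mentioned above looks roughly like this in current C++. A minimal sketch; `count_names` and the values pushed are illustrative only:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::size_t count_names() {
    // The lambda is invoked immediately; its result initializes the const
    // object, so the multi-step setup never leaks a mutable variable.
    auto const names = [] {
        std::vector<std::string> tmp;
        tmp.push_back("alpha");
        tmp.push_back("beta");
        return tmp;
    }();
    return names.size();
}
```

The redeclaration form (`auto const a = reloc a;`) would express the same intent without the wrapping lambda.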
2. When an object of a class type with no default constructor, is declared without initialization, then treat it as relocated-from state
I feel mixed about this one. It's recommended to initialize variables when they are declared, and to declare them as close as possible to where they are actually needed (i.e. no C-style functions where all the variables are declared uninitialized at the top of the function). I fear that if we allow declaring a variable in relocated state, then people may abuse it and make code harder to follow.

Also, sometimes variables are purposefully declared uninitialized. In networking for instance, it's common to declare a packed structure describing a protocol header, and to write the received data directly into such a structure, which was uninitialized before. Zero-initializing this structure would be a waste.

Finally, it will break existing code as you pointed out yourself: `int a; f(a);` would emit an ill-formed diagnostic. I admit your snippet looks smelly, but what about `char buf[256]; memcpy(buf, somebuf, 256);`? This looks legit, and zero-initializing `buf` will slow down some applications for no reason.

And what about variables with dynamic storage? How would you manage `auto* a = new T;` if `T` has no matching default constructor?
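The networking case can be sketched as follows. The `PacketHeader` layout and the `parse_header` function are hypothetical and only serve to show a variable that is deliberately left uninitialized before being filled:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical wire header; fields are chosen so that there is no padding.
struct PacketHeader {
    std::uint16_t type;
    std::uint16_t length;
    std::uint32_t sequence;
};
static_assert(std::is_trivially_copyable_v<PacketHeader>);

PacketHeader parse_header(const unsigned char* wire) {
    PacketHeader h;                    // deliberately uninitialized
    std::memcpy(&h, wire, sizeof h);   // every byte is written before any read
    return h;
}
```

Zero-initializing `h` first would write the same bytes twice.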
3. Destroy-then-re-initialize if no matching operator= is found
As much as I like the spirit, I don't think that's feasible either. This reminds me of a discussion I had with Ed about what the default implementation of `T::operator=(T)` should do. I advocated for destroy + reconstruct, and Ed convinced me otherwise. There are two issues with destroy + relocate:

- All the other defaulted special member functions perform member-wise operations. `T::operator=(T)` should be no different. (More consistent, easier to teach, and as efficient).
- Destroy + relocate has poor exception safety: […] (`reloc left;`), or if the relocation constructor throws, you end up with the source object in a destructed state during the exception propagation:
A a = foo();
try {
a = bar();
}
catch (...) {
a = foo(); // a may be destroyed here, while it should be considered alive...
}
and without the try-catch block, stack unwinding will invoke `a.~A()` on a destructed object. Users can still provide their own definition in terms of destroy + relocate. The proposal in section 5.5.3.2 discusses this.
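For comparison, the usual way to keep the target alive when an assignment fails in current C++ is copy-and-swap. This is an analogy in today's language, not the mechanism of the proposal; `Widget`, `fail_next_copy` and `assign` are hypothetical names:

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

struct Widget {
    static inline bool fail_next_copy = false;  // test hook to inject a failure
    int value;

    explicit Widget(int v) : value(v) {}
    Widget(const Widget& other) : value(other.value) {
        if (fail_next_copy) {
            fail_next_copy = false;
            throw std::runtime_error("copy failed");
        }
    }
    // Copy-and-swap: the copy may throw, but *this is only touched afterwards.
    Widget& operator=(const Widget& other) {
        Widget tmp(other);
        std::swap(value, tmp.value);
        return *this;
    }
};

int assign(bool inject_failure) {
    Widget a(1), b(2);
    Widget::fail_next_copy = inject_failure;
    try {
        a = b;
    } catch (const std::runtime_error&) {
        // `a` is still alive and holds its old value here.
    }
    return a.value;
}
```

A destroy-then-reconstruct assignment would have ended `a`'s lifetime before the throwing step, which is exactly the situation described above.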
> to reinit `a` will be parsed as a function call, not a constructor call.

The `a(0);` syntax isn't being suggested. The only allowed form of re-initialization would be the `b = a` style (it only matches implicit constructors, same as `auto b = a` does). It's there to bridge the syntax gap between `Bar b; b = a;` and `Bar b = a;`: with re-initialization they're now equivalent.
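The gap between the two spellings can be seen with a type that has an implicit converting constructor but no default constructor. A minimal sketch in current C++; `Bar` here is a made-up type, and `std::optional` is used only as today's workaround for "declare now, initialize later":

```cpp
#include <cassert>
#include <optional>
#include <type_traits>

struct Bar {
    Bar(int v) : value(v) {}  // implicit converting constructor, no default constructor
    int value;
};

static_assert(!std::is_default_constructible_v<Bar>);  // `Bar b; b = a;` is rejected at `Bar b;`
static_assert(std::is_convertible_v<int, Bar>);        // `Bar b = a;` is accepted

int deferred_init(int a) {
    std::optional<Bar> b;  // stand-in for "declared but not yet initialized"
    b.emplace(a);          // the later initialization step
    return b->value;
}
```

The workaround costs an extra engaged flag and makes `Bar` look null-able again, which is what the suggested re-initialization is meant to avoid.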
> it will break existing code as you pointed out yourself: `int a; f(a);`

I was aware of that when I wrote it down. This isn't well-formed code today (and it's very likely wrong code). Even if you take (well-defined) erroneous behavior in C++26 into consideration, rejecting erroneous behavior at compile time helps catch errors earlier and reduces runtime costs, because erroneous behavior is code that is just plain wrong and shouldn't exist. Besides...
> `char buf[256]; memcpy(buf, somebuf, 256);`

Both this and the previous example are covered today in C++26: `char buf[256] [[indeterminate]]` allows the uninitialized read.
> what about variables with dynamic storage: `new T;`
The changes that I currently suggested are entirely limited to automatic storage variables, so you can for example have:
T *ptr; // guaranteed to be null (nullptr)
...
// At a later stage
ptr = new T(...);
But not `auto ptr = new T;`.

Since `new` attempts to match `operator new` overloads (and is therefore a whole other set of complicated situations to deal with), the best way to handle this today is to just not bother with it right now. It's rejected today due to unmatched constructor arguments; it can be improved later if desired.
> I fear if we allow to declare a variable in relocated state, then people may abuse it, and will make code harder to follow.

I personally don't think that's a huge concern, because every new feature is likely to be abused (people get a new toy), and IMO a unified syntax for null-able / null-restricted types outweighs the fear of it being abused.

Due to C++'s inability to handle null-restricted types like `gsl::not_null<std::unique_ptr<T>>` nicely, people currently use null-able types everywhere, even when they don't make sense logically. Unifying the syntax would help promote the use of null-restricted types: you can now teach users to "just slap `not_null` on pointers and let the compiler catch null dereferences" (of course that comes with its limitations, but being safer than before is an improvement).
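To make the idea of a null-restricted pointer concrete, here is a deliberately tiny wrapper for raw pointers. It is a hypothetical `NotNull`, written for illustration only, and is not the GSL implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>

template <class T>
class NotNull {
public:
    explicit NotNull(T* p) : ptr_(p) {
        if (p == nullptr) throw std::invalid_argument("NotNull: null pointer");
    }
    NotNull(std::nullptr_t) = delete;  // a literal nullptr is rejected at compile time

    T& operator*() const { return *ptr_; }

private:
    T* ptr_;
};

// There is no "empty" state to declare the wrapper in.
static_assert(!std::is_default_constructible_v<NotNull<int>>);
static_assert(!std::is_constructible_v<NotNull<int>, std::nullptr_t>);

int read_through(NotNull<int> p) { return *p; }

// A null pointer that is only known at run time is caught by the constructor.
bool rejects_runtime_null() {
    int* q = nullptr;
    try {
        NotNull<int> n(q);
        (void)n;
    } catch (const std::invalid_argument&) {
        return true;
    }
    return false;
}
```

Wrapping an owning pointer the same way is where current C++ gets awkward: a move would have to leave a null source behind, which the type is not supposed to be able to hold.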
> all the other defaulted special member functions perform member-wise operations. `T::operator=(T)` should be no different.

Destroy-then-re-initialize is only performed when no matching `operator=` is found. If you have a defaulted `T::operator=(T)`, then it is called instead. The destroy + reconstruct is only performed if:

- `T::operator=(T)` can't be defaulted and is automatically marked as deleted

The point is to give users the chance to simplify the code: you now have fewer code paths to think about while maintaining roughly the same functionality. A class with this interface:
class A {
public:
A(A);
A(const A &);
auto operator=(auto) = delete;
~A();
};
Can support just about the same use cases as this:
class A {
public:
A(A);
A(const A &);
A operator=(A);
A operator=(const A &);
~A();
};
Remember that if you need a custom relocation constructor, then it probably means you have address-sensitive members (self-referential pointers and the like) that need to be updated on relocation, in which case `A operator=(A)` must be handled manually as well. Defaulted member-wise relocation assignment gives you a disaster when you write something like this:
struct A {
A(int input)
: num(input)
, ptr_to_num(&num)
{}
A(A src)
: num(src.num)
, ptr_to_num(&num)
{}
A operator=(A) = default; // For member-wise relocation it's likely disaster
private:
int num;
int *ptr_to_num;
};
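The same hazard can be reproduced in current C++ with a defaulted copy assignment, which is also member-wise. A sketch by analogy only (the relocation constructor syntax above is not valid C++ today); `SelfRef` and the helper names are hypothetical:

```cpp
#include <cassert>

struct SelfRef {
    explicit SelfRef(int input) : num(input), ptr_to_num(&num) {}
    // The hand-written copy constructor re-points the pointer at its own member.
    SelfRef(const SelfRef& src) : num(src.num), ptr_to_num(&num) {}
    // The defaulted assignment copies members one by one, pointer included.
    SelfRef& operator=(const SelfRef&) = default;

    bool points_to_self() const { return ptr_to_num == &num; }

    int num;
    int* ptr_to_num;
};

bool copy_construction_keeps_invariant() {
    SelfRef a(1);
    SelfRef c(a);
    return c.points_to_self();
}

bool assignment_breaks_invariant() {
    SelfRef a(1), b(2);
    b = a;  // b.ptr_to_num now points at a.num
    return !b.points_to_self();
}
```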
It's similar to the rule of five in current C++: if you write any one of a custom relocation constructor, custom copy constructor, relocation assignment, copy assignment, or custom destructor, you probably want to write all of them.

The issue with the rule of five is that you are almost always going to duplicate code for no good reason (except in specific cases, for example if you're supporting the `b = a` syntax for proxy types or atomic store ops).

Of course in this specific example it's simple to mitigate the disaster: if any member variable is a pointer or reference, then the default relocation constructor/assignment should be automatically marked as deleted (requiring users to write them manually). However, this doesn't fix the issue that in the end the user needs to duplicate code, because it's still required to write them manually in the first place. And duplicated code is known to be error-prone.
> destroy + relocate has poor exception safety

Exception safety is something I haven't deeply thought through, and that's a good point. I assumed that you shouldn't throw in destructors and that relocation constructors are naturally nothrow because they don't need to allocate. However if they do throw, then this causes issues I don't currently have a clear solution to.

The issue is that whether an object is relocated at a specific point is statically inferred at compile time, while whether a (potentially throwing) operation actually throws is only known at runtime. The two don't mix well.
Also, note that the suggestions are not "you must add those mechanisms before Relocating prvalues is useful", but instead "what you can build on top of Relocating prvalues" (in case you need more motivation examples in the proposal, because safety discussions are popular recently). Relocating prvalues proposal at its current stage is already immediately useful, because you can have usable null-restricted types with relocation constructors.
> Both indirect and polymorphic have a valueless state that is used to implement move. The valueless state is not intended to be observable to the user. There is no operator bool or has_value member function. Accessing the value of an indirect or polymorphic after it has been moved from is undefined behaviour. We provide a valueless_after_move member function that returns true if an object is in a valueless state. This allows explicit checks for the valueless state in cases where it cannot be verified statically.
It's coming to C++26 🙃🙃🙃: https://wg21.link/p3019/github, so much for the "we just made C++26 safer" nonsense.
Society if C++ has relocation constructor: \<inserts modern city picture>.
Anyway, the poor state of null safety (or the lack of it) demonstrated by the new footgun introduced by newer C++ standard library further strengthens the point in adding relocation constructor to C++.
| | Description | Code | Result | Note |
|---|---|---|---|---|
| P3019 without Relocating prvalues | Use of `optional<polymorphic<T, Alloc>>` after move | `optional<polymorphic<int, allocator<int>>> p1{}; auto p2 = std::move(p1); use(p1);` | Well defined | `optional<T>` is a null-able type. |
| | Use of `polymorphic<T, Alloc>` after move | `polymorphic<int, allocator<int>> p1{}; auto p2 = std::move(p1); use(p1);` | Undefined behavior | `polymorphic<T, Alloc>` is a null-restricted type as per P3019 (R11). |
| P3019 + Relocating prvalues | Use of `optional<polymorphic<T, Alloc>>` after move | `optional<polymorphic<int, allocator<int>>> p1{}; auto p2 = std::move(p1); use(p1);` | Well defined | `optional<T>` is a null-able type. |
| | Use of `polymorphic<T, Alloc>` after move | `polymorphic<int, allocator<int>> p1{}; auto p2 = std::move(p1); use(p1);` | Ill-formed, diagnostic required: No matching overload for `polymorphic<int, allocator<int>> &&` | With the existence of relocation semantics, there's no point in adding a move constructor to null-restricted types. |
| | Use of `polymorphic<T, Alloc>` after relocate | `std::polymorphic<int, std::allocator<int>> p1{}; auto p2 = reloc p1; use(p1);` | Ill-formed, diagnostic required: Use of relocated-from object | `polymorphic<T, Alloc>` is a null-restricted type as per P3019 (R11). |
Note: Removing the move constructor from `std::polymorphic` is a breaking change.

But adding a move constructor to a null-restricted type without adding language facilities (relocation) to support it is a huge mistake in the first place. It's almost guaranteed to replicate the failure of `std::auto_ptr`, which implemented ownership transfer semantics without r-value references.

The Relocating prvalues proposal can only gain consensus in the committee by stepping over the failure of `std::polymorphic`. And then there will be new types designed with relocation semantics in mind that supersede `std::polymorphic`, the same way `std::unique_ptr` superseded `std::auto_ptr`.
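By contrast, a null-able owning type has a well-defined moved-from state today. A minimal check with `std::unique_ptr`, whose moved-from value is specified to be null; `moved_from_is_null` is a hypothetical helper name:

```cpp
#include <cassert>
#include <memory>
#include <utility>

bool moved_from_is_null() {
    auto p1 = std::make_unique<int>(42);
    auto p2 = std::move(p1);
    // p1 may still be inspected: it is guaranteed to compare equal to nullptr.
    return p1 == nullptr && p2 != nullptr && *p2 == 42;
}
```

A null-restricted type has no such state to fall back on, which is why its moved-from object ends up valueless and unsafe to access.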
I have given more thought to allowing re-initialization of relocated values, and to the synthesized assignment operator.

I see the value in reassigning to a relocated object, but I'm reluctant to use the `a = *init-expression*` syntax. The language states (with relocating prvalues) that a relocated object or a potentially relocated object cannot be reused. Hence the meaning of the above statement changes depending on whether `a` is relocated or not... and if `a` was relocated in another control-flow path, then you may only know at run-time whether `a` was relocated. Then how the expression should be evaluated (regular assignment operator or re-initialization) may only be known at run-time...

If we are taking this road, then I suggest we use a new initialization operator, `:=`, which is used in other languages. Writing `a := *init-expr*` then has a clear meaning. If it is determined at run-time that `a` is not relocated, then its destructor is invoked before the initialization. We could even allow it in cases where `a` is not even potentially relocated. Then this operator would be equivalent to `a.~A(); new (&a) A{*init-expr*};`, with the destructor invoked only if `a` is not relocated.
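The destroy-then-construct expansion can be approximated in current C++ with a library helper. A rough sketch, assuming the object is always alive on entry (there is no relocated-from state today); `reinitialize`, `Tracked` and `reinit_demo` are hypothetical names:

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Roughly `obj.~T(); new (&obj) T{args...};`.
// Restricted to non-throwing constructors so a failed construction can never
// leave `obj` destroyed, which is the hazard discussed in this thread.
template <class T, class... Args>
void reinitialize(T& obj, Args&&... args) {
    static_assert(std::is_nothrow_constructible_v<T, Args...>,
                  "construction must not throw after the old object is destroyed");
    std::destroy_at(std::addressof(obj));
    ::new (static_cast<void*>(std::addressof(obj))) T(std::forward<Args>(args)...);
}

// A type with no assignment operators at all: only a constructor and a destructor.
struct Tracked {
    static inline int live = 0;  // number of currently alive objects
    int id;

    explicit Tracked(int i) noexcept : id(i) { ++live; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
    ~Tracked() { --live; }
};

int reinit_demo() {
    Tracked t(1);
    reinitialize(t, 2);                // old object destroyed, new one built in place
    return t.id * 10 + Tracked::live;  // id of the new object, plus one live object
}
```

The `static_assert` sidesteps the exception question rather than answering it; a language-level `:=` would still have to say what happens when the initializer throws.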
Since this new operator is not a function, we care less about its exception safety, as long as it is applied to local, non-ref-qualified variables. Indeed, let's see what happens when an exception leaks through `a := *init-expr*;`, `a` being a local, non-ref-qualified variable:

- […] `a` destructor will not be invoked by stack unwinding.
- […] `a` as potentially relocated.

Things are more complicated if `a` is ref-qualified: `void foo(A& a) { a := bar(); }`. If the new operator throws, then the function will exit with `a` being a dangling reference... For the moment my two solutions for this are:
I prefer the second solution, although I would have liked above all a solution that worked better with exceptions...
That being said, with the new := operator, and given that we can make it work on ref-qualified objects, we obtain the desired class design:
class A {
public:
A(A) noexcept;
A(const A &);
~A() noexcept;
};
The resource management logic needs to be written only once for each constructor/destructor, and not repeated in the matching assignment operator.
However there are still open questions:
- […] `a := foo(reloc a);`? Should they be legal? If so, then we need to evaluate the right expression to know whether `a` is left in a destructed state, but evaluating the expression without calling `a`'s destructor beforehand would prevent the relocation elision. (i.e. If `a` needs to be destroyed, then you cannot construct whatever the right expression returns at `a`'s address...).
- […] `obj.get_string_const_ref() := "new string";`.

> I personally don't think that's a huge concern because every new features are likely abused (because people get a new toy), and IMO unified syntax of null-able / null-restricted types outweighs the fears of it being abused.
Yes, but I still don't see why we should allow it in the first place (talking about declared objects with no matching constructor to be considered as relocated).
I hadn't followed this subject, but it is sad indeed... It makes me want to rewrite part of the motivation section of the paper to put the emphasis on safety.
By the way, I'm looking for people that could help me with this paper. I am mostly on my own for the moment (Ed Catmur, the other co-author, tragically passed away last year). There are still bits to rewrite before R4 becomes official and ready to be defended. If you want to help, don't hesitate to contact me (sebastien.bini at gmail.com).
Currently in C++, if you write a struct which uniquely owns a resource, you have to write a lot of boilerplates in order to be "correct":
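A minimal sketch of such boilerplate in today's C++, assuming a hypothetical move-only `Handle` wrapper around a single owning `ptr` (not necessarily the exact code meant here):

```cpp
#include <cassert>
#include <utility>

class Handle {
public:
    explicit Handle(int v) : ptr(new int(v)) {}

    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;

    // Move construction: steal the pointer and reset the source to null.
    Handle(Handle&& other) noexcept : ptr(std::exchange(other.ptr, nullptr)) {}

    // Move assignment: the same logic again, plus cleanup of the old resource.
    Handle& operator=(Handle&& other) noexcept {
        if (this != &other) {
            delete ptr;
            ptr = std::exchange(other.ptr, nullptr);
        }
        return *this;
    }

    // The destructor has to tolerate the null (moved-from) state.
    ~Handle() { delete ptr; }

    bool empty() const { return ptr == nullptr; }
    int value() const { return *ptr; }

private:
    int* ptr;
};
```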
And this is just one `ptr`; the amount of duplicated effort scales with the number of uniquely-owned resources in the wrapper type. Requiring users to manually write all the assignments and then set the source object back to the null state is tiresome and bug-prone. All that effort goes in just to satisfy a niche scenario (use after move) which even static analyzers warn you against:

With relocation constructors, this is greatly simplified, because now there are fewer situations (moved-from state) to worry about:
Even if the structure isn't trivially relocatable and if you're not working with move-only types, relocation constructor can still be significantly simpler than the old ways. This requires adding two new mechanisms (plus a syntax sugar):
1. Allow the "re-initialization" of relocated-from object:

This does not conflict with the `operator=(T)` relocation assignment in the proposal. The re-initialization only occurs if the left operand is in relocated-from state.

This also doesn't break existing code, because there are no relocated-from objects in any existing code: `reloc` doesn't exist today.

2. When an object of a class type with no default constructor is declared without initialization, then treat it as relocated-from state:
The idea comes from a "common sense" where:

is semantically equivalent to:

except it isn't today, because class types with custom constructors have a very different meaning:

This mechanism would partially fix that:

is now the same as `std::reference_wrapper<int> b = a`.

This doesn't break existing code, because the above code snippet doesn't compile today due to unmatched constructor parameters. Those that compile today don't change semantics:
3. Destroy-then-re-initialize if no matching `operator=` is found:

It's basically syntax sugar for `reloc left; left = right;`.

This doesn't break existing code, because the code doesn't compile today due to the missing `operator=`.

With the new mechanisms in place, the following demonstrates how it simplifies libstdc++ style std::string SBO:
vs the traditional way:
All common use cases that originally required copy constructor + copy assignment + move constructor + move assignment, are now supported by just a relocation constructor + a copy constructor.
And not only is it simpler, it's also safer, because users now repeat themselves less than before.