SebastienBini / cpp-relocation-proposal

C++ standard proposal for Relocation

Relocation to simplify the type system #40

Open SidneyCogdill opened 1 month ago

SidneyCogdill commented 1 month ago

Currently in C++, if you write a struct that uniquely owns a resource, you have to write a lot of boilerplate in order to be "correct":

struct A {
    A(A &&src) : ptr(src.ptr) {
        src.ptr = nullptr;
    }
    A(const A &) = delete;
    A &operator=(A &&src) {
        delete ptr;
        ptr = src.ptr;
        src.ptr = nullptr;
        return *this;
    }
    A &operator=(const A &) = delete;
    ~A() {
        delete ptr;
    }

private:
    T *ptr;
};

And this is just one ptr; the amount of duplicated effort grows with the number of uniquely owned resources in the wrapper type. Requiring users to manually write all the assignments and then reset the source object to a null state is tiresome and bug-prone. All that effort goes into satisfying a niche scenario (use after move) that even static analyzers warn against:

A a{};
auto b = std::move(a);
auto c = a; // Use after move, allowed by C++ standard (flagged by clang-tidy and cppcheck)
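
To make the moved-from state concrete, here is a minimal compilable sketch of the wrapper above. It assumes `T` is `int`, and the names `IntOwner`, `is_null()`, `value()` and `moved_from_is_null()` are hypothetical additions for illustration only:

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the struct above, with T fixed to int.
struct IntOwner {
    explicit IntOwner(int v) : ptr(new int(v)) {}
    IntOwner(IntOwner &&src) noexcept : ptr(src.ptr) {
        src.ptr = nullptr;  // leave the source in its null moved-from state
    }
    IntOwner(const IntOwner &) = delete;
    IntOwner &operator=(IntOwner &&src) noexcept {
        if (this != &src) {  // guard against self-move
            delete ptr;
            ptr = src.ptr;
            src.ptr = nullptr;
        }
        return *this;
    }
    IntOwner &operator=(const IntOwner &) = delete;
    ~IntOwner() { delete ptr; }

    // Illustrative accessors (not part of the original example).
    bool is_null() const { return ptr == nullptr; }
    int value() const { return *ptr; }

private:
    int *ptr;
};

// Returns true if the source is observably null after being moved from.
inline bool moved_from_is_null() {
    IntOwner a{42};
    IntOwner b = std::move(a);
    return a.is_null() && !b.is_null() && b.value() == 42;
}
```

The point of the sketch is that the null state exists only to keep the moved-from object destructible, yet it remains reachable by any later code that touches `a`.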

With relocation constructors, this is greatly simplified, because there are now fewer situations (such as the moved-from state) to worry about:

struct A {
    A(A) = default;
    A(A &&) = delete;
    A(const A &) = delete;
    ~A() {
        delete ptr;
    }

private:
    T *ptr;
};
A a{};
auto b = reloc a;
auto c = a; // Error: use after relocate.

Even if the structure isn't trivially relocatable, and even if you're not working with move-only types, a relocation constructor can still be significantly simpler than the old ways. This requires adding two new mechanisms (plus some syntactic sugar):

1. Allow the "re-initialization" of relocated-from object:

auto a = 0;
auto b = reloc a; // `a` is now in relocated-from state.
a = 0; // `a` is re-initialized.

This does not conflict with the operator=(T) relocation assignment in the proposal. The re-initialization only occurs if the left operand is in relocated-from state.

This also doesn't break existing code: there are no relocated-from objects in any existing code, because reloc doesn't exist today.

2. When an object of a class type with no default constructor is declared without an initializer, treat it as being in the relocated-from state:

struct A {
    A() = delete;
    A(int);
};

auto f() {
    A a; // `a` is considered to be relocated-from state
}

The idea comes from a "common sense" where:

int a;
a = 0;

is semantically equivalent to:

int a = 0;

except that this doesn't hold today, because class types with custom constructors behave very differently:

int a;
// Fails to compile, because reference_wrapper
// doesn't have a constructor that takes no argument.
std::reference_wrapper<int> b;
b = a;

This mechanism would partially fix that:

int a;
// Can't be default constructed,
// therefore it's declared but in relocated-from state.
std::reference_wrapper<int> b;
b = a; // `b` is now initialized.

is now the same as std::reference_wrapper<int> b = a.

This doesn't break existing code, because the above code snippet doesn't compile today due to unmatched constructor parameters. Code that compiles today doesn't change its semantics:

std::string a; // `a` is still default constructed because a default constructor can be found.
a = "test";
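
For comparison, the usual way to get "declare now, bind later" for a null-restricted type in today's C++ is to wrap it in a null-able one. The following is only a sketch of that workaround, not of the suggested mechanism; the function name `deferred_reference_demo` is made up for illustration:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <type_traits>

// reference_wrapper has no default constructor, so
// `std::reference_wrapper<int> b;` is rejected today.
static_assert(!std::is_default_constructible_v<std::reference_wrapper<int>>);

// Workaround available today: wrap the null-restricted type in a null-able one.
inline int deferred_reference_demo() {
    int a = 10;
    std::optional<std::reference_wrapper<int>> b;  // starts empty
    b.emplace(a);                                  // now refers to `a`
    b->get() = 20;                                 // writes through to `a`
    return a;
}
```

The cost of the workaround is that the empty state stays part of the type for the rest of its lifetime, which is what the suggestion tries to avoid.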

3. Destroy-then-re-initialize if no matching operator= is found:

struct A {
    A(A);
    A &operator=(const A &) = delete;
    A &operator=(A) = delete;
};

auto f1() {
    A a{};
    A b{};
    // attempts to match `operator=(A)` and `operator=(const A &)` first,
    // no matching overload found, then falls back to
    // `reloc a` + `A(A)` (destroy then re-initialize)
    a = b;
}

It's basically syntactic sugar for reloc left; left = right;.

This doesn't break existing code, because such code doesn't compile today due to the missing operator=.
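
The destroy-then-re-initialize fallback can be approximated by hand in today's C++ with an explicit destructor call followed by placement new. This is a rough sketch under stated assumptions: `NoAssign`, `reinitialize` and `reinitialize_demo` are hypothetical names, and the helper is deliberately not exception safe.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <utility>

// Hypothetical type with copy construction but no assignment operator.
struct NoAssign {
    explicit NoAssign(int v) : value(v) {}
    NoAssign(const NoAssign &) = default;
    NoAssign &operator=(const NoAssign &) = delete;
    int value;
};

// Destroys `target` and constructs a new object in the same storage.
// Not exception safe: if the constructor throws, `target` is left destroyed.
template <class T, class... Args>
void reinitialize(T &target, Args &&...args) {
    std::destroy_at(std::addressof(target));
    ::new (static_cast<void *>(std::addressof(target)))
        T(std::forward<Args>(args)...);
}

inline int reinitialize_demo() {
    NoAssign a{1};
    NoAssign b{2};
    reinitialize(a, b);  // stands in for `a = b;` under the suggested fallback
    return a.value;
}
```

Unlike the suggested language rule, nothing here stops a caller from applying the helper to an object that must not be replaced in place (for example a const object), so it only illustrates the semantics.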


With the new mechanisms in place, the following demonstrates how it simplifies libstdc++ style std::string SBO:

struct A {
    A(std::span<T> input);

    A(A src)
    : storage(src.storage)
    , ptr(src.is_small() ? &storage.buf : src.ptr)
    {}

    A(A &&) = delete;

    A(const A &src)
    : storage(src.storage)
    , ptr(src.is_small()
                ? &storage.buf
                : (T *) malloc(sizeof(T) * storage.capacity))
    {}

    ~A() {
        if(!is_small()) {
            free(ptr);
        }
    }

private:
    bool is_small() const {
        return ptr == &storage.buf;
    }
    union {
        std::byte buf[16];
        std::size_t capacity;
    } storage;
    T *ptr;
};
auto f1(std::span<char> data) {
    auto a = A{data};
    auto b = a; // A(const A &) called
    auto c = reloc a; // A(A) called

    A d; // uninitialized due to no matching constructors, same as relocated-from state.
    //std::dump(d); // This will be a compile time error due to use after reloc.
    d = reloc b; // initializes `d` with A(A), `d` is alive now.
    d = c; // Due to no matching operator=, equivalent to `reloc d;` followed by `A(A)`.
    return d; // NRVO, or A(A) called
}

vs the traditional way:

struct A {
    A(std::span<T> input);

    A(A &&src)
    : storage(src.storage)
    , ptr(src.is_small() ? &storage.buf : src.ptr)
    {
        src.ptr = &src.storage.buf;
    }

    A &operator=(A &&src) {
        if (&src == this) {
            return *this;
        }
        if(!is_small()) {
            free(ptr);
        }
        storage = src.storage;
        ptr = src.is_small() ? &storage.buf : src.ptr;
        src.ptr = &src.storage.buf;
        return *this;
    }

    A(const A &src)
    : storage(src.storage)
    , ptr(src.is_small()
                ? &storage.buf
                : (T *) malloc(sizeof(T) * storage.capacity))
    {}

    A &operator=(const A &src) {
        if (&src == this) {
            return *this;
        }
        if(!is_small()) {
            free(ptr);
        }
        storage = src.storage;
        ptr = src.is_small()
                    ? &storage.buf
                    : (T *) malloc(sizeof(T) * storage.capacity);
        return *this;
    }

    ~A() {
        if(!is_small()) {
            free(ptr);
        }
    }

private:
    bool is_small() const {
        return ptr == &storage.buf;
    }
    union {
        std::byte buf[16];
        std::size_t capacity;
    } storage;
    T *ptr;
};
auto f1(std::span<char> data) {
    auto a = A{data};
    auto b = a; // A(const A &) called
    auto c = std::move(a); // A(A &&) called

    // A d; // It isn't possible to declare a variable without initializing it.
    auto d = A{data};
    d = std::move(b); // operator=(A &&) called
    d = c; // operator=(const A &) called
    return d; // NRVO, or A(A &&) called
}

All common use cases that originally required a copy constructor + copy assignment + move constructor + move assignment are now supported by just a relocation constructor + a copy constructor.

And not only is it simpler, it's also safer, because users now repeat themselves less than before.

SidneyCogdill commented 1 month ago

To give a brief idea of how those mechanisms simplify the null-able/null-restricted property of the type system:

| Code | Notes | C++23 | C++26 | Relocating prvalues |
|---|---|---|---|---|
| `int a; a = 42; f(a);` | int is a null-restricted type. 0 is a valid state of int, therefore it can't be used for representing "nothing" (which is conceptually the case if the variable is uninitialized, because the user hasn't given it a state). | Undefined behavior that works in practice | Well defined; it's not clear whether it is erroneous behavior | Well defined |
| `int a; f(a);` | Same as above | Undefined behavior | Well defined as erroneous behavior | Ill-formed, diagnostic required: Use of relocated-from object |
| `std::optional<int> a; f(a);` | `std::optional<T>` is a null-able type. | Well defined; a is guaranteed to be in null state | Well defined; a is guaranteed to be in null state | Well defined; a is guaranteed to be in null state |
| `int a = 10; std::reference_wrapper<int> b; b = a; f(b);` | `std::reference_wrapper<T>` is a null-restricted type. It can't be initialized without arguments. | Doesn't compile today | Doesn't compile today | Well defined |
| `std::reference_wrapper<int> a; f(a);` | Same as above | Doesn't compile today | Doesn't compile today | Ill-formed, diagnostic required: Use of relocated-from object |
| `std::vector<int> a; a = {1, 2, 3}; f(a);` | `std::vector<T>` is a null-able type. The null state can be represented by setting its size and capacity to 0. | Well defined | Well defined | Well defined |
| `std::vector<int> a; f(a);` | Same as above | Well defined; a is guaranteed to be in null state | Well defined; a is guaranteed to be in null state | Well defined; a is guaranteed to be in null state |
| `int *p; f(p);` | `T *` is semantically equivalent to `std::optional<std::reference_wrapper<T>>`, which is a null-able type. The null state can be represented by nullptr. | Undefined behavior | Well defined as erroneous behavior | Well defined; p is guaranteed to be in null state (nullptr) |
| `int a = 10; int *p; p = &a; f(p);` | Same as above | Undefined behavior that works in practice | Well defined; it's not clear whether it is erroneous behavior | Well defined |
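
The null-able / null-restricted split in the table can be checked against today's standard library with a few type traits. This is a small sketch; `default_states_are_null` is a made-up helper name:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <type_traits>
#include <vector>

// Null-able types can be declared without an initializer...
static_assert(std::is_default_constructible_v<std::optional<int>>);
static_assert(std::is_default_constructible_v<std::vector<int>>);
// ...while a null-restricted type such as reference_wrapper cannot.
static_assert(!std::is_default_constructible_v<std::reference_wrapper<int>>);

// Default-constructed null-able objects start in their null state.
inline bool default_states_are_null() {
    std::optional<int> o;
    std::vector<int> v;
    return !o.has_value() && v.empty();
}
```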

With relocation semantics in mind, this has the chance to:

  1. bridge the syntax gap between fundamental types (int, float, ...) and user-defined types (std::vector, QString, ...) so that they work in a consistent manner.
  2. bring null-safety into the C++ type system as other popular programming languages do, but in a way that doesn't require heavy annotation (e.g. Java T? / T!) or break existing well-formed code (e.g. C#), and doesn't introduce runtime costs.

Note that destruction followed by an immediate re-initialization within the same lexical scope works with the caller-destroy ABI regardless of whether the operand is owned or unowned, because you put something back into the storage before leaving the function scope.

Therefore, although I suggest that the conservative approach of silently turning reloc into a no-op should be reconsidered, the new mechanisms introduced above won't be blocked by ABI and already work with the current (R4) revision.

SidneyCogdill commented 1 month ago

Re-initialization should not be allowed on const objects, otherwise it breaks the common-sense rule that "you can't assign new values to a constant".

| Code | C++23 | C++26 | Relocating prvalues |
|---|---|---|---|
| `int const a; f(a);` | Doesn't compile today; const variables must be initialized | Doesn't compile today; const variables must be initialized | Ill-formed, diagnostic required: Use of relocated-from object |
| `int const a; a = 10; f(a);` | Same as above | Same as above | Ill-formed, diagnostic required: const variables can't be re-initialized |
| `int *const a; f(a);` | Same as above | Same as above | Well defined; a is guaranteed to be in null state (nullptr) |
| `std::vector<int> const a; f(a);` | Well defined; a is guaranteed to be in null state | Well defined; a is guaranteed to be in null state | Well defined; a is guaranteed to be in null state |
| `std::reference_wrapper<int> const a; f(a);` | Doesn't compile today | Doesn't compile today | Ill-formed, diagnostic required: Use of relocated-from object |

SebastienBini commented 1 month ago

Hello and thank you for your feedback. Let's tackle this point by point.

Introduction

This is very well formulated. Note that, to strengthen your point, A::ptr can even be const in the relocation constructor design.

1. Allow the "re-initialization" of relocated-from object

I don't see why this is needed. Besides, I doubt it's easily feasible: a(0); to reinit a will be parsed as a function call, not a constructor call.

One thing that can be considered, is to allow the redeclaration of a relocated variable:

auto a = 0;
auto b = reloc a; // `a` is now in relocated-from state.
auto a = 0; // `a` is re-declared.

ideally to allow complex object initialization:

auto a = foo();
a.func();
a.func2();
auto const a = reloc a;

as a less verbose alternative to the IIFE idiom.
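
For readers unfamiliar with it, the IIFE idiom mentioned above looks roughly like this in today's C++. It is a minimal sketch with a made-up function name `iife_demo`:

```cpp
#include <cassert>
#include <vector>

inline int iife_demo() {
    // Build a const object in several steps with an immediately invoked lambda.
    const std::vector<int> values = [] {
        std::vector<int> tmp;
        tmp.push_back(1);
        tmp.push_back(2);
        return tmp;
    }();
    return static_cast<int>(values.size());
}
```

The redeclaration form would express the same thing without the lambda: the mutable setup happens in the enclosing scope and the final `auto const a = reloc a;` freezes the result.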

However I don't think that's needed for a first proposal. It's something that can still be added at a later stage in my opinion.

2. When an object of a class type with no default constructor, is declared without initialization, then treat it as relocated-from state

I have mixed feelings about this one. It's recommended to initialize variables when they are declared, and to place their declaration as close as possible to where the variable is actually needed (i.e. no C-style functions where all the variables are declared uninitialized at the top of the function). I fear that if we allow declaring a variable in the relocated state, people may abuse it, making code harder to follow.

Also, sometimes variables are purposefully declared uninitialized. In networking, for instance, it's common to declare a packed structure describing a protocol header and to write the received data directly into that structure, which was uninitialized before. Zero-initializing this structure would be a waste.

Finally, it will break existing code, as you pointed out yourself: int a; f(a); would produce an ill-formed diagnostic. I admit your snippet looks smelly, but what about: char buf[256]; memcpy(buf, somebuf, 256);? This looks legitimate, and zero-initializing buf would slow down some applications for no reason.
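
The networking pattern described above might look like the following sketch. The `Header` layout and the `read_header` function are hypothetical and only serve to show an object that is declared uninitialized and then fully overwritten before it is read:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical protocol header, used only for illustration.
struct Header {
    std::uint16_t type;
    std::uint16_t length;
};
static_assert(sizeof(Header) == 4, "sketch assumes no padding");

inline Header read_header(const unsigned char *wire) {
    Header h;                         // deliberately left uninitialized
    std::memcpy(&h, wire, sizeof h);  // every byte is written before any read
    return h;
}
```

Zero-initializing `h` first would be redundant work here, since `memcpy` overwrites the whole object.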

And what about variables with dynamic storage? How would you manage: auto* a = new T; if T has no matching default constructor?

3. Destroy-then-re-initialize if no matching operator= is found

As much as I like the spirit, I don't think that's feasible either. This reminds me of a discussion I had with Ed, about what the default implementation of T::operator=(T) should do. I advocated for destroy + reconstruct, and Ed convinced me otherwise. There are two issues with destroy + relocate:

SidneyCogdill commented 1 month ago

to reinit a will be parsed as a function call, not a constructor call.

The a(0); syntax isn't being suggested. The only allowed form of re-initialization would be the b = a style (which only matches implicit constructors, the same as auto b = a does). It's there to bridge the syntax gap between Bar b; b = a; and Bar b = a;: with re-initialization they become equivalent.


it will break existing code as you pointed out yourself

int a;
f(a);

I was aware of that when I wrote it down. This isn't well-formed code today (and it's very likely wrong code). Even if you take (well-defined) erroneous behavior in C++26 into consideration, rejecting erroneous behavior at compile time helps catch errors earlier and reduces runtime costs, because erroneous behavior means code that is just plain wrong and shouldn't exist. Besides...

char buf[256];
memcpy(buf, somebuf, 256);

Both this and the previous example are already covered in C++26: declaring the variable with the [[indeterminate]] attribute, as in [[indeterminate]] char buf[256];, explicitly opts in to leaving it uninitialized.


what about variables with dynamic storage

new T;

The changes I currently suggest are limited entirely to variables with automatic storage duration, so you can for example have:

T *ptr; // guaranteed to be null (nullptr)
...
// At a later stage
ptr = new T(...);

But not auto ptr = new T;.

Since new attempts to match operator new overloads (which is a whole other set of complicated situations to deal with), the best way to handle this is to simply not bother with it for now. It's rejected today due to unmatched constructor arguments; it can be improved later if desired.


I fear if we allow to declare a variable in relocated state, then people may abuse it, and will make code harder to follow.

I personally don't think that's a huge concern, because every new feature is likely to be abused (people get a new toy), and IMO a unified syntax for null-able / null-restricted types outweighs the fear of it being abused.

Because C++ cannot handle null-restricted types like gsl::not_null<std::unique_ptr<T>> nicely, people currently use null-able types everywhere, even when they don't make sense logically. Unifying the syntax would help promote the use of null-restricted types: you could teach users to "just slap not_null on pointers and let the compiler catch null dereferences" (of course that comes with its limitations, but being safer than before is an improvement).
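
To illustrate why a null-restricted pointer wrapper is awkward today, here is a much-reduced, hypothetical stand-in for a not_null-style type. `NotNull` and `not_null_demo` are invented for this sketch and are not the GSL implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>

// Hypothetical, reduced stand-in for a not_null-style raw pointer wrapper.
template <class T>
class NotNull {
public:
    NotNull() = delete;                // no empty state to default to
    NotNull(std::nullptr_t) = delete;  // reject a literal nullptr at compile time
    explicit NotNull(T *p) : ptr_(p) {
        if (p == nullptr) throw std::invalid_argument("null pointer");
    }
    T &operator*() const { return *ptr_; }
    T *get() const { return ptr_; }

private:
    T *ptr_;
};

// The wrapper cannot be declared without an initializer...
static_assert(!std::is_default_constructible_v<NotNull<int>>);
// ...and cannot be built from nullptr.
static_assert(!std::is_constructible_v<NotNull<int>, std::nullptr_t>);

inline int not_null_demo() {
    int x = 5;
    NotNull<int> p{&x};
    *p += 1;  // dereference never needs a null check
    return x;
}
```

A raw-pointer wrapper like this can be copied freely, but an owning variant has nothing valid to leave behind after a move, which is the gap relocation is meant to close.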


all the other defaulted special member functions perform member-wise operations. T::operator=(T) should be no different.

Destroy-then-re-initialize is only performed when no matching operator= is found. If you have a defaulted T::operator=(T), then it is called instead. The destroy + reconstruct is only performed if:

The point is to give users the chance to simplify the code path: you now have fewer code paths to think about while maintaining roughly the same functionality. A class with this interface:

class A {
public:
    A(A);
    A(const A &);
    auto operator=(auto) = delete;
    ~A();
};

Can support just about the same use cases as this:

class A {
public:
    A(A);
    A(const A &);
    A operator=(A);
    A operator=(const A &);
    ~A();
};

Keep in mind that if you need a custom relocation constructor, it probably means you have address-sensitive members (self-referential pointers and the like) that need to be updated on relocation, in which case A operator=(A) must be handled manually as well. A defaulted member-wise relocation assignment leads to disaster when you write something like this:

struct A {
    A(int input)
    : num(input)
    , ptr_to_num(&num)
    {}

    A(A src)
    : num(src.num)
    , ptr_to_num(&num)
    {}

    A operator=(A) = default; // For member-wise relocation it's likely disaster

private:
    int num;
    int *ptr_to_num;
};
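
The same hazard can be reproduced in today's C++ by using a defaulted copy constructor as a stand-in for member-wise relocation (relocation itself is not available yet, so this is only an analogy). `MemberWise`, `Fixed` and `points_to_self()` are hypothetical names:

```cpp
#include <cassert>

// Member-wise copy: the pointer is copied verbatim and still targets the source.
struct MemberWise {
    explicit MemberWise(int input) : num(input), ptr_to_num(&num) {}
    MemberWise(const MemberWise &) = default;
    bool points_to_self() const { return ptr_to_num == &num; }
    int num;
    int *ptr_to_num;
};

// Hand-written copy: the pointer is re-seated to the new object's own member.
struct Fixed {
    explicit Fixed(int input) : num(input), ptr_to_num(&num) {}
    Fixed(const Fixed &src) : num(src.num), ptr_to_num(&num) {}
    bool points_to_self() const { return ptr_to_num == &num; }
    int num;
    int *ptr_to_num;
};
```

With relocation the outcome would be worse than in this copy-based analogy, because the source object no longer exists afterwards and the stale pointer would dangle.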

It's similar to the rule of five in current C++: if you write any one of a custom relocation constructor, copy constructor, relocation assignment, copy assignment, or destructor, you probably want to write all of them.

The issue with the rule of five is that you almost always end up duplicating code for no good reason (except in specific cases, for example if you're supporting the b = a syntax for proxy types or atomic store operations).

Of course, in this specific example it's simple to mitigate the disaster: if any member variable is a pointer or reference, the default relocation constructor/assignment should be automatically marked as deleted (requiring users to write them manually). However, this doesn't fix the underlying issue: the user still has to write them by hand and therefore duplicate code, and duplicated code is known to be error-prone.


destroy + relocate has poor exception safety

Exception safety is something I haven't thought through deeply, and that's a good point. I assumed that you shouldn't throw in destructors and that relocation constructors are naturally nothrow because they don't need to allocate. However, if they do throw, this causes issues I currently don't have a clear solution to.

The issue is that whether an object is relocated at a specific point is statically inferred at compile time, while whether a (potentially throwing) operation actually throws is only known at run time; the two don't mix well.


Also, note that these suggestions are not "you must add those mechanisms before Relocating prvalues is useful", but rather "what you can build on top of Relocating prvalues" (in case you need more motivating examples in the proposal, since safety discussions have been popular recently). The Relocating prvalues proposal in its current state is already immediately useful, because you can have usable null-restricted types with relocation constructors.

SidneyCogdill commented 3 days ago

https://wg21.link/p3019r11

Both indirect and polymorphic have a valueless state that is used to implement move. The valueless state is not intended to be observable to the user. There is no operator bool or has_value member function. Accessing the value of an indirect or polymorphic after it has been moved from is undefined behaviour. We provide a valueless_after_move member function that returns true if an object is in a valueless state. This allows explicit checks for the valueless state in cases where it cannot be verified statically

It's coming to C++26 🙃🙃🙃: https://wg21.link/p3019/github, so much for the "we just made C++26 safer" nonsense.

Society if C++ has relocation constructor: <inserts modern city picture>.

Anyway, the poor state of null safety (or the lack of it) demonstrated by this new footgun in the newer C++ standard library further strengthens the case for adding relocation constructors to C++.

| Description | Code | Result | Note |
|---|---|---|---|
| P3019 without Relocating prvalues: use of `optional<polymorphic<T, Alloc>>` after move | `optional<polymorphic<int, allocator<int>>> p1{}; auto p2 = std::move(p1); use(p1);` | Well defined | `optional<T>` is null-able type. |
| P3019 without Relocating prvalues: use of `polymorphic<T, Alloc>` after move | `polymorphic<int, allocator<int>> p1{}; auto p2 = std::move(p1); use(p1);` | Undefined behavior | `polymorphic<T, Alloc>` is null-restricted type as per P3019 (R11). |
| P3019 + Relocating prvalues: use of `optional<polymorphic<T, Alloc>>` after move | `optional<polymorphic<int, allocator<int>>> p1{}; auto p2 = std::move(p1); use(p1);` | Well defined | `optional<T>` is null-able type. |
| P3019 + Relocating prvalues: use of `polymorphic<T, Alloc>` after move | `polymorphic<int, allocator<int>> p1{}; auto p2 = std::move(p1); use(p1);` | Ill-formed, diagnostic required: No matching overload for `polymorphic<int, allocator<int>> &&` | With the existence of relocation semantic, there's no point in adding move constructor to null-restricted types. |
| P3019 + Relocating prvalues: use of `polymorphic<T, Alloc>` after relocate | `std::polymorphic<int, std::allocator<int>> p1{}; auto p2 = reloc p1; use(p1);` | Ill-formed, diagnostic required: Use of relocated-from object | `polymorphic<T, Alloc>` is null-restricted type as per P3019 (R11). |

Note: Removing the move constructor from std::polymorphic is a breaking change.

But adding a move constructor to a null-restricted type without adding language facilities (relocation) to support it is a huge mistake in the first place. It's almost guaranteed to replicate the failure of std::auto_ptr, which implemented ownership-transfer semantics without rvalue references.

The Relocating prvalues proposal can only gain consensus in the committee by stepping over the failure of std::polymorphic. Then there will be new types designed with relocation semantics in mind that supersede std::polymorphic, just as std::unique_ptr superseded std::auto_ptr.

SebastienBini commented 3 days ago

I have given more thought to allowing re-initialization of relocated values, and to the synthesized assignment operator.

I see the value in reassigning to a relocated object, but I'm reluctant to use the a = *init-expression* syntax. The language states (with relocating prvalues) that a relocated object or a potentially relocated object cannot be reused. Hence the meaning of the above statement changes depending on whether a is relocated or not... and if a was relocated in another control-flow path, then you may only know at run time whether a was relocated. Then how the expression should be evaluated (regular assignment operator or re-initialization) may only be known at run time...

If we are taking this road, then I suggest we use a new initialization operator: := that is used in other languages. Writing a := *init-expr* then has a clear meaning. If it is determined at run-time that a is not relocated, then its destructor is invoked before the initialization. We could even allow it in cases where a is not even potentially relocated. Then this operator would be equivalent to: a.~A(); new (&a) A{*init-expr*};, with the destructor invoked only if a is not relocated.
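
The run-time behaviour described for := can be approximated today with a wrapper that tracks whether its object is alive. This is only a sketch of the idea, not of how a compiler would implement it; `Slot`, `reinit` and `Positive` are hypothetical names:

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <utility>

// Hypothetical wrapper that tracks at run time whether its object is alive.
template <class T>
class Slot {
public:
    Slot() = default;  // starts in the "relocated" (empty) state
    Slot(const Slot &) = delete;
    Slot &operator=(const Slot &) = delete;
    ~Slot() { reset(); }

    // Emulates `a := init-expr`: destroy only if alive, then construct in place.
    template <class... Args>
    void reinit(Args &&...args) {
        reset();
        ::new (static_cast<void *>(storage_)) T(std::forward<Args>(args)...);
        alive_ = true;  // reached only if the constructor did not throw
    }

    bool alive() const { return alive_; }
    T &get() { return *std::launder(reinterpret_cast<T *>(storage_)); }

private:
    void reset() {
        if (alive_) {
            get().~T();
            alive_ = false;
        }
    }
    alignas(T) unsigned char storage_[sizeof(T)];
    bool alive_ = false;
};

// Hypothetical payload whose constructor rejects negative values.
struct Positive {
    explicit Positive(int v) : value(v) {
        if (v < 0) throw std::invalid_argument("negative");
    }
    int value;
};
```

If the constructor throws, the slot simply stays in its empty state and its destructor does nothing, which mirrors the case of a local variable that is left relocated when an exception leaks through :=.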

Since this new operator is not a function, we care less about its exception safety, as long as it is applied on local, not-ref-qualified variables. Indeed, let's see what happens when an exception leaks through a := *init-expr*;, a being a local, not ref-qualified variable:

Things are more complicated if a is ref-qualified: void foo(A& a) { a := bar(); } . If the new operator throws, then the function will exit with a being a dangling reference... For the moment my two solutions for this are:

I prefer the second solution, although I would have liked above all a solution that worked better with exceptions...

That being said, with the new := operator, and given that we can make it work on ref-qualified objects, we obtain the desired class design:

class A {
public:
    A(A) noexcept;
    A(const A &);
    ~A() noexcept;
};

The resource management logic needs to be written only once for each constructor/destructor, and not repeated in the matching assignment operator.

However there are still open questions:

I personally don't think that's a huge concern because every new features are likely abused (because people get a new toy), and IMO unified syntax of null-able / null-restricted types outweighs the fears of it being abused.

Yes, but I still don't see why we should allow it in the first place (talking about declared objects with no matching constructor to be considered as relocated).


I hadn't followed this subject, but it is sad indeed... It makes me want to rewrite part of the motivation section of the paper to put the emphasis on safety.

By the way, I'm looking for people that could help me with this paper. I am mostly on my own for the moment (Ed Catmur, the other co-author, tragically passed away last year). There are still bits to rewrite before R4 becomes official and ready to be defended. If you want to help, don't hesitate to contact me (sebastien.bini at gmail.com).