Open fwyzard opened 3 years ago
Thinking about it:
an explicitly invalid state would also be OK for our use case:
// instantiate a buffer in an empty and invalid state
auto buffer = HostBuffer<float>::invalid_buffer();
could be a reasonable alternative to
// instantiate a buffer in an empty and invalid state
HostBuffer<float> buffer;
a way to check if a buffer is valid or not might be useful::
// check if the buffer is in invalid state
if (buffer.is_valid()) { ... }
or
// check if the buffer is in invalid state
if (buffer.valid()) { ... }
of just making it contextually convertible to bool
:
// check if the buffer is in invalid state
if (buffer) { ... }
a method to invalidate an existing would also be useful:
// reset the buffer to an empty and invalid state
buffer.reset();
but it could be done by hand setting it to the invalid value:
// reset the buffer to an empty and invalid state
buffer = HostBuffer<float>::invalid_buffer();
In fact, Alpaka objects do have an "empty, invalid state" - the "moved from" state:
#include <iostream>
#include <alpaka/alpaka.hpp>
int main(void) {
using Idx = uint32_t;
using Host = alpaka::DevCpu;
using HostPlatform = alpaka::Pltf<Host>;
const auto host = alpaka::getDevByIdx<HostPlatform>(0u);
// construct a buffer and allocate the corresponding memory
auto b1 = alpaka::allocBuf<char, Idx>(host, 42u);
std::cerr << (void*) alpaka::getPtrNative(b1) << std::endl;
// move the memory to a different buffer, leaving the original buffer in an invalid state
auto b2 = std::move(b1);
std::cerr << "b1 is invalid: " << std::endl;
std::cerr << (void*) alpaka::getPtrNative(b1) << std::endl;
return 0;
}
So, a workaround could simply be keep around one invalid object per type...
// construct a buffer in an invalid state
auto invalid_buffer = b1;
std::cerr << "invalid_buffer is invalid: " << std::endl;
std::cerr << (void*) alpaka::getPtrNative(invalid_buffer) << std::endl;
Clearly, not the most elegant solution :-D
Not having an "invalid" state is part of the RAII (resource acquisition is initialization) pattern prevalent in alpaka. The "invalid" state after a move is not really such: one should not use an object after it has been moved from, because its state is more undefined than invalid (although I do not know if alpaka's docs might give some guarantees about that stated, making it defined.).
The advantage of not having an invalid state is, that one does not need to check for it before using an object at runtime. Anything that would lead use of an object failing becomes a compile-time error instead.
You should consider, that there is a tradeoff: Either you use std::optional
where you need it (like in the situations your listed above) or your have to add checks everywhere else, too.
The "invalid" state after a move is not really such: one should not use an object after it has been moved from, because its state is more undefined than invalid (although I do not know if alpaka's docs might give some guarantees about that stated, making it defined.).
Er, no.
Moved-from objects should be in a "valid but unspecified state", not in an invalid state.
For example, an std::vector
could be empty, or an std::unique_ptr
could point to nullptr
- but they should still be usable, and in fact they are:
#include <iostream>
#include <memory>
#include <vector>
int main(void) {
std::vector<int> v = { 42, 0, 9, -4, 2, 27, -99 };
std::cout << "size before move: " << v.size() << std::endl;
auto w = std::move(v);
std::cout << "size after move: " << v.size() << std::endl;
std::unique_ptr<int> p = std::make_unique<int>(42);
std::cout << "address before move: " << p.get() << std::endl;
auto q = std::move(p);
std::cout << "address after move: " << p.get() << std::endl;
return 0;
}
Accessing a non-existent element of the vector
or dereferencing the nullptr
-values unique_ptr
would lead to a crash at runtime (or an exception if using v.at()
), but that would be a bug on the user side, which could be prevented by checking the status of the object.
Anything that would lead use of an object failing becomes a compile-time error instead.
As my examples clearly demonstrate, this is not true.
There is no compile time error for "use after move" - neither in the case of STL objects (where using them after moving-from is fine, as long as the status of the objects is checked) or of the Alpaka objects (where trying to read the pointer of a moved-from buffer leads to a segmentation fault, instead of returning nullptr
).
Either you use
std::optional
where you need it (like in the situations your listed above) or your have to add checks everywhere else, too.
There is a third possibility which you are not considering: the code that uses an object may be fully aware of when an object is safe to use or not due to the value of other variables, without the need of the overhead from using std::optional
.
The "invalid" state after a move is not really such: one should not use an object after it has been moved from, because its state is more undefined than invalid (although I do not know if alpaka's docs might give some guarantees about that stated, making it defined.).
Er, no.
Moved-from objects should be in a "valid but unspecified state", not in an invalid state.
That is a guarantee that (at least most) STL types give. I am not aware, that this is a requirement of the core language for move semantics. The only thing an object must guarantee after being moved from is that its destructor does not invalidate anything that was moved away. It may, for example, be no longer assignable, although I agree, that the guidelines the STL is following here are better.
Accessing a non-existent element of the
vector
or dereferencing thenullptr
-valuesunique_ptr
would lead to a crash at runtime (or an exception if usingv.at()
), but that would be a bug on the user side, which could be prevented by checking the status of the object.Anything that would lead use of an object failing becomes a compile-time error instead.
As my examples clearly demonstrate, this is not true.
There is indeed this case, where you move away and then access it, which would lead to a runtime-error. What I was referring to is, that moving away and then accessing is often considered an anti-pattern and would at least raise some careful looks during code review. Any other use of an alpaka object is valid due to the non-existent invalid state. In fact, since alpaka types do not have an invalid state and one cannot really determine whether it is valid, there is not save way to use an alpaka object that may have been moved from.
There is no compile time error for "use after move" - neither in the case of STL objects (where using them after moving-from is fine, as long as the status of the objects is checked) or of the Alpaka objects (where trying to read the pointer of a moved-from buffer leads to a segmentation fault, instead of returning
nullptr
).
It is correct, that there is not compile time error for use-after-move, but currently, this is the only situation in which you can end up with an invalid alpaka object, limiting the opportunities for runtime errors.
Either you use
std::optional
where you need it (like in the situations your listed above) or your have to add checks everywhere else, too.There is a third possibility which you are not considering: the code that uses an object may be fully aware of when an object is safe to use or not due to the value of other variables, without the need of the overhead from using
std::optional
.
std::optional
makes the code more explicit and self-documenting: In places where you use std::optional<Buffer>
it is clear, that the buffer may be invalid, where Buffer
is used, it cannot be invalid.std::shared_ptr
to the actual shared object). An invalid state could be one of two things:
shared_ptr = nullptr
, i.e. no shared state, which is probably the state it is currently in after having been moved from. This would have no intrinsic overhead, apart some more check which one may leave out, but would not be like what the STL is doung.Sorry, but I don't buy your argument.
Saying that a Buffer object is never invalid - except when it is invalid because it was moved from - so there is no need to check for its validity, is like saying that a pointer is never null - except when the user stored a null value - so there is no need to check that it is non-null.
Either a type guarantees that an object is always in a valid state, or the user needs to "know" when it is valid or not to use it.
Knowing it may come from an explicit check, or from the context.
In the case of many Alpaka objects, the type does not guarantee that the objects are always valid. Saying "ok, but they are not valid only after a move" is not different from saying "ok, but they are not valid only after being default-constructed". Once the possibility is there, the user much keep track of it.
1. The use of `std::optional` makes the code more explicit and self-documenting: In places where you use `std::optional<Buffer>` it is clear, that the buffer may be invalid, where `Buffer` is used, it cannot be invalid.
We have already determined that "it [a Buffer] cannot be invalid" cannot be guaranteed by the type itself, only by the context. So this point does not really apply.
Many alpaka objects are implicitly shared (the thing the user code interacts with is only contains an
std::shared_ptr
to the actual shared object). An invalid state could be one of two things:1. The `shared_ptr = nullptr`, i.e. no shared state, which is probably the state it is currently in after having been moved from.
This is my guess as well.
This would have no intrinsic overhead, apart some more check which one may leave out, but would not be like what the STL is doing.
It would be exactly what the STL is doing for types like shared_ptr
or unique_ptr
- with the only difference that those types have a way to check for validity other than looking at an internal data member.
2. An empty object, i.e. an existing shared state which is an empty object. This would add overhead, as, for example, a new shared object would need to be created after move.
Indeed, I agree that this would not make much sense.
So, to me this boils down to one of
My suggestion would be to go with 2., and add an explicit "invalid state" constructor that a user may opt-in to use. If they do, it's on them to make sure an invalid object is never used.
Either a type guarantees that an object is always in a valid state, or the user needs to "know" when it is valid or not to use it.
That would be ideal. With the move, this is more of a design pattern of not moving from an object which lives on. So, I think there is a difference between this clutch and having a default constructor which can explicitly create invalid objects. You do point out a gap in alpaka's design here though. Maybe move semantics is just inherently incompatible with implicit sharing, I never thought about this.
Knowing it may come from an explicit check, or from the context.
In the case of many Alpaka objects, the type does not guarantee that the objects are always valid. [...]
One would need to stick to not moving lvalue alpaka objects for this guarantee to be effectively there. Yes, such a contract is not good design in C++, because the compiler cannot check it.
Many alpaka objects are implicitly shared (the thing the user code interacts with is only contains an
std::shared_ptr
to the actual shared object). An invalid state could be one of two things:1. The `shared_ptr = nullptr`, i.e. no shared state, which is probably the state it is currently in after having been moved from.
This is my guess as well.
This would have no intrinsic overhead, apart some more check which one may leave out, but would not be like what the STL is doing.
It would be exactly what the STL is doing for types like
shared_ptr
orunique_ptr
- with the only difference that those types have a way to check for validity other than looking at an internal data member.
Yes: Because this is exactly what is happening inside the object (the move ctor/assigment is implicitly defaulted, so this is just the move of an std::shared_ptr
).
But actually no, because the shared instance is not part of the interface (should not be, there are some issue around this). An alpaka buffer, for example, is more akin to an std::vector
, which becomes empty when moved from.
So, to me this boils down to one of
1. enforcing that an Event or Queue or Buffer is _always_ valid: either by disallowing moving them (incurs in a small cost due to the increase/decrease of reference counting with each copy) or by coming up with a "valid state" they should have after being moved-from;
Disallowing move would indeed make the implicit sharing more consistent (the implicit sharing is unfortunately not cleanly implemented in some places, e.g. some types do not keep the shared state private and it get used as part of the interface...).
2. acknowledging that an Event or Queue or Buffer _may_ be in an invalid state under some conditions; one existing condition is "after being moved from".
This would make these types explicitly shared, i.e. the user needs to check if there is a shared state, turning them into glorfied std:.shared_ptr
s.
My suggestion would be to go with 2., and add an explicit "invalid state" constructor that a user may opt-in to use. If they do, it's on them to make sure an invalid object is never used.
I prefer 1 a bit, because it keeps the current main semantics. Disallowing move, may also not be too inefficient, because in many places where a move would legitimately be applied to a variable which goes out of scope immediately after, copy elision comes up.
I talked with Andrea a few weeks ago about this issue. One possible solution that he proposed, which would keep alpaka's programming pattern unchanged, was:
alpaka::Buf::invalid_state()
which creates and returns an invalid buffer (or a free function like alpaka::make_invalid<Buf>()
;alpaka::Buf::is_valid()
or a free function alpaka::is_valid<Buf>(Buf const&)
to check if a buffer (or an object, more general) is valid.With this solution:
Any comment/idea about this proposal?
Here are some thoughts of mine:
There are typically two approaches to obtain two phase construction for an object, which is what @fwyzard needs in his example.
Firstly, the type of the object supports a partially formed state (either via default construction, a special constructor or a special static factory member function) and construction can be finished using a member function. This has the advantage that you might be able to implement this more efficiently, since you can make use of sentinal values to represent the partially formed state (e.g. 'nullptr' for an embedded pointer). It has the disadvantage that the use of such objects gets more complicated because it can be in more states. Users might need to check whether an object is valid or not. Furthermore, all embedded objects now also need to support that partially formed state. There is also the danger that multiple types might implement different interfaces to check for partial formedness. Some people also argue that such checking is not needed: https://www.kdab.com/stepanov-regularity-partially-formed-objects-vs-c-value-types/.
It is also worthwhile to point out that a few types in the C++ standard library go that way, e.g. std::thread
or std::fstream
.
Secondly, you can wrap the object in another object that can delay its construction (e.g. std::optional
). This has the advantage that you immediately see, that an object can be in a state where it is not constructed yet. You can also not wrap an object and you immediately know that it must always be in a valid state. And a type such as 'std::optional' has a well known and consistent interface that is always the same, independently of what type it is wrapping.
My recommendation is thus to use std::optional
unless there is a very strong performance reason to not use it. I have also seen people write a variant of std::optional
without the additional bool
, so the user needs to know when such an object is valid or not.
It does not solve the bool try_pop( value_type& value );
from tbb::concurrent_queue
though, but I also wonder what kind of alpaka objects you would want to pass through a concurrent queue.
But if every object needs to be declared as optional to be used, maybe the current API is not a good match for our use case :-(
You only need to wrap your objects in std::optional
when you explicitely need two phase construction. There is no design error in alpaka which follows RAII, not allowing two phase construction.
In fact, Alpaka objects do have an "empty, invalid state" - the "moved from" state: ... There is no compile time error for "use after move"...
All objects have this state. And static analysis catchs uses of objects in such a state (e.g. clang-tidy bugprone-use-after-move). Users should never need to check whether an object is in a moved-from state. Alpaka objects can thus be assumed to always be valid.
Saying "ok, but they are not valid only after a move" is not different from saying "ok, but they are not valid only after being default-constructed"
It is different, because objects should not be used after being moved from. However, objects should be safe to be used after being default constructed, if such a construction is possible.
One possible solution that he proposed, which would keep alpaka's programming pattern unchanged, was:
- no addition of a default constructor for invalid objects;
- addition of a function like
alpaka::Buf::invalid_state()
which creates and returns an invalid buffer (or a free function likealpaka::make_invalid<Buf>()
;- addition of a method like
alpaka::Buf::is_valid()
or a free functionalpaka::is_valid<Buf>(Buf const&)
to check if a buffer (or an object, more general) is valid.
I would rather have a default constructor that creates an invalid object. It would make alpaka objects semi regular, which is desirable for the STL:
std::vector<alpaka::Whatever> v(10); // default constructs 10 Whatevers
v.resize(20); // default constructs 10 more
instead of:
std::vector<alpaka::Whatever> v(10); // compilation error
v.resize(20); // compilation error
std::vector<alpaka::Whatever> v(10, alpaka::Whatever::invalid_state()); // copy constructs 10 Whatevers
v.resize(20, alpaka::Whatever::invalid_state()); // copy constructs 10 more
@fwyzard what prevents you from using std::optional
? In the example you have given at the very top this seems like an easy solution.
It may also be worth noting that the Microsoft standard library developers recommend using std::optional
for the use case presented in the initial issue comment: Source.
objects should not be used after being moved from. However, objects should be safe to be used after being default constructed, if such a construction is possible.
Well, this is not an argument, it's an arbitrary assertion. I could write
objects should not be used after being default constructed. However, objects should be safe to be used after being moved from.
and it would have exactly the same validity. Case in point, many of the standard objects cannot be safely used after being default constructed.
@fwyzard what prevents you from using
std::optional
? In the example you have given at the very top this seems like an easy solution.
As I have pointed out already, having to wrap every single alpaka object in an std::optional
IMHO is a clear message that the API is lacking, at least for the more advanced use cases that we have.
I cannot say out of hand if this can cover all our use case. Hopefully we will be able to come to the conclusion if using std::optional
everywhere is acceptable or not before we start using alpaka in production.
It may also be worth noting that the Microsoft standard library developers recommend using
std::optional
for the use case presented in the initial issue comment: Source.
Funny that they quote "C++’s zero-overhead abstractions" as a reason for using std::optional<T>
, considering that std::optional<T>
does have a small overhead, while making alpaka objects default constructible would not.
Well, this is not an argument, it's an arbitrary assertion.
It's not an argument in a scientific sense, but it's not arbitrary either. It's design consensus within the C++ community. I cannot provide you with a quote. I can only share experience I obtained by listening to the experts.
As I have pointed out already, having to wrap every single alpaka object in an std::optional IMHO is a clear message that the API is lacking, at least for the more advanced use cases that we have.
This is not required in cases where two phase construction is not needed.
Hopefully we will be able to come to the conclusion if using std::optional everywhere is acceptable or not before we start using alpaka in production.
I am looking forward to your feedback and measurements there!
Funny that they quote "C++’s zero-overhead abstractions" as a reason for using std::optional
, considering that std::optional does have a small overhead, while making alpaka objects default constructible would not.
std::optional<T>
is a generic, zero-overhead abstraction. You cannot do better if you don't know anything about T
.
Well, this is not an argument, it's an arbitrary assertion.
It's not an argument in a scientific sense, but it's not arbitrary either. It's design consensus within the C++ community. I cannot provide you with a quote. I can only share experience I obtained by listening to the experts.
We clearly disagree on this point, I would argue that the C++ standard disagrees with you, as well.
Try using a default constructed std::shared_ptr
, std::unique_ptr
, or std::optional
for that matters.
Or please explain what kind of problems you incur in using a moved-from std::vector
or other container.
As I have pointed out already, having to wrap every single alpaka object in an std::optional IMHO is a clear message that the API is lacking, at least for the more advanced use cases that we have.
This is not required in cases where two phase construction is not needed.
Then we agree on this: the API currently provided by alpaka is well suited for simple cases, but not flexible enough for more complex ones.
std::optional<T>
is a generic, zero-overhead abstraction. You cannot do better if you don't know anything aboutT
.
Sure. But I'm not arguing about generic types, I'm arguing about alpaka objects that are implemented as an std::shared_ptr<X>
. And for these you can do better than std::optional
.
Just seeing the discussion. I think we should step back and discuss example use cases in Alpaka for this and evaluate solutions.
I think we brought forward our arguments. So @tonydp03 or @j-stephan I let you decide what to do.
@fwyzard we discussed this issue today during the alpaka VC and we wondered whether a buffer with zero extents could serve your use case:
struct A {
AlpakaDev dev;
AlpakaBuf buf;
A(AlpakaDev dev) : dev(dev), buf(alpaka::allocBuf<float>(dev, alpaka::Vec<1, size_t>::zeros())) {}
void alloc(size_t size) {
buf = alpaka::allocBuf<float>(dev, alpaka::Vec<1, size_t>{size});
}
};
Such an allocation with zero extents (not zero dimensionality!) for the CUDA backend does not call any API functions during allocation and also copy. We would still need to check whether that is true for the other backends though.
Would this work for your use case?
@bernhardmgruber this workaround could be useful for buffers, but CMS needs something similar for queues and events...
Alpaka objects do not have an "empty, invalid state", which prevents their direct use in various situations.
lifetime different than the containing object
One such case is holding a memory buffer that does not need to live as long as the holding object, but still longer than individual methods:
(this is just a mock-up example I made up on the fly to illustrate the argument; apologies for any errors or inconsistencies)
reference to output parameters
An other use cases are APIs that will return a value by actually taking an output parameter by reference. One example we have run into is TBB's [
concurrent_queue::try_pop(https://spec.oneapi.io/versions/latest/elements/oneTBB/source/containers/concurrent_queue_cls/safe_member_functions.html#popping-elements)
]() (but I've see the same pattern in other parallel libraries), that takes an argument by reference and assigns the popped value (if any) to it:We are storing
alpaka::Event
s in aconcurrent_queue
in order to reuse them (creating CUDA events is expensive, while reusing them is cheaper), and we would like to do something like(again, this is a very simplified mock-up)
As has been suggested elsewhere, it should be possible to address these use cases by wrapping (almost) all Alpaka objects in an
std::optional
. But if every object needs to be declared as optional to be used, maybe the current API is not a good match for our use case :-(