alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0
349 stars 72 forks source link

Alpaka objects do not have an "empty, invalid state" #1445

Open fwyzard opened 2 years ago

fwyzard commented 2 years ago

Alpaka objects do not have an "empty, invalid state", which prevents their direct use in various situations.


lifetime different than the containing object

One such case is holding a memory buffer that does not need to live as long as the holding object, but still longer than individual methods:

//  the definition or templating on Device, Queue, Acc, etc. are left out for clarity
class Producer {
public:
  Producer() : ...  // Note: the buffers cannot be initialised here, as their size is not known yet
  {
  }

  // method called by the framework to begin an asynchronous operation 
  void async(Queue const& queue, Input const& input) {

    // allocate host and device buffers
    // Note: the buffers cannot be declared here, because they need to live longer that this function scope
    auto input_size = input.size();
    auto output_size = estimate_size(input_size);
    h_input_ = input.buffer();
    d_input_ = alpaka::allocBuf<float>(device_, input_size);
    h_output_ = alpaka::allocBuf<float>(host_, output_size);
    d_input_ = alpaka::allocBuf<float>(device_, output_size);

    // async copy the input to the device
    alpaka::memcpy(queue, d_input_, h_input_, input_size);

    // launch a kernel working on device data
    auto workDiv = ...;  // e.g. based on input_size, the device properties, etc.
    alpaka::enqueue(queue, alpaka::createTaskKernel<Acc>(workDiv, kernel, alpaka::getPtrNative(d_input_), input_size, alpaka::getPtrNative(d_output_), output_size);

    // async copy the output from the device
    alpaka::memcpy(queue, h_output_, d_output_, output_size);

    // do not wait for the copy and kernel to be done
  }

  // method called by the framework only after the asynchronous operation has complete
  void then(Output & output) {
    // release the handle to the input host buffer
    h_input_.reset();  // <-- currently not possible

    // free the device buffers
    d_input_.reset();   // <-- currently not possible
    d_output_.reset();  // <-- currently not possible

    // move the result to the output data structure
    output.buffer = std::move(h_output_));
  }

private:
  alpaka::DevCpu host_;
  Device device_;

  HostBuffer<float> h_input_;
  HostBuffer<float> h_output_;
  DeviceBuffer<float> d_input_;
  DeviceBuffer<float> d_output_;
};

(this is just a mock-up example I made up on the fly to illustrate the argument; apologies for any errors or inconsistencies)


reference to output parameters

An other use cases are APIs that will return a value by actually taking an output parameter by reference. One example we have run into is TBB's [concurrent_queue::try_pop(https://spec.oneapi.io/versions/latest/elements/oneTBB/source/containers/concurrent_queue_cls/safe_member_functions.html#popping-elements)]() (but I've see the same pattern in other parallel libraries), that takes an argument by reference and assigns the popped value (if any) to it:

bool try_pop( value_type& value );

We are storing alpaka::Events in a concurrent_queue in order to reuse them (creating CUDA events is expensive, while reusing them is cheaper), and we would like to do something like

tbb::concurrent_queue<TEvent> events_;
...

TEvent get_event() {
  TEvent event;  // empty event
  if (events_.try_pop(event)) {
    // use the popped event
  } else {
    // create a new event
    event = ...;
  }
  return event;
}

(again, this is a very simplified mock-up)


As has been suggested elsewhere, it should be possible to address these use cases by wrapping (almost) all Alpaka objects in an std::optional. But if every object needs to be declared as optional to be used, maybe the current API is not a good match for our use case :-(

fwyzard commented 2 years ago

Thinking about it:

fwyzard commented 2 years ago

In fact, Alpaka objects do have an "empty, invalid state" - the "moved from" state:

#include <iostream>
#include <alpaka/alpaka.hpp>

int main(void) {
  using Idx = uint32_t;
  using Host = alpaka::DevCpu;
  using HostPlatform = alpaka::Pltf<Host>;

  const auto host = alpaka::getDevByIdx<HostPlatform>(0u);

  // construct a buffer and allocate the corresponding memory
  auto b1 = alpaka::allocBuf<char, Idx>(host, 42u);
  std::cerr << (void*) alpaka::getPtrNative(b1) << std::endl;

  // move the memory to a different buffer, leaving the original buffer in an invalid state
  auto b2 = std::move(b1);
  std::cerr << "b1 is invalid: " << std::endl;
  std::cerr << (void*) alpaka::getPtrNative(b1) << std::endl;

  return 0;
}

So, a workaround could simply be keep around one invalid object per type...

  // construct a buffer in an invalid state
  auto invalid_buffer = b1;
  std::cerr << "invalid_buffer is invalid: " << std::endl;
  std::cerr << (void*) alpaka::getPtrNative(invalid_buffer) << std::endl;

Clearly, not the most elegant solution :-D

jkelling commented 2 years ago

Not having an "invalid" state is part of the RAII (resource acquisition is initialization) pattern prevalent in alpaka. The "invalid" state after a move is not really such: one should not use an object after it has been moved from, because its state is more undefined than invalid (although I do not know if alpaka's docs might give some guarantees about that stated, making it defined.).

The advantage of not having an invalid state is, that one does not need to check for it before using an object at runtime. Anything that would lead use of an object failing becomes a compile-time error instead.

You should consider, that there is a tradeoff: Either you use std::optional where you need it (like in the situations your listed above) or your have to add checks everywhere else, too.

fwyzard commented 2 years ago

The "invalid" state after a move is not really such: one should not use an object after it has been moved from, because its state is more undefined than invalid (although I do not know if alpaka's docs might give some guarantees about that stated, making it defined.).

Er, no.

Moved-from objects should be in a "valid but unspecified state", not in an invalid state.

For example, an std::vector could be empty, or an std::unique_ptr could point to nullptr - but they should still be usable, and in fact they are:

#include <iostream>
#include <memory>
#include <vector>

int main(void) {
  std::vector<int> v = { 42, 0, 9, -4, 2, 27, -99 };

  std::cout << "size before move: " << v.size() << std::endl;
  auto w = std::move(v);
  std::cout << "size after move:  " << v.size() << std::endl;

  std::unique_ptr<int> p = std::make_unique<int>(42);

  std::cout << "address before move: " << p.get() << std::endl;
  auto q = std::move(p);
  std::cout << "address after move:  " << p.get() << std::endl;

  return 0;
}

Accessing a non-existent element of the vector or dereferencing the nullptr-values unique_ptr would lead to a crash at runtime (or an exception if using v.at()), but that would be a bug on the user side, which could be prevented by checking the status of the object.

Anything that would lead use of an object failing becomes a compile-time error instead.

As my examples clearly demonstrate, this is not true.

There is no compile time error for "use after move" - neither in the case of STL objects (where using them after moving-from is fine, as long as the status of the objects is checked) or of the Alpaka objects (where trying to read the pointer of a moved-from buffer leads to a segmentation fault, instead of returning nullptr).

Either you use std::optional where you need it (like in the situations your listed above) or your have to add checks everywhere else, too.

There is a third possibility which you are not considering: the code that uses an object may be fully aware of when an object is safe to use or not due to the value of other variables, without the need of the overhead from using std::optional.

jkelling commented 2 years ago

The "invalid" state after a move is not really such: one should not use an object after it has been moved from, because its state is more undefined than invalid (although I do not know if alpaka's docs might give some guarantees about that stated, making it defined.).

Er, no.

Moved-from objects should be in a "valid but unspecified state", not in an invalid state.

That is a guarantee that (at least most) STL types give. I am not aware, that this is a requirement of the core language for move semantics. The only thing an object must guarantee after being moved from is that its destructor does not invalidate anything that was moved away. It may, for example, be no longer assignable, although I agree, that the guidelines the STL is following here are better.

Accessing a non-existent element of the vector or dereferencing the nullptr-values unique_ptr would lead to a crash at runtime (or an exception if using v.at()), but that would be a bug on the user side, which could be prevented by checking the status of the object.

Anything that would lead use of an object failing becomes a compile-time error instead.

As my examples clearly demonstrate, this is not true.

There is indeed this case, where you move away and then access it, which would lead to a runtime-error. What I was referring to is, that moving away and then accessing is often considered an anti-pattern and would at least raise some careful looks during code review. Any other use of an alpaka object is valid due to the non-existent invalid state. In fact, since alpaka types do not have an invalid state and one cannot really determine whether it is valid, there is not save way to use an alpaka object that may have been moved from.

There is no compile time error for "use after move" - neither in the case of STL objects (where using them after moving-from is fine, as long as the status of the objects is checked) or of the Alpaka objects (where trying to read the pointer of a moved-from buffer leads to a segmentation fault, instead of returning nullptr).

It is correct, that there is not compile time error for use-after-move, but currently, this is the only situation in which you can end up with an invalid alpaka object, limiting the opportunities for runtime errors.

Either you use std::optional where you need it (like in the situations your listed above) or your have to add checks everywhere else, too.

There is a third possibility which you are not considering: the code that uses an object may be fully aware of when an object is safe to use or not due to the value of other variables, without the need of the overhead from using std::optional.

  1. The use of std::optional makes the code more explicit and self-documenting: In places where you use std::optional<Buffer> it is clear, that the buffer may be invalid, where Buffer is used, it cannot be invalid.
  2. Many alpaka objects are implicitly shared (the thing the user code interacts with is only contains an std::shared_ptr to the actual shared object). An invalid state could be one of two things:
    1. The shared_ptr = nullptr, i.e. no shared state, which is probably the state it is currently in after having been moved from. This would have no intrinsic overhead, apart some more check which one may leave out, but would not be like what the STL is doung.
    2. An empty object, i.e. an existing shared state which is an empty object. This would add overhead, as, for example, a new shared object would need to be created after move.
fwyzard commented 2 years ago

Sorry, but I don't buy your argument.

Saying that a Buffer object is never invalid - except when it is invalid because it was moved from - so there is no need to check for its validity, is like saying that a pointer is never null - except when the user stored a null value - so there is no need to check that it is non-null.

Either a type guarantees that an object is always in a valid state, or the user needs to "know" when it is valid or not to use it.

Knowing it may come from an explicit check, or from the context.

In the case of many Alpaka objects, the type does not guarantee that the objects are always valid. Saying "ok, but they are not valid only after a move" is not different from saying "ok, but they are not valid only after being default-constructed". Once the possibility is there, the user much keep track of it.

1. The use of `std::optional` makes the code more explicit and self-documenting: In places where you use `std::optional<Buffer>` it is clear, that the buffer may be invalid, where `Buffer` is used, it cannot be invalid.

We have already determined that "it [a Buffer] cannot be invalid" cannot be guaranteed by the type itself, only by the context. So this point does not really apply.

Many alpaka objects are implicitly shared (the thing the user code interacts with is only contains an std::shared_ptr to the actual shared object). An invalid state could be one of two things:

1. The `shared_ptr = nullptr`, i.e. no shared state, which is probably the state it is currently in after having been moved from.

This is my guess as well.

This would have no intrinsic overhead, apart some more check which one may leave out, but would not be like what the STL is doing.

It would be exactly what the STL is doing for types like shared_ptr or unique_ptr - with the only difference that those types have a way to check for validity other than looking at an internal data member.

2. An empty object, i.e. an existing shared state which is an empty object. This would add overhead, as, for example, a new shared object would need to be created after move.

Indeed, I agree that this would not make much sense.

fwyzard commented 2 years ago

So, to me this boils down to one of

  1. enforcing that an Event or Queue or Buffer is always valid: either by disallowing moving them (incurs in a small cost due to the increase/decrease of reference counting with each copy) or by coming up with a "valid state" they should have after being moved-from;
  2. acknowledging that an Event or Queue or Buffer may be in an invalid state under some conditions; one existing condition is "after being moved from".

My suggestion would be to go with 2., and add an explicit "invalid state" constructor that a user may opt-in to use. If they do, it's on them to make sure an invalid object is never used.

jkelling commented 2 years ago

Either a type guarantees that an object is always in a valid state, or the user needs to "know" when it is valid or not to use it.

That would be ideal. With the move, this is more of a design pattern of not moving from an object which lives on. So, I think there is a difference between this clutch and having a default constructor which can explicitly create invalid objects. You do point out a gap in alpaka's design here though. Maybe move semantics is just inherently incompatible with implicit sharing, I never thought about this.

Knowing it may come from an explicit check, or from the context.

In the case of many Alpaka objects, the type does not guarantee that the objects are always valid. [...]

One would need to stick to not moving lvalue alpaka objects for this guarantee to be effectively there. Yes, such a contract is not good design in C++, because the compiler cannot check it.

Many alpaka objects are implicitly shared (the thing the user code interacts with is only contains an std::shared_ptr to the actual shared object). An invalid state could be one of two things:

1. The `shared_ptr = nullptr`, i.e. no shared state, which is probably the state it is currently in after having been moved from.

This is my guess as well.

This would have no intrinsic overhead, apart some more check which one may leave out, but would not be like what the STL is doing.

It would be exactly what the STL is doing for types like shared_ptr or unique_ptr - with the only difference that those types have a way to check for validity other than looking at an internal data member.

Yes: Because this is exactly what is happening inside the object (the move ctor/assigment is implicitly defaulted, so this is just the move of an std::shared_ptr).

But actually no, because the shared instance is not part of the interface (should not be, there are some issue around this). An alpaka buffer, for example, is more akin to an std::vector, which becomes empty when moved from.

jkelling commented 2 years ago

So, to me this boils down to one of

1. enforcing that an Event or Queue or Buffer is _always_ valid: either by disallowing moving them (incurs in a small cost due to the increase/decrease of reference counting with each copy) or by coming up with a "valid state" they should have after being moved-from;

Disallowing move would indeed make the implicit sharing more consistent (the implicit sharing is unfortunately not cleanly implemented in some places, e.g. some types do not keep the shared state private and it get used as part of the interface...).

2. acknowledging that an Event or Queue or Buffer _may_ be in an invalid state under some conditions; one existing condition is "after being moved from".

This would make these types explicitly shared, i.e. the user needs to check if there is a shared state, turning them into glorfied std:.shared_ptrs.

My suggestion would be to go with 2., and add an explicit "invalid state" constructor that a user may opt-in to use. If they do, it's on them to make sure an invalid object is never used.

I prefer 1 a bit, because it keeps the current main semantics. Disallowing move, may also not be too inefficient, because in many places where a move would legitimately be applied to a variable which goes out of scope immediately after, copy elision comes up.

tonydp03 commented 2 years ago

I talked with Andrea a few weeks ago about this issue. One possible solution that he proposed, which would keep alpaka's programming pattern unchanged, was:

With this solution:

Any comment/idea about this proposal?

bernhardmgruber commented 2 years ago

Here are some thoughts of mine:

There are typically two approaches to obtain two phase construction for an object, which is what @fwyzard needs in his example.

Firstly, the type of the object supports a partially formed state (either via default construction, a special constructor or a special static factory member function) and construction can be finished using a member function. This has the advantage that you might be able to implement this more efficiently, since you can make use of sentinal values to represent the partially formed state (e.g. 'nullptr' for an embedded pointer). It has the disadvantage that the use of such objects gets more complicated because it can be in more states. Users might need to check whether an object is valid or not. Furthermore, all embedded objects now also need to support that partially formed state. There is also the danger that multiple types might implement different interfaces to check for partial formedness. Some people also argue that such checking is not needed: https://www.kdab.com/stepanov-regularity-partially-formed-objects-vs-c-value-types/. It is also worthwhile to point out that a few types in the C++ standard library go that way, e.g. std::thread or std::fstream.

Secondly, you can wrap the object in another object that can delay its construction (e.g. std::optional). This has the advantage that you immediately see, that an object can be in a state where it is not constructed yet. You can also not wrap an object and you immediately know that it must always be in a valid state. And a type such as 'std::optional' has a well known and consistent interface that is always the same, independently of what type it is wrapping.

My recommendation is thus to use std::optional unless there is a very strong performance reason to not use it. I have also seen people write a variant of std::optional without the additional bool, so the user needs to know when such an object is valid or not. It does not solve the bool try_pop( value_type& value ); from tbb::concurrent_queue though, but I also wonder what kind of alpaka objects you would want to pass through a concurrent queue.

But if every object needs to be declared as optional to be used, maybe the current API is not a good match for our use case :-(

You only need to wrap your objects in std::optional when you explicitely need two phase construction. There is no design error in alpaka which follows RAII, not allowing two phase construction.

In fact, Alpaka objects do have an "empty, invalid state" - the "moved from" state: ... There is no compile time error for "use after move"...

All objects have this state. And static analysis catchs uses of objects in such a state (e.g. clang-tidy bugprone-use-after-move). Users should never need to check whether an object is in a moved-from state. Alpaka objects can thus be assumed to always be valid.

Saying "ok, but they are not valid only after a move" is not different from saying "ok, but they are not valid only after being default-constructed"

It is different, because objects should not be used after being moved from. However, objects should be safe to be used after being default constructed, if such a construction is possible.

One possible solution that he proposed, which would keep alpaka's programming pattern unchanged, was:

  • no addition of a default constructor for invalid objects;
  • addition of a function like alpaka::Buf::invalid_state() which creates and returns an invalid buffer (or a free function like alpaka::make_invalid<Buf>();
  • addition of a method like alpaka::Buf::is_valid() or a free function alpaka::is_valid<Buf>(Buf const&) to check if a buffer (or an object, more general) is valid.

I would rather have a default constructor that creates an invalid object. It would make alpaka objects semi regular, which is desirable for the STL:

std::vector<alpaka::Whatever> v(10); // default constructs 10 Whatevers
v.resize(20); // default constructs 10 more

instead of:

std::vector<alpaka::Whatever> v(10); // compilation error
v.resize(20);  // compilation error

std::vector<alpaka::Whatever> v(10, alpaka::Whatever::invalid_state()); // copy constructs 10 Whatevers
v.resize(20, alpaka::Whatever::invalid_state()); // copy constructs 10 more

@fwyzard what prevents you from using std::optional? In the example you have given at the very top this seems like an easy solution.

j-stephan commented 2 years ago

It may also be worth noting that the Microsoft standard library developers recommend using std::optional for the use case presented in the initial issue comment: Source.

fwyzard commented 2 years ago

objects should not be used after being moved from. However, objects should be safe to be used after being default constructed, if such a construction is possible.

Well, this is not an argument, it's an arbitrary assertion. I could write

objects should not be used after being default constructed. However, objects should be safe to be used after being moved from.

and it would have exactly the same validity. Case in point, many of the standard objects cannot be safely used after being default constructed.

@fwyzard what prevents you from using std::optional? In the example you have given at the very top this seems like an easy solution.

As I have pointed out already, having to wrap every single alpaka object in an std::optional IMHO is a clear message that the API is lacking, at least for the more advanced use cases that we have.

I cannot say out of hand if this can cover all our use case. Hopefully we will be able to come to the conclusion if using std::optional everywhere is acceptable or not before we start using alpaka in production.

fwyzard commented 2 years ago

It may also be worth noting that the Microsoft standard library developers recommend using std::optional for the use case presented in the initial issue comment: Source.

Funny that they quote "C++’s zero-overhead abstractions" as a reason for using std::optional<T>, considering that std::optional<T> does have a small overhead, while making alpaka objects default constructible would not.

bernhardmgruber commented 2 years ago

Well, this is not an argument, it's an arbitrary assertion.

It's not an argument in a scientific sense, but it's not arbitrary either. It's design consensus within the C++ community. I cannot provide you with a quote. I can only share experience I obtained by listening to the experts.

As I have pointed out already, having to wrap every single alpaka object in an std::optional IMHO is a clear message that the API is lacking, at least for the more advanced use cases that we have.

This is not required in cases where two phase construction is not needed.

Hopefully we will be able to come to the conclusion if using std::optional everywhere is acceptable or not before we start using alpaka in production.

I am looking forward to your feedback and measurements there!

Funny that they quote "C++’s zero-overhead abstractions" as a reason for using std::optional, considering that std::optional does have a small overhead, while making alpaka objects default constructible would not.

std::optional<T> is a generic, zero-overhead abstraction. You cannot do better if you don't know anything about T.

fwyzard commented 2 years ago

Well, this is not an argument, it's an arbitrary assertion.

It's not an argument in a scientific sense, but it's not arbitrary either. It's design consensus within the C++ community. I cannot provide you with a quote. I can only share experience I obtained by listening to the experts.

We clearly disagree on this point, I would argue that the C++ standard disagrees with you, as well. Try using a default constructed std::shared_ptr, std::unique_ptr, or std::optional for that matters. Or please explain what kind of problems you incur in using a moved-from std::vector or other container.

As I have pointed out already, having to wrap every single alpaka object in an std::optional IMHO is a clear message that the API is lacking, at least for the more advanced use cases that we have.

This is not required in cases where two phase construction is not needed.

Then we agree on this: the API currently provided by alpaka is well suited for simple cases, but not flexible enough for more complex ones.

std::optional<T> is a generic, zero-overhead abstraction. You cannot do better if you don't know anything about T.

Sure. But I'm not arguing about generic types, I'm arguing about alpaka objects that are implemented as an std::shared_ptr<X>. And for these you can do better than std::optional.

bussmann commented 2 years ago

Just seeing the discussion. I think we should step back and discuss example use cases in Alpaka for this and evaluate solutions.

bernhardmgruber commented 2 years ago

I think we brought forward our arguments. So @tonydp03 or @j-stephan I let you decide what to do.

bernhardmgruber commented 2 years ago

@fwyzard we discussed this issue today during the alpaka VC and we wondered whether a buffer with zero extents could serve your use case:

struct A {
  AlpakaDev dev;
  AlpakaBuf buf;

  A(AlpakaDev dev) : dev(dev), buf(alpaka::allocBuf<float>(dev, alpaka::Vec<1, size_t>::zeros())) {}

  void alloc(size_t size) {
    buf = alpaka::allocBuf<float>(dev, alpaka::Vec<1, size_t>{size});
  }
};

Such an allocation with zero extents (not zero dimensionality!) for the CUDA backend does not call any API functions during allocation and also copy. We would still need to check whether that is true for the other backends though.

Would this work for your use case?

tonydp03 commented 2 years ago

@bernhardmgruber this workaround could be useful for buffers, but CMS needs something similar for queues and events...