alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0
349 stars 72 forks source link

Default contstructor for alpaka Buffer #1426

Open SimeonEhrig opened 2 years ago

SimeonEhrig commented 2 years ago

At the moment, alpaka::Buf does not provide a default constructor. Therefore I have to do the following thing, If I want to declare alpaka::Buf object as member of my class:

template< /* ... */>
class Foo {
  using Vec = alpaka::Vec<TDim, TIdx>;
  using BufHost = alpaka::Buf<Host, TData, TDim, TIdx>;
  using BufDev = alpaka::Buf<Acc, TData, TDim, TIdx>;

  Vec m_extent;

  BufHost m_host_men;
  BufDev m_device_mem;

  Vec calculate_extends(std::uint64_t const size){
    Vec extent = Vec::all(static_cast<TIdx>(1));
    extent[TDim::value - 1u] = size;
    return extent;
  }

public:
  Foo(int const memSize) : m_extent(calculate_extends(memSize)),
             m_host_men(alpaka::allocBuf<TData, TIdx>(P::devHost, m_extent)),
             m_device_mem(alpaka::allocBuf<TData, TIdx>(P::devAcc, m_extent))
    {}
};

Technically it is fine, but it feels uncommon. I would prefer, if this is possible:

Foo(int const memSize) : m_extent(Vec::all(static_cast<TIdx>(1)))      
{
  extent[TDim::value - 1u] = memSize;
  m_host_men = std::move(alpaka::allocBuf<TData, TIdx>(P::devHost, m_extent));
  m_device_mem = std::move(alpaka::allocBuf<TData, TIdx>(P::devAcc, m_extent));
}

I think, this API looks better. But the default constructor brings also a problem. At the moment, each alpaka::Buf points to a chunk of memory. With the default constructor, this not required anymore and we to check this.

What is your opinion?

j-stephan commented 2 years ago

Maybe let the default constructor construct an empty buffer? In this case we should add an empty() check to the buffer interface.

SimeonEhrig commented 2 years ago

Maybe let the default constructor construct an empty buffer? In this case we should add an empty() check to the buffer interface.

In this case, we need a device object to allocate the empty buffer on the correct device.

BenjaminW3 commented 2 years ago

Default parameters would be hard to guess. The extent could be zero, but on which device? Selecting an arbitrary device could impose problems. Adding zombie states like "empty buffer" to objects is an anti-pattern and should be prevented as much as possible. Objects should always be in a valid and useful state. If your class has multiple states and initially does not have a buffer, then it is part of your class to express this and not of the buffer. The default go-to solution nowadays is using std::optional. Then you are not forced to initialize the buffer in the class-constructor but you have to handle the optionality directly within your class when accessing the buffer.

fwyzard commented 2 years ago

Adding zombie states like "empty buffer" to objects is an anti-pattern and should be prevented as much as possible. Objects should always be in a valid and useful state.

The current implementation of buffers already does not guarantee this; given that buffers are movable, a buffer that has been "moved from" is effectively in an "empty buffer" state, and trying to use it (other than assigning/moving to it) will result in a crash:

#include <iostream>
#include <alpaka/alpaka.hpp>

int main(void) {
  using Idx = uint32_t;
  using Host = alpaka::DevCpu;
  using HostPlatform = alpaka::Pltf<Host>;

  const auto host = alpaka::getDevByIdx<HostPlatform>(0u);

  auto b1 = alpaka::allocBuf<char, Idx>(host, 42u);
  std::cerr << (void*) alpaka::getPtrNative(b1) << std::endl;

  auto b2 = std::move(b1);
  // b1 is now in an invalid state
  std::cerr << (void*) alpaka::getPtrNative(b1) << std::endl;

  return 0;
}

results in

0x5579a79f5f80
Segmentation fault (core dumped)
bernhardmgruber commented 2 years ago

From the technical standpoint, since alpaka buffers are reference counted handles via a shared pointer to an internal implementation, we can get a very easy empty state: leaving the shared pointer null. That entails the problem that all buffer operations now acquire the precondition of this implementation pointer to not be null. We could assert that in debug builds, but for sure developers will run into operating on uninitialized buffers. So we would loose safety here.

Adding zombie states like "empty buffer" to objects is an anti-pattern and should be prevented as much as possible.

Not necessarily. One such state is the "moved from"/"partially formed" state, which is fine by C++'s design (but still causes troubles every now and then). But such states are only meaningful, if the object actually defines move and copy semantics. Alpaka buffer objects don't deep copy when they are copied, so there is no point in defining a move semantic, thus no point in having a "moved from" state.

If your class has multiple states and initially does not have a buffer, then it is part of your class to express this and not of the buffer. The default go-to solution nowadays is using std::optional.

Fully agree, although std::optional incurs an extra storage cost.

... given that buffers are movable, a buffer that has been "moved from" is effectively in an "empty buffer" state.

And that is a problem. We could disallow the compiler generated move ctor/assignment to avoid that problem and force a copy of the shared pointer at all times. Then however we would always pay the cost of copying a shared_ptr.

fwyzard commented 2 years ago

We could disallow the compiler generated move ctor/assignment to avoid that problem and force a copy of the shared pointer at all times. Then however we would always pay the cost of copying a shared_ptr.

No, please, we like the performance :-)

fwyzard commented 2 years ago

Default parameters would be hard to guess. The extent could be zero, but on which device? Selecting an arbitrary device could impose problems.

The default status could simply have an empty (nullptr) m_spBufCpuImpl, so it would not be on any device nor point to any actual buffer.

bernhardmgruber commented 2 years ago

The flipside design is then to inline the shared pointers in all alpaka objects and define proper move and copy operations. That would avoid all shared pointer costs (which I guess are very negligible since they are only paid on ctor/dtor and the pointees are expensive objects). You still have the moved-from state as now. We would need to forbid the copy operations then, which might be a usability impediment. Anyway, that is a huge design overhaul. And I guess @BenjaminW3 designed alpaka's API with reference counted objects for a reason.

fwyzard commented 2 years ago

The flipside design is then to inline the shared pointers in all alpaka objects and define proper move and copy operations.

What would it mean to "inline the shared pointers" ? If the reference counting were part of the object itself, it wouldn't be a shared pointer, because it wouldn't have anything to point too ?

bernhardmgruber commented 2 years ago

What would it mean to "inline the shared pointers" ? If the reference counting were part of the object itself, it wouldn't be a shared pointer, because it wouldn't have anything to point too ?

Moving the contents of BufXXXImpl into BufXXX and removing the shared pointer and then BufXXXImpl, leaving BufXXX no longer reference countable.

fwyzard commented 2 years ago

Moving the contents of BufXXXImpl into BufXXX and removing the shared pointer and then BufXXXImpl

So... basically using BufXXXImpl directly instead of BufXXX ?

leaving BufXXX no longer reference countable.

Do you mean "no longer reference counted" rather than "no longer reference countable" ?

bernhardmgruber commented 2 years ago

Moving the contents of BufXXXImpl into BufXXX and removing the shared pointer and then BufXXXImpl

So... basically using BufXXXImpl directly instead of BufXXX ?

Yes. Making BufXXXImpl the public interface.

leaving BufXXX no longer reference countable.

Do you mean "no longer reference counted" rather than "no longer reference countable" ?

Yes, BufXXXImpl just has the native handles just embedded directly. It does not have a reference counted shared state any longer.