Squadrick / shadesmar

Fast C++ IPC using shared memory
MIT License
550 stars, 84 forks

Support zero-copy communication #43

Open Squadrick opened 4 years ago

Squadrick commented 4 years ago

Here's one way to achieve this:

Publisher p("topic");
void *ptr = p.get_msg(size);

ptr is allocated in shared memory (using Allocator) and handed to the user. We also assign an Element in the shared queue to ptr and hold a writer lock on that element until ptr is published. We may need to add an extra field, is_zero_copied, to the base Element so that the consumer can react accordingly.

auto obj = new (ptr) SomeClass( /* params */);
// update obj
p.zero_copy_publish(ptr); // releases the shared queue element lock
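
The publisher-side flow can be modeled in a single process to check the lock hand-off. This is only a sketch under stated assumptions: MockPublisher, Sample, and the published flag are hypothetical names, a heap buffer stands in for shared memory, a std::mutex stands in for the inter-process writer lock, and the queue is reduced to one Element. It is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

// Hypothetical message type, only for illustration.
struct Sample {
  int id;
  double value;
};

// Hypothetical stand-in for one slot of the shared queue.
struct Element {
  std::mutex lock;              // stand-in for the inter-process writer lock
  void *ptr = nullptr;
  std::size_t size = 0;
  bool is_zero_copied = false;  // the extra field proposed above
  bool published = false;       // assumption: lets a reader see the state
};

// In-process model of the proposed API. A heap buffer plays the role of
// shared memory and a single Element plays the role of the queue.
class MockPublisher {
 public:
  explicit MockPublisher(std::size_t arena_bytes) : arena_(arena_bytes) {}

  // Reserve `size` bytes and take the writer lock. The lock stays held
  // until zero_copy_publish() is called with the returned pointer.
  void *get_msg(std::size_t size) {
    if (size == 0 || size > arena_.size()) return nullptr;
    element_.lock.lock();
    element_.ptr = arena_.data();
    element_.size = size;
    element_.is_zero_copied = true;
    element_.published = false;
    return element_.ptr;
  }

  // Mark the message as visible and release the writer lock.
  void zero_copy_publish(void *ptr) {
    assert(ptr != nullptr && ptr == element_.ptr);
    element_.published = true;
    element_.lock.unlock();
  }

  const Element &element() const { return element_; }

 private:
  std::vector<unsigned char> arena_;
  Element element_;
};
```

The object is constructed in place with placement new between get_msg() and zero_copy_publish(), so no copy of the payload is made on the publisher side.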

On the consumer side, we'll return a subclass of Memblock, ZeroCopyMemblock, which will not have a no_delete() and will be deallocated at the end of the callback. We'll need to check the locking logic as well.

Code path for copied-communication:

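// element is the currently accessed shared queue position
Memblock memblock;
element.lock.acquire();
// local buffer owned by the subscriber (assumed to be malloc'ed here)
memblock.ptr = malloc(element.size);
// memcpy takes (dest, src, n): copy out of shared memory into the local buffer
memcpy(memblock.ptr, element.ptr, element.size);
memblock.size = element.size;
element.lock.release();

callback(memblock);

// the callback can keep the data by clearing should_free (e.g. via no_delete())
if (memblock.should_free) {
   free(memblock.ptr);
}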

New code path for zero-copy communication:

element.lock.acquire();
callback(ZeroCopyMemblock{element.ptr, element.size});
element.lock.release();

allocator.dealloc(element);
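
A minimal sketch of the zero-copy consumer path, assuming ZeroCopyMemblock is a plain non-owning view and that deallocation is passed in as a callable. The names consume_zero_copy and Dealloc are hypothetical, and std::mutex again stands in for the shared-memory lock.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>

// Non-owning view of a message that still lives in the shared queue.
// There is deliberately no no_delete(): the view is only valid during
// the callback.
struct ZeroCopyMemblock {
  void *ptr;
  std::size_t size;
};

// Hypothetical stand-in for one slot of the shared queue.
struct Element {
  std::mutex lock;  // stand-in for the inter-process lock
  void *ptr = nullptr;
  std::size_t size = 0;
  bool is_zero_copied = false;
};

// Run the callback directly on the shared buffer, then release it.
// `dealloc` is an assumption: any callable taking the message pointer.
template <typename Dealloc>
void consume_zero_copy(
    Element &element,
    const std::function<void(const ZeroCopyMemblock &)> &callback,
    Dealloc dealloc) {
  {
    // The lock is held for the whole callback, so a slow callback
    // blocks the writer for this element.
    std::lock_guard<std::mutex> guard(element.lock);
    callback(ZeroCopyMemblock{element.ptr, element.size});
  }
  dealloc(element.ptr);
  element.ptr = nullptr;
  element.size = 0;
}
```

The trade-off compared to the copied path is visible in the scope of the lock: the copy is gone, but the element stays locked for as long as the callback runs.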

The above is shown for pub-sub, but it can be extended to RPC too.


Here's a problem: we can't free each message pointer independently. A message pointer can only be freed after all preceding message allocations have been released, because of how Allocator works: it uses a strictly FIFO-based allocation strategy. For performance, we may want to consider moving to a more complex general-purpose allocator.

NOTE: Writing a general-purpose allocator to work on a single chunk of shared memory is very error-prone.
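
The FIFO constraint can be illustrated with a small byte-accounting model. This is a sketch, not the library's Allocator: FifoModel and kNoBlock are hypothetical names, and the model hands out ticket ids instead of real addresses so that only the reclamation order is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

constexpr std::size_t kNoBlock = static_cast<std::size_t>(-1);

// Models a strictly FIFO allocator: space is reclaimed only from the
// oldest end, so a released block stays "in use" until every block
// allocated before it has been released too.
class FifoModel {
 public:
  explicit FifoModel(std::size_t capacity) : capacity_(capacity) {}

  // Returns a ticket id for the block, or kNoBlock if it does not fit.
  std::size_t alloc(std::size_t size) {
    if (size == 0 || size > capacity_ - used_) return kNoBlock;
    blocks_.push_back(Block{next_id_, size, false});
    used_ += size;
    return next_id_++;
  }

  // Mark a block as released, then reclaim from the front while the
  // oldest block is released.
  void release(std::size_t id) {
    for (Block &b : blocks_) {
      if (b.id == id) b.released = true;
    }
    while (!blocks_.empty() && blocks_.front().released) {
      used_ -= blocks_.front().size;
      blocks_.pop_front();
    }
  }

  std::size_t available() const { return capacity_ - used_; }

 private:
  struct Block {
    std::size_t id;
    std::size_t size;
    bool released;
  };
  std::size_t capacity_;
  std::size_t used_ = 0;
  std::size_t next_id_ = 0;
  std::deque<Block> blocks_;
};
```

With a capacity of 100 and two live 40-byte blocks, releasing the newer block first frees nothing; the space only comes back once the older block is released as well. In a zero-copy design, one slow subscriber holding an old message would stall reclamation for everything allocated after it.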

hugosenari commented 11 months ago

I discovered your lib because my thought was: if I cannot manage memory across processes (without writing a custom memory allocator), what can I do to go faster than memcpy?

Andrei Alexandrescu gives a good hint on how to write allocators in his talk "std::allocator Is to Allocation what std::vector Is to Vexation". It also cleared up something that was nebulous in my mind: multiple strategies for multiple sizes.
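
One way to read "multiple strategies for multiple sizes" is a pool split into size classes, each with its own free list. The sketch below is an assumption-laden illustration, not code from the talk, from shadesmar, or from iceoryx: SizeClassPool and kNoChunk are hypothetical names, the classes must be given in ascending chunk size, and offsets are used instead of pointers because offsets stay valid when a shared segment is mapped at different addresses in different processes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNoChunk = static_cast<std::size_t>(-1);

// Pre-allocated pool split into size classes. Every chunk can be
// released independently of the others.
class SizeClassPool {
 public:
  struct Class {
    std::size_t chunk_size;
    std::size_t count;
  };

  // `classes` must be sorted by ascending chunk_size.
  explicit SizeClassPool(const std::vector<Class> &classes) {
    std::size_t base = 0;
    for (const Class &c : classes) {
      Bucket b{c.chunk_size, c.count, {}};
      for (std::size_t i = 0; i < c.count; ++i) {
        b.free_offsets.push_back(base + i * c.chunk_size);
      }
      base += c.chunk_size * c.count;
      buckets_.push_back(b);
    }
    buffer_.resize(base);
  }

  // Returns the offset of a chunk from the smallest class that fits and
  // still has a free chunk, or kNoChunk if none is available.
  std::size_t alloc(std::size_t size) {
    for (Bucket &b : buckets_) {
      if (size <= b.chunk_size && !b.free_offsets.empty()) {
        std::size_t off = b.free_offsets.back();
        b.free_offsets.pop_back();
        return off;
      }
    }
    return kNoChunk;
  }

  // Return a chunk to the free list of the class that owns its offset.
  void release(std::size_t offset) {
    std::size_t base = 0;
    for (Bucket &b : buckets_) {
      std::size_t end = base + b.chunk_size * b.count;
      if (offset < end) {
        b.free_offsets.push_back(offset);
        return;
      }
      base = end;
    }
  }

  unsigned char *at(std::size_t offset) { return buffer_.data() + offset; }

 private:
  struct Bucket {
    std::size_t chunk_size;
    std::size_t count;
    std::vector<std::size_t> free_offsets;
  };
  std::vector<unsigned char> buffer_;
  std::vector<Bucket> buckets_;
};
```

The cost of this layout is internal fragmentation (a 10-byte message occupies a whole 64-byte chunk in the test setup), in exchange for constant-time allocation within a class and independent frees.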

So currently, I'm trying to find out how large a memory block has to be for zero-copy to make sense.
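
A rough way to approach that question is to time memcpy at several block sizes and compare the result with the fixed overhead of the zero-copy path (locking, allocation bookkeeping). The helper below is a hypothetical measurement sketch; memcpy_ns is not part of any library, and the numbers it produces depend heavily on the machine, the cache state, and the compiler flags.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Average wall-clock nanoseconds for one memcpy of `bytes` bytes,
// measured over `iters` repetitions.
double memcpy_ns(std::size_t bytes, int iters) {
  assert(bytes > 0 && iters > 0);
  std::vector<unsigned char> src(bytes, 1);
  std::vector<unsigned char> dst(bytes, 0);
  volatile unsigned char sink = 0;  // keeps the copies from being elided

  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    src[0] = static_cast<unsigned char>(i);  // vary the input each round
    std::memcpy(dst.data(), src.data(), bytes);
    sink = dst[bytes - 1];
  }
  auto stop = std::chrono::steady_clock::now();
  (void)sink;

  std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / iters;
}
```

Calling memcpy_ns for sizes such as 64 bytes, 4 KiB, and 1 MiB gives a per-message copy cost that can be set against the measured cost of an acquire/release pair on the shared lock. Where the copy becomes the larger of the two is a candidate threshold for switching to zero-copy.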

And how does iceoryx do it?

flyonaie commented 4 months ago

"iceoryx" is a pre-allocated memory space.