FairRootGroup / FairMQ

C++ Message Queuing Library and Framework
GNU Lesser General Public License v3.0
86 stars 34 forks source link

(Unneeded?) Memory bloat in fair::mq::shmem::Message #477

Open ktf opened 1 year ago

ktf commented 1 year ago

While doing some profiling I just noticed that the fair::mq::shmem::Message has a fQueued boolean in the middle of it, introducing some unneeded extra padding. If I naively move it to the MetaHeader class, I immediately go from 96 bytes to 88.

I also suspect one could gain quite some extra memory with a few extra reasonable limitations. For example:

All in all, I think one could get down below 64 bytes by having something like the following:

  struct  __attribute__((__packed__)) Message : public fair::mq::Message // Extra 16 bytes from this
  {
    Manager* fManager;
    mutable UnmanagedRegion* fRegionPtr;
    mutable char* fLocalPtr;
    int16_t fRegionId;
    int16_t fSegmentId;
    size_t fSize: 48;
    size_t fHandle: 48;
    size_t fShared: 48;
    size_t fAlignment: 14;
    size_t fManaged: 1;
    size_t fQueued: 1;
  };

and it could probably be even less if one used handles for fManager and the the other pointers. Do you think there is any space to evolve the managed region to go into this direction?

rbx commented 1 year ago

The MetaHeader part contains data that will be transferred to a different process, while other members of shmem Message class are only valid/meaningful locally per process.

struct Message
{
    Manager& fManager; // local ptr to the manager
    bool fQueued; // is the message queued? (can the buffer removed when dtor is called).
    MetaHeader fMeta; // contains all the "transferrable" info
    size_t fAlignment; 
    mutable UnmanagedRegion* fRegionPtr; // used in case of unmanaged region messages
    mutable char* fLocalPtr; // local data ptr, for received messages obtained from the handle
}

struct MetaHeader
{
    size_t fSize;
    size_t fHint; // this is used for unmanaged region messages to store user provided info, which is returned when message can be freed. Afaik this is used at least by FLP/readout.
    boost::interprocess::managed_shared_memory::handle_t fHandle; // interprocess handle, convertable to local ptr
    mutable boost::interprocess::managed_shared_memory::handle_t fShared; // used for the ref count when msg is shared
    uint16_t fRegionId; // unmanaged region id
    mutable uint16_t fSegmentId; // segment id
    bool fManaged; // managed/unmanaged
};

Both should be minimal size, but I would say having the Metaheader part smaller is more important.

fQueued boolean in the middle of it, introducing some unneeded extra padding. If I naively move it to the MetaHeader class, I immediately go from 96 bytes to 88

Logically it belongs to local data, not shared data. For this and similar suggestions one could think to create the final MetaData header only immediately before transfer, then there would be more freedom to resize Message. I guess this line would have to be a lil bit more complicated then: https://github.com/FairRootGroup/FairMQ/blob/master/fairmq/shmem/Socket.h#L134

fAlignment at the moment is a size_t, while it could easily be a char storing log2 (assuming no one needs alignment which is not a power of 2).

Sounds good!

size and hint could be limited to 48 bits, leaving space for the above mentioned fAlignment, fQueued, fManaged to be stored there.

Would 48 be enough for current and future use cases? Hint is 64 bits as per requirement from FLP/readout.

it could probably be even less if one used handles for fManager and the the other pointers

Which handles do you mean? ptrs and boost interproccess handles are both 64, as they are supposed to be convertible into each other.

ktf commented 1 year ago

Both should be minimal size, but I would say having the MetaHeader part smaller is more important. Logically it belongs to local data, not shared data. For this and similar suggestions one could think to create the final MetaData header only immediately before transfer, then there would be more freedom to resize Message.

Ok, I understand the difference now. What you propose indeed makes sense to me.

Sounds good!

Indeed, if you do that, than it's enough to have fQueued shuffled after fAlignment.

Would 48 be enough for current and future use cases? Hint is 64 bits as per requirement from FLP/readout.

As far as I know, there is no x86 hardware which supports more than 48 bits of virtual address space. Maybe this should be architecture dependent? However Sylvain should probably comment here.

Which handles do you mean? ptrs and boost interproccess handles are both 64, as they are supposed to be convertible into each other.

What I mean is that right now you have:

    Manager* fManager;
    mutable UnmanagedRegion* fRegionPtr;

However you could, for example, decide that regions are known to a given Manager (i.e. the latter keeps a list of them) and change mutable UnmanagedRegion* fRegionPtr to e.g. int8_t fRegionInManagerIndex adding an indirection and giving to the manager the task to retrieve the actual region pointer. It would save you a pointer.

ktf commented 1 year ago

So how do we proceed here? Do you plan to attack this any time soon?

rbx commented 1 year ago

I will likely come to this sometime in September.

ktf commented 1 year ago

Ok, thank you!

rbx commented 1 year ago

The need to store the alignment value is for use in Message::SetUsedSize, which may need it in an edge-case.

Instead of storing it with each message, we could also accept it as a parameter to SetUsedSize.

What do you think?

ktf commented 1 year ago

Sounds reasonable to me. Indeed I could not find any invocation of SetUsedSize in the framework and only one in the old TOFCompressor, which indeed is known in advance so its not needed in any case.

Maybe you could determine minimum alignment from the current pointer (and cap it to, say, page boundaries).

ktf commented 1 year ago

basically:

size_t newAlignment = 1 << min(__builtin_ctz(oldPtr), 12);
rbx commented 1 year ago

Sounds good, then that could be done when no alignment is provided to SetUsedSize. That way no need to break the interface, alignment argument to SetUsedSize will be optional.

Then I would do this and not store alignment in shmem::Message at all.

ktf commented 1 year ago

Fully support this.

rbx commented 1 year ago

The current state in the dev branch is:

Manager& fManager;
mutable UnmanagedRegion* fRegionPtr = nullptr;
mutable char* fLocalPtr = nullptr;
size_t fSize = 0;
size_t fHint = 0;
boost::interprocess::managed_shared_memory::handle_t fHandle = -1;
mutable boost::interprocess::managed_shared_memory::handle_t fShared = -1; 
uint16_t fRegionId = 0;
mutable uint16_t fSegmentId;
bool fManaged = true;
bool fQueued = false;

This (plus the fTransport ptr from the parent and the this ptr) brings the size to 80 bytes.