Closed wanderingbort closed 12 months ago
@spoonincode has correctly pointed out in chats that the PoC's use of static allocator members is not compatible with our unit tests where more than one controller and therefore chainbase can exist in the same running process.
He has suggested a well-known memory location scheme such that given this
for an object you can derive the memory location of the allocator.
Another issue to address in the PoC is that not a lot of care is taken in shared_cow_vector<T>
to make sure contained objects are properly constructed.
Nor are the allocations typed meaning they may run afoul of alignment and padding rules. It is trivial to remap the allocators toa type so that they properly respect alignment etc
note: removing allocators from shared_cow_string exposed an implementation bug where sizeof(shared_cow_string) was used in place of sizeof(shared_cow_string::impl) in the deallocation. Somehow this was not creating issues?
I just saw that the deallocate
function of bip::allocator<...>
does not use the passed size:
void deallocate(const pointer &ptr, size_type)
{ mp_mngr->deallocate((void*)ipcdetail::to_raw_pointer(ptr)); }
He has suggested a well-known memory location scheme such that given
this
for an object you can derive the memory location of the allocator.
Hey @spoonincode, what scheme did you have in mind exactly? I imagine we can track the segment_manager
address and size for each chainbase instance, maybe in a static boost::flat_set
, and from that figure out the chainbase instance the string/vector was allocated in. For a single instance (which is the case in Leap) this should be efficient.
Is this what you had in mind?
Since the max size is 8TB, always map the chainbase at a 8TB boundary (there are a handful of ways of doing this: you can just probe around until one works, or map 8TB+db_config_size PROT_NONE and then MAP_FIXED in to that. The later has the benefit of leaning on existing libc & kernel logic to find a spot & ASLR and such (not that there is much to randomize..)
We always construct the BIP::segment_manager
at a known offset in the mapping,
https://github.com/AntelopeIO/chainbase/blob/main/src/pinnable_mapped_file.cpp#L106
and the segment_manager
has a get_allocator()
,
https://www.boost.org/doc/libs/1_83_0/doc/html/boost/interprocess/segment_manager.html#idm28621-bb
So that means given some pointer inside of the mapping, the allocator can be found via something like
((segment_manager*)((ptr & ~(0x80000000000-1)) + header_size))->get_allocator()
@spoonincode that's great, I should have though of that!
@spoonincode unfortunately I cannot use PROT_NONE
or MAP_FIXED
with the boost interprocess mmap API, which is what allows us to support non-linux platforms.
I feel that the best way forward is to use a static flat_set
(my first idea). It is slightly slower than the pointer masking trick, but it still should be very fast (esp when we have a single chainbase instance), and since it is used only when performing allocations or deallocations, I think the overall impact will be negligible.
There are a few improvements to chainbase that can reduce wasted space in the shared memory file that exist in the
chainbase-container-allocator-poc
branchPer-instance allocator storage
each
shared_vector
andshared_cow_string
stores its own copy of the chainbase allocator. This is always the same pointer even when it has been cast into a different sized allocator (egnode
vschar
allocator). These types need access to the allocator for their mutating actions as well as destruction. As it is always the same so the POC implements storage as a global static member of the container class with assertions that the assumption its the same holds. Some method of removing all the duplicated allocators suffices. In order to achieve this forshared_vector
a newshared_cow_vector
was created. this has the additional benefit of creating fewer spurious allocations in the shared memory file on mutations which may lead to a less fragmented file.**note: removing allocators from
shared_cow_string
exposed an implementation bug wheresizeof(shared_cow_string)
was used in place ofsizeof(shared_cow_string::impl)
in the deallocation. Somehow this was not creating issues?empty
shared_cow_string
allocationsWhen a
shared_cow_string
is initialized at runtime (eg, when a new account is created and its ABI is not set) it does not allocate ashared_cow_string::impl
structure. Thenullptr
held in theshared_cow_string
is equivalent to an empty string. However, when ashared_cow_string
is read from a Datastream or variant, the behavior is different. In that caseresize_and_fill
or an equivalent is used and this always allocates even if the size passed in is 0. This results in extra bloat when loading from a snapshot that is not present when the state is derived from replay. The POC prevents allocation for empty structures.Results
When testing the POC at block 310859547 from snapshot these changes reduced the initial state size from 23,263,420,992 bytes to 21,785,865,104 bytes (or around 6% savings)