DevShiftTeam / AppShift-MemoryPool

A very fast cross-platform memory pool mechanism for C++ built using a data-oriented approach (3 to 24 times faster than regular new or delete, depending on operating system & compiler)
Apache License 2.0

MemoryPool needs copy and move semantics #20

Open fvalasiad opened 2 years ago

fvalasiad commented 2 years ago

Basically the title. They are essential for using the class effectively. MemoryPoolAllocator also runs into problems because of this.
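For illustration, here is a minimal move-only sketch of what this could look like. The block layout and member names are placeholders, not the actual AppShift-MemoryPool internals, and alignment handling is omitted:

```cpp
#include <cstddef>
#include <cstdlib>

// Placeholder pool: a chain of malloc'd blocks with bump allocation.
class MemoryPool {
    struct Block { Block* next; std::size_t size; std::size_t offset; };

    Block* head = nullptr;        // most recently added block
    std::size_t defaultBlockSize;

    Block* addBlock(std::size_t size) {
        Block* b = static_cast<Block*>(std::malloc(sizeof(Block) + size));
        b->next = head; b->size = size; b->offset = 0;
        head = b;
        return b;
    }

    void releaseAll() {
        while (head) { Block* n = head->next; std::free(head); head = n; }
    }

public:
    explicit MemoryPool(std::size_t block_size = 1 << 20)
        : defaultBlockSize(block_size) { addBlock(block_size); }

    ~MemoryPool() { releaseAll(); }

    // Move constructor: steal the block chain and leave the source empty.
    MemoryPool(MemoryPool&& other) noexcept
        : head(other.head), defaultBlockSize(other.defaultBlockSize) {
        other.head = nullptr;
    }

    // Move assignment: release our own blocks, then steal the other chain.
    MemoryPool& operator=(MemoryPool&& other) noexcept {
        if (this != &other) {
            releaseAll();
            head = other.head;
            defaultBlockSize = other.defaultBlockSize;
            other.head = nullptr;
        }
        return *this;
    }

    // Copying is deleted here until its semantics are decided (see the
    // discussion below about what "copying" a pool should even mean).
    MemoryPool(const MemoryPool&) = delete;
    MemoryPool& operator=(const MemoryPool&) = delete;

    void* allocate(std::size_t n) {
        if (!head || head->offset + n > head->size)
            addBlock(n > defaultBlockSize ? n : defaultBlockSize);
        void* p = reinterpret_cast<char*>(head + 1) + head->offset;
        head->offset += n;
        return p;
    }
};
```

With something like this, a pool can live on the stack and be returned from functions or stored in containers by move.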

LessComplexity commented 2 years ago

Will add move semantics and copy constructor tomorrow, have a lot of work lately so I've been off a little. Your help is very appreciated! :)

fvalasiad commented 2 years ago

@LessComplexity Make sure to update the README.md when you are done! Users won't have to manage a raw pointer manually anymore; they can just create the pool on the stack and move it around with move semantics, or even copy it if they wish.

I'm troubled though as to what "copying" means.

Does it mean creating a same-sized memory pool? Or copying the object as a whole, and as a result copying the information about which sub-blocks are allocated and which are free? That is up to you to decide, but the allocator is restricted by the standard in what its copy operations must do.
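Concretely, the allocator requirements boil down to copies comparing equal and being able to free each other's allocations, which suggests the allocator should hold only a non-owning pointer to the pool and copy that pointer. A rough sketch, where `allocate`/`free` are assumptions about the interface the pool might expose, not the current API:

```cpp
#include <cstddef>

// `Pool` stands in for whatever pool type the project ends up with.
template <typename T, typename Pool>
struct PoolAllocator {
    using value_type = T;

    Pool* pool;   // non-owning: the allocator never duplicates the pool itself

    explicit PoolAllocator(Pool& p) noexcept : pool(&p) {}

    template <typename U>
    PoolAllocator(const PoolAllocator<U, Pool>& other) noexcept
        : pool(other.pool) {}   // copying copies only the pointer

    T* allocate(std::size_t n) {
        return static_cast<T*>(pool->allocate(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        pool->free(p, n * sizeof(T));
    }
};

// Copies compare equal exactly when they draw from the same pool, which is
// what lets one copy deallocate memory obtained through another.
template <typename T, typename U, typename Pool>
bool operator==(const PoolAllocator<T, Pool>& a, const PoolAllocator<U, Pool>& b) {
    return a.pool == b.pool;
}
template <typename T, typename U, typename Pool>
bool operator!=(const PoolAllocator<T, Pool>& a, const PoolAllocator<U, Pool>& b) {
    return !(a == b);
}
```

Copying the allocator then never touches the pool itself.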

LessComplexity commented 2 years ago

@fvalasiad Will update the README, of course :) As for the copy constructor, it is indeed a tricky issue: copying a chain of blocks used for memory management is a heavy operation, so the best option is probably to keep the blocks as they are and only copy the references/pointers to them. That would be like a move constructor that doesn't clear the pointers of the source object.

fvalasiad commented 2 years ago

@LessComplexity

I spent some time looking at the codebase and I have the following observations:

  1. Deep copying the object makes no sense: its state is only meaningful to a user who obtained pointers from the reserved blocks. I suggest we delete the copy operations and keep only the move semantics.

Now, the allocator HAS to have a copy constructor, but that can be handled inside the allocator class itself without any need to modify the MemoryPool instance within it.

  2. I looked at your codebase and noticed that the implementation is weak in terms of fragmentation and memory usage. Are you familiar with operating systems' memory management algorithms such as FirstFit, NextFit, BestFit, and WorstFit, and techniques such as paging and segmentation?

We really need to change how memory is allocated with efficiency in mind (and back it with benchmarks). Also worth noting: unlike the OS, we can't compact memory, as that would invalidate pointers, unless we create our own handle class that keeps a reference to a pointer inside MemoryPool and is updated as needed. That seems quite complex, though, so I wouldn't try it just yet; maybe in the future we can try it and see if it's better.

I suggest we start with a solid FirstFit and work our way up from there by trying different approaches.

  3. I also looked at your garbage reuse technique, and it's lacking as well.

What if two or more freed blocks sitting in the garbage list are actually adjacent to each other? They should be merged into one, but your implementation doesn't ensure that.

  4. We should look at our options for lock-free implementations; if they turn out to be bad, we can fall back to mutexes. But don't add mutexes to the existing implementation: they add overhead that isn't needed in a single-threaded environment. I suggest we create a separate class that deals with thread safety.

  5. Linked lists are terribly bad most of the time, and you use them extensively. They are effectively cache-miss generators. I really suggest we replace them in the existing implementation with plain vectors in a data-oriented design and run benchmarks to prove the point. It would be great if you could give me write permissions on a separate feature branch to work on this.

I am really looking forward to seeing just how much faster and more efficient this implementation can become. I hope you share my excitement and will help me push for performance.

  6. Oh, I also noticed that you are effectively restricting the project to C++17 just to use an if statement with an initializer. Most of the project is low-level, C-style code, so it would be great if we added support for C++98. We can use `#if __cplusplus` checks to conditionally include higher-level facilities such as move semantics, etc.

Looking forward to hearing your thoughts on this.

LessComplexity commented 2 years ago

@fvalasiad Thank you for the time and the feedback! Sorry for the late reply; I've had some heavy work days. I'll try to address all your feedback as much as I can; you certainly have valid observations.

  1. Deep copying the object makes no sense: its state is only meaningful to a user who obtained pointers from the reserved blocks. I suggest we delete the copy operations and keep only the move semantics.

Indeed, that's why I mentioned in my last comment that I would only copy the pointers/references, though it might be even more correct to remove the copy operations entirely, as you suggest.

  2. I looked at your codebase and noticed that the implementation is weak in terms of fragmentation and memory usage. Are you familiar with operating systems' memory management algorithms such as FirstFit, NextFit, BestFit, and WorstFit, and techniques such as paging and segmentation?

In fact, I haven't tried to implement any algorithm to handle fragmentation, since I was focused on a solution that would be as performant as possible. Strictly speaking, getting rid of fragmentation involves sacrificing some performance.

In some cases the current approach fits, for example if I keep track of the order of allocations and deallocations, or when I know I'm going to release big chunks of memory at once (i.e. the memory pool scope I implemented). But in most cases, yes, fragmentation will be an issue.

Therefore, I want to build a pool whose performance/fragmentation trade-off can be chosen by the user. With that in place, we can create a system on top that tracks the allocations of a program to determine the best choice of pools and fragmentation levels, and can even apply it dynamically. This of course includes segregated pools and stack-based pools.
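For reference, the FirstFit starting point suggested above could look roughly like this; the `FreeRegion` list is illustrative, not the project's actual bookkeeping:

```cpp
#include <cstddef>
#include <vector>

// A free region is an offset into the pool's buffer plus a size.
struct FreeRegion { std::size_t offset; std::size_t size; };

// First fit: take the first region big enough, shrink or remove it.
// Returns the offset of the allocation, or (size_t)-1 if nothing fits,
// in which case the caller would grow the pool instead.
std::size_t firstFit(std::vector<FreeRegion>& freeList, std::size_t request) {
    for (std::size_t i = 0; i < freeList.size(); ++i) {
        if (freeList[i].size >= request) {
            std::size_t offset = freeList[i].offset;
            freeList[i].offset += request;
            freeList[i].size   -= request;
            if (freeList[i].size == 0)
                freeList.erase(freeList.begin() + i);
            return offset;
        }
    }
    return static_cast<std::size_t>(-1);
}
```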

  3. I also looked at your garbage reuse technique, and it's lacking as well.

What if two or more freed blocks sitting in the garbage list are actually adjacent to each other? They should be merged into one, but your implementation doesn't ensure that.

Indeed, I've actually developed a patch to concatenate touching blocks in another project in which I used my pool.
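Something along these lines, reusing the illustrative free-region layout from the FirstFit sketch: sort the free regions by offset and merge any that touch, so adjacent freed blocks become one reusable block.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct FreeRegion { std::size_t offset; std::size_t size; };

// Merge touching free regions into single larger ones.
void coalesce(std::vector<FreeRegion>& freeList) {
    if (freeList.empty()) return;
    std::sort(freeList.begin(), freeList.end(),
              [](const FreeRegion& a, const FreeRegion& b) { return a.offset < b.offset; });
    std::vector<FreeRegion> merged;
    merged.push_back(freeList[0]);
    for (std::size_t i = 1; i < freeList.size(); ++i) {
        FreeRegion& last = merged.back();
        if (last.offset + last.size == freeList[i].offset)
            last.size += freeList[i].size;       // touching: merge into one
        else
            merged.push_back(freeList[i]);
    }
    freeList.swap(merged);
}
```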

  4. We should look at our options for lock-free implementations; if they turn out to be bad, we can fall back to mutexes. But don't add mutexes to the existing implementation: they add overhead that isn't needed in a single-threaded environment. I suggest we create a separate class that deals with thread safety.

In general, I assume a good lock-free implementation would use only thread-local memory pools, as that is the easiest to implement, though it might cause overhead in a heavily threaded application. Another option is to implement something similar to libc's malloc arenas. Even libc uses mutexes alongside that method, so other than thread-local pools I can't think of another lock-free way (though I will keep trying to think of one).

To prevent overhead in a single-threaded environment, we can create a pool with a configuration that selects the thread-safety algorithm, just like what is planned for fragmentation. Creating a separate class to deal with thread safety is an option that needs careful consideration; we might instead integrate its implementation details into the memory pool class itself and activate them according to the configuration. To really settle this, I think the only way is to sketch a more detailed description or even an implementation of both approaches and check which is the most configurable and efficient.
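As a point of comparison, the thread-local variant is the one that stays entirely off the locking path. A minimal sketch, with `SimplePool` standing in for whatever pool class ends up being used:

```cpp
#include <cstddef>
#include <cstdlib>

// Stand-in pool backed by malloc; the real thread-local variant would wrap
// the project's pool type instead.
struct SimplePool {
    void* allocate(std::size_t n) { return std::malloc(n); }
    void  free(void* p)           { std::free(p); }
};

SimplePool& threadLocalPool() {
    thread_local SimplePool pool;   // one instance per thread, never contended
    return pool;
}

void* poolAlloc(std::size_t n) { return threadLocalPool().allocate(n); }
void  poolFree(void* p)        { threadLocalPool().free(p); }
```

The trade-off is extra memory per thread, plus the question of freeing memory from a thread other than the one that allocated it, which is exactly where arena/mutex designs come back into the picture.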

  5. Linked lists are terribly bad most of the time, and you use them extensively. They are effectively cache-miss generators. I really suggest we replace them in the existing implementation with plain vectors in a data-oriented design and run benchmarks to prove the point. It would be great if you could give me write permissions on a separate feature branch to work on this.

I am really looking forward to seeing just how much faster and more efficient this implementation can become. I hope you share my excitement and will help me push for performance.

Yes my friend, I share your hatred of linked lists as deeply as it gets 😄 I've tried to think of a couple of approaches to handle memory without linked lists, and most of the time the solutions result in extra memory usage. I would love to see your attempt and your ideas for approaching this issue; I love the excitement. Remember that even libc's malloc uses linked lists.

I will grant you write permissions; from there you can create your own branch. Have fun! 😉
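For the record, the data-oriented direction could be as simple as keeping the free-region bookkeeping in parallel vectors instead of linked nodes, so scans stream through contiguous memory. The names here are illustrative only:

```cpp
#include <cstddef>
#include <vector>

// Structure-of-arrays free list: region i is (offsets[i], sizes[i]).
struct FreeListSoA {
    std::vector<std::size_t> offsets;
    std::vector<std::size_t> sizes;

    void add(std::size_t offset, std::size_t size) {
        offsets.push_back(offset);
        sizes.push_back(size);
    }

    // First fit over a contiguous array of sizes: still a linear scan, but
    // one that streams through memory instead of chasing pointers.
    std::size_t takeFirstFit(std::size_t request) {
        for (std::size_t i = 0; i < sizes.size(); ++i) {
            if (sizes[i] >= request) {
                std::size_t offset = offsets[i];
                offsets[i] += request;
                sizes[i]   -= request;
                return offset;
            }
        }
        return static_cast<std::size_t>(-1);   // nothing fits
    }
};
```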

  6. Oh, I also noticed that you are effectively restricting the project to C++17 just to use an if statement with an initializer. Most of the project is low-level, C-style code, so it would be great if we added support for C++98. We can use `#if __cplusplus` checks to conditionally include higher-level facilities such as move semantics, etc.

Most of the implementation is about memory management, which is a low-level concept, so the code itself doesn't use many noticeable high-level features. It can actually be modified pretty easily to support older versions of C++; I will consider it :)
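A possible shape for that, as an interface-only sketch with placeholder member names: gate the post-C++98 pieces on the `__cplusplus` value.

```cpp
// Interface-only sketch; member functions are declared but not defined here.
class MemoryPool {
public:
    MemoryPool();
    ~MemoryPool();

#if __cplusplus >= 201103L
    // Move semantics exist only from C++11 on, so they are compiled in
    // conditionally; older builds simply keep using the pool through pointers.
    MemoryPool(MemoryPool&& other) noexcept;
    MemoryPool& operator=(MemoryPool&& other) noexcept;

    MemoryPool(const MemoryPool&) = delete;
    MemoryPool& operator=(const MemoryPool&) = delete;
#else
private:
    // Pre-C++11 way to forbid copying: declare but never define.
    MemoryPool(const MemoryPool&);
    MemoryPool& operator=(const MemoryPool&);
public:
#endif
};
```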

Currently I'm reading up and planning the development of this project. I want to create a descriptive document and architecture that will be easy to follow down into the implementation, so that we will also be able to comment and make suggestions on the overall design choices.

Have an awesome week, Best regards, Sapir. :)

fvalasiad commented 2 years ago

@LessComplexity Just to clarify, should I be cautious about increasing the object's size? Is that the reason you chose linked lists?

Do we value object size more than efficiency? That seems kind of restricting. Then again, maybe two distinct classes would do, allowing us to create heavy single-instance pools or multiple small ones at no extra cost.

LessComplexity commented 2 years ago

@fvalasiad

@LessComplexity Just to clarify, should I be cautious about increasing the object's size? Is that the reason you chose linked lists?

Do we value object size more than efficiency? That seems kind of restricting.

The simple answer is no. As I mentioned regarding fragmentation, we can also have different modes of operation for the memory pool in terms of the efficiency/space trade-off; for example, on a system with little memory, such as some embedded systems, we might prefer linked lists. An important thing to notice is that the current linked lists represent physical areas in the memory pool and their relations/order. Your suggestion made me go back and think of more data-oriented approaches, and I have quite a few, though as you also wrote, they need to be tested to prove their efficiency (there might be some pitfalls, since the areas in the memory pool itself might need to be accessed anyway, which will result in a cache miss one way or another).

Once again maybe two distinct classes would do and allow us to create heavy single-object pools or multiple small ones with no expense.

I might not have fully understood what you mean by that. My current vision is to create two types of memory pools, stack-based and segregated, each of which can be configured to work in different modes of operation (in terms of space usage, efficiency, fragmentation and thread safety).

After that, I want to create a higher-level abstraction (i.e. a class) that can be attached to the new/delete operators and, by analyzing the allocations and deallocations of a given application over time, will determine automatically, or through manual configuration, which memory pools to use and in which modes of operation. This way we can create a configurable and even automatic memory pool architecture.
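One hypothetical way to "attach" such a system to new/delete is to replace the global operators and forward to a shared pool. The sketch below uses a malloc-backed stand-in pool; a real version would also need to cover alignment, nothrow and sized-delete overloads, and the analysis layer described above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in pool backed by malloc; the real one would be the project's pool.
struct StandInPool {
    void* allocate(std::size_t n) { return std::malloc(n); }
    void  free(void* p)           { std::free(p); }
};

StandInPool& globalPool() {
    static StandInPool pool;   // one shared pool behind new/delete
    return pool;
}

// Replace the global operators so every new/delete in the program goes
// through the pool.
void* operator new(std::size_t n) {
    if (void* p = globalPool().allocate(n)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept {
    if (p) globalPool().free(p);
}
```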

I would like a more elaborate explanation of what you mean 😄

fvalasiad commented 2 years ago

@LessComplexity

Oh, when I was talking about space, I meant the object's static size.

Like when I was writing devector (which you can find on my profile), I was careful not to make the object too fat compared to std::vector.

std::vector basically uses two pointers (memory block + end) and a size_t (capacity), which comes to around 24 bytes on a 64-bit operating system.

Why do we care about such things? Well, if a user for some reason wants to create multiple different memory pools, they would want the object size to be small in order to stay cache-friendly. We DO care about the size, because when the allocator is used, the MemoryPool will also be part of said containers, which will now have a stateful allocator, thus increasing their size by the object's size plus alignment. This makes me believe we are in desperate need of a global allocator that has no state and instead uses a global memory pool. Honestly speaking, I think the latter is the most common scenario in which someone would use a memory pool as an allocator.

Not to create multiple pools for different containers, but rather a shared one that treats shared memory (the system's) differently than the common OS approach.
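A stateless global-pool allocator along those lines might look like the sketch below; `globalPoolAllocate`/`globalPoolFree` are placeholders (backed by malloc here) for whatever the shared pool would expose. Because the allocator is an empty type, a container using it stays the same size as one using std::allocator.

```cpp
#include <cstddef>
#include <cstdlib>

// Placeholder backing functions; the real ones would forward to one shared
// memory pool instead of malloc/free.
inline void* globalPoolAllocate(std::size_t n) { return std::malloc(n); }
inline void  globalPoolFree(void* p)           { std::free(p); }

// Stateless allocator: no members, so it adds nothing to a container's size,
// and every instance draws from the same global pool.
template <typename T>
struct GlobalPoolAllocator {
    using value_type = T;

    GlobalPoolAllocator() noexcept = default;
    template <typename U>
    GlobalPoolAllocator(const GlobalPoolAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(globalPoolAllocate(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { globalPoolFree(p); }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const GlobalPoolAllocator<T>&, const GlobalPoolAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const GlobalPoolAllocator<T>&, const GlobalPoolAllocator<U>&) { return false; }
```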

I am now thinking of ways to replace the linked lists with vectors to improve efficiency without adding too much to the object's size. But if you agree with me that this object is meant to be used one at a time, I can be more tolerant.

I am definitely creating a version that doesn't care about size and aims for optimal efficiency, though; if the efficiency gain (if any) ends up being worth it, we can consider our options.

fvalasiad commented 2 years ago

@LessComplexity

I've been thinking about what you said, that we don't care as much about fragmentation and can potentially sacrifice memory blocks for optimal efficiency, and it made sense. But later I realized that the more fragmentation there is, the more blocks of memory are wasted, which effectively means more allocations are needed, which means more system calls, which means a performance penalty.

So, now more than ever, I consider benchmarking our options to be the way to go.
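A benchmark can stay very small; something like the loop below, run once against plain new/delete and once with the allocation lines swapped for the pool calls. Nothing in it is specific to AppShift-MemoryPool.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const int iterations = 1000000;
    std::vector<int*> ptrs;
    ptrs.reserve(iterations);

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        ptrs.push_back(new int(i));   // swap this line for the pool's allocate
    for (int* p : ptrs)
        delete p;                     // ...and this one for the pool's free
    auto end = std::chrono::steady_clock::now();

    long long us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::printf("new/delete: %lld us for %d alloc+free pairs\n", us, iterations);
    return 0;
}
```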

LessComplexity commented 5 months ago

@LessComplexity

I've been thinking about what you said, that we don't care as much about fragmentation and can potentially sacrifice memory blocks for optimal efficiency, and it made sense. But later I realized that the more fragmentation there is, the more blocks of memory are wasted, which effectively means more allocations are needed, which means more system calls, which means a performance penalty.

So, now more than ever, I consider benchmarking our options to be the way to go.

I know it's been a long time, but I've started a new branch to implement the pool with different architectures and let users choose their own architecture based on their needs, whether they want less fragmentation, lower latency, or an object-specific pool, all within the same pool library :)

fvalasiad commented 5 months ago

It's been such a long time indeed, I was a different person back then, was fun re-reading my previous comments.

I've since moved on to become a full-time C developer. I mean, I still interact with C++, but not like before; I use a minimal version of it in corporate code instead 🖥️.

Anyway, I'll try to look into it if I find some spare time; maybe I can contribute a thing or two!