SuperHouse / esp-open-rtos

Open source FreeRTOS-based ESP8266 software framework
BSD 3-Clause "New" or "Revised" License

memory fragmentation problems - added a best-fit algorithm to nano-malloc #599

Open gpreviato opened 6 years ago

gpreviato commented 6 years ago

Hi,

I had a problem related to memory fragmentation that stopped the system from working after some runs.

To manage it better, I have added a 'best-fit' algorithm to nano_malloc (I have forked newlib from the great ourairquality!).

I am getting good results so far, but it is still an 'alpha'... To work properly it also requires some slight changes in newlib_syscalls.c.

Thank you for your work, and let me know if you would like some insight into the changes.

ourairquality commented 6 years ago

It does not appear to support the separate malloc regions needed for dram versus iram allocation. With that change the sbrk() function became a no-op: there is no memory to grow; rather, it is all pre-allocated to the malloc free list.

There is another malloc implementation in newlib that does attempt to better handle fragmentation, but it seems at some cost, and it has not been extended to support the separate regions. From a quick look it appears to try to group objects of similar size in pools.

Your strategy appears to be to search for the region on the free list that has the least room but that still fits the new allocation? That seems quite different to trying to allocate objects of a similar size to the same pool.

A fall back strategy might be needed, to restart the code if fragmentation becomes a problem. This might be tested on a malloc failure, so if the free space is significant on a malloc failure then flag the heap as hopelessly fragmented and schedule a restart.
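The failure test suggested above can be sketched as a small predicate. This is a minimal sketch, not the project's actual code: the function name and the slack threshold are assumptions, and how the free-list total is obtained is left to the port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slack: if at least this much memory beyond the request is
 * still free when an allocation fails, the free space is significant, so
 * treat the failure as fragmentation rather than genuine exhaustion. */
#define FRAG_SLACK 1024

/* Decide whether a failed allocation of `request` bytes, with
 * `free_bytes` still on the free list, indicates a hopelessly
 * fragmented heap (so a restart should be scheduled). */
bool failure_is_fragmentation(size_t request, size_t free_bytes)
{
    return free_bytes >= request + FRAG_SLACK;
}
```

On a real port, `free_bytes` would come from walking the malloc free list (or a maintained counter); the caller would set a "schedule restart" flag rather than restarting immediately.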

It might be possible to extend the malloc region support to allow sub-pools to be defined, and some interface to allocate them. Then code could demand or prefer to allocate to a particular pool.

Another alternative might be to allow sub pools to be defined based on object size, and an interface to allow these to be allocated.

gpreviato commented 6 years ago

As far as I have understood (but please correct me if I'm wrong), there are two regions: the first is the classic 'heap' defined by the linker, and the second is the chunk added to the free pool after the system boots. There is no way for 'nano-malloc' to understand what kind of request is being made, because the only parameter of a request is the size; there is no way to express a preference for a region. Iram is statically allocated by the linker, so there is no dynamic management for it. So why distinguish the two? The idea is simple: manage the second region as a free chunk added to the main pool. Why not?

The best-fit algorithm, as described in the papers on malloc policies, tries to allocate each request to the free chunk whose size fits it best. This helps reduce fragmentation when you have repeated requests for small chunks over time: the small free chunks tend to be re-allocated, leaving larger memory areas available for other requests.
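The policy just described can be sketched as a scan over a singly linked free list. This is a minimal illustration, not nano_malloc's actual chunk layout; the struct fields and function name are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative free-list chunk header (not nano_malloc's real layout). */
typedef struct chunk {
    size_t size;
    struct chunk *next;
} chunk_t;

/* Best fit: walk the whole free list and remember the smallest chunk
 * that still satisfies the request, instead of taking the first match
 * the way first-fit does. Returns NULL if no chunk is large enough. */
chunk_t *best_fit(chunk_t *head, size_t request)
{
    chunk_t *best = NULL;
    for (chunk_t *c = head; c != NULL; c = c->next) {
        if (c->size >= request && (best == NULL || c->size < best->size))
            best = c;
    }
    return best;
}
```

The trade-off versus first fit is a full list traversal on every allocation; a real implementation would also split the chosen chunk when it is larger than the request.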

In effect I have seen the heap stabilize greatly (I have several tasks running together), and despite some bugs in this algorithm that I am fixing now (as I said, it is alpha at the moment... but I would like comments on it), the advantage is clearly evident, at least in my case, where small heap requests coming from buffers alternate with the big buffers from the HTTP server.

Restarting the code may not be the right solution, especially if the software is serving an HTTP page or providing an important service. I think another solution could be to offer different kinds of allocation strategy, so that the best one can be chosen depending on the type of code someone is running... and also to add some tools to measure the 'heap status' (I mean fragmentation), which is what I am trying to do now.

Many thanks for your great work.

ourairquality commented 6 years ago

The current code implements sbrk. The wip changes to support multiple malloc regions makes sbrk a nop and there is no sbrk reservation to allocate, and all the memory is added to the malloc free list. Some of the memory is allocated when available, for example the initial stack area - it is just pushed onto the malloc free list for use.

The wip stores the regions to use in reent_ptr->malloc_region_mask. There is a callback into the esp-open-rtos code to check if a free list address is consistent with the region mask, see _malloc_region_masked(). This could all be improved, better abstracted, etc. Not claiming it is pretty. But an attempt has been made to set the higher level interface to this, which is by way of a thread local preference.
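The mask check described here might look roughly like the following. The identifiers `malloc_region_mask` and `_malloc_region_masked()` come from the thread; the region table, bit assignments, and exact contract are a sketch (the ESP8266 dram/iram windows used below are the commonly documented ranges, but treat them as an assumption).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative region table: bit 0 = dram, bit 1 = iram. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} region_t;

static const region_t regions[] = {
    { 0x3FFE8000u, 0x40000000u },  /* dram */
    { 0x40100000u, 0x40108000u },  /* iram */
};

/* Return true if `addr` lies in a region that the caller's mask
 * excludes, i.e. the free-list chunk must be skipped for this
 * allocation - mirroring the role the thread describes for
 * _malloc_region_masked(). */
bool region_masked(uintptr_t addr, uint32_t mask)
{
    for (unsigned i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        if (addr >= regions[i].start && addr < regions[i].end)
            return (mask & (1u << i)) == 0;
    }
    return true;  /* unknown address: treat as masked out */
}
```

In the wip design, malloc would consult the current thread's `malloc_region_mask` (via the reentrancy structure) and skip any free chunk for which this callback returns true.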

There is a requirement to be able to allocate to either the dram or iram, and it is desirable to be able to prefer one or the other. This requirement is due to hardware constraints, and perhaps software design decisions. The iram cannot be used for some hardware buffers; using it for task stack memory might challenge the exception handlers etc; and the iram supports only 32-bit accesses (8- and 16-bit accesses are emulated in the exception handler), so there are also performance reasons to allocate to dram versus iram.
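To illustrate the 32-bit-only constraint: a byte store into iram can be done safely, and without paying the exception-handler emulation cost, with an aligned 32-bit read-modify-write. This helper is a sketch (the function name is hypothetical, and it assumes a little-endian byte order, as on the ESP8266's Xtensa core).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one byte into memory that only supports aligned 32-bit
 * accesses, by reading the containing word, patching the byte lane,
 * and writing the word back. Assumes little-endian lane ordering. */
void iram_write_byte(volatile uint32_t *base, size_t offset, uint8_t value)
{
    volatile uint32_t *word = base + offset / 4;
    unsigned shift = (offset % 4) * 8;              /* byte lane within the word */
    uint32_t w = *word;
    w = (w & ~(0xFFu << shift)) | ((uint32_t)value << shift);
    *word = w;
}
```

This is also why iram is a poor fit for byte-oriented buffers: every narrow access either traps into the emulation path or must be rewritten like this.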

Generally heuristic allocation strategies are not going to prevent failure due to fragmentation, so detection of failure due to fragmentation is still needed in general. Memory compaction is not possible in C, not without effectively implementing a higher level memory manager on top of C, and no higher level alternative seems to have emerged. Thus simply restarting might be a practical strategy for these devices. Whether fragmentation is a problem depends on allocation patterns and the amount of free memory.

The 'best fit' strategy would appear to lead to fragmentation by placing small objects next to larger objects; if these small objects are long lived then they fragment the heap. It would probably be better to place small objects close to other small objects, i.e. to use allocation bins. Best fit also distributes temporally related allocations over the heap. These are just heuristics and there are endless possibilities and problems.
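The allocation-bin idea mentioned above amounts to keeping a separate free list per size class, so small objects cluster together instead of being interleaved with large, long-lived ones. A minimal sketch of the size-class mapping follows; the bin boundaries and names are arbitrary assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BINS 5

/* Upper size limit of each bin; the last bin catches everything else. */
static const size_t bin_limit[NUM_BINS] = { 16, 32, 64, 128, (size_t)-1 };

/* Map a request size to the index of the free list it should use.
 * Allocations then search (and small frees return to) only their own
 * bin, keeping similarly sized objects adjacent in the heap. */
unsigned size_to_bin(size_t size)
{
    unsigned i = 0;
    while (size > bin_limit[i])
        i++;
    return i;
}
```

A binned allocator trades some internal fragmentation (rounding requests up to the bin size) for much less external fragmentation among small objects.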

Off topic, but this is an area that interests me. There are for example some JS VMs for small systems that can use 16-bit pointers for better memory efficiency, and these might also be able to compact their heaps; there are probably many potential VMs that can compact the heap. These might better support general strategies for memory management, but at the cost of memory-compaction pauses.

Overall it seems easiest for esp-open-rtos apps to pre-allocate large buffers to avoid fragmentation so the heap is largely a 'bin' of small objects, and to add fragmentation failure detection with support for graceful restarts.

gpreviato commented 6 years ago

Thanks for the description.

Actually it seems the code just allocates chunks of RAM depending on the requested size. I have to say that I have not found the code for _malloc_region_masked(), and I guessed it only does a consistency check.

Despite what it might seem, many papers describe the best-fit algorithm as one of the best at reducing fragmentation, at least compared to first fit, which is one of the worst (by the same measure) even though it is often the one implemented.

Obviously we have to take into consideration that the allocation process in an embedded system is far from a 'random' process...

Anyway, I'm not saying that the current code is awful or anything like that, but I would like to point out that in a system with such constraints on RAM, the 'right' algorithm and tools can sometimes save the day... Anyway, we are lucky: with an open system anyone can implement whatever they think is better!

I'm fully aware that RAM compaction is not possible, and this is the first reason to avoid fragmenting the heap in the allocation process. By the way, I don't agree with the idea of the system restarting outside the control of the application, except in extreme situations.


ourairquality commented 6 years ago

If you want to add 'best fit' then I would be happy to add it as a newlib option in my own fork, but as noted it would need to support allocation to the different regions, and it should not require any changes on the esp-open-rtos side. The _malloc_region_masked() function is defined in the esp-open-rtos code; see core/newlib_syscalls.c. The goal was to abstract it so that it might have some chance of landing upstream, i.e. to get the esp-open-rtos-specific code out of newlib, but that has not been achieved yet.

One other issue that needs to be explored is a memory-pressure event. There are various resources in lwip that can be freed under pressure. There are hooks for pool-pressure events, but we do not use pools so these do not all work. If a fragmentation failure is hit then perhaps a memory-pressure event could free up some space. Application code might be able to respond to memory pressure events too. This might give a 'restart' like benefit, allowing some subsystems to free resources and restart to clear fragmentation.
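A memory-pressure event of this kind might be structured as a set of registered handlers that each try to release memory, fired from the fragmentation-failure path before giving up. All names here are hypothetical; this is a sketch of the idea, not an existing esp-open-rtos or lwip interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRESSURE_HANDLERS 4

/* Each handler frees what it can and reports how many bytes it freed. */
typedef size_t (*pressure_handler_t)(void);

static pressure_handler_t handlers[MAX_PRESSURE_HANDLERS];
static unsigned num_handlers;

/* Subsystems (lwip, application code) register a handler at init time. */
int register_pressure_handler(pressure_handler_t h)
{
    if (num_handlers >= MAX_PRESSURE_HANDLERS)
        return -1;
    handlers[num_handlers++] = h;
    return 0;
}

/* Fire the event; stop early once `needed` bytes have been reclaimed.
 * Returns the total bytes the handlers reported freeing. */
size_t raise_memory_pressure(size_t needed)
{
    size_t freed = 0;
    for (unsigned i = 0; i < num_handlers && freed < needed; i++)
        freed += handlers[i]();
    return freed;
}

/* Example handlers for demonstration only. */
static size_t drop_dns_cache(void)    { return 128; }
static size_t shrink_log_buffer(void) { return 512; }
```

On a fragmentation failure, malloc (or a wrapper) could call `raise_memory_pressure()` and retry the allocation once before flagging the heap as fragmented.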

It might not be necessary to restart outside the control of the application. The suggestion was to 'flag' a restart as necessary, and then leave it to the application to restart in an orderly manner, rather than have malloc trigger a restart like a wdt event. For example: in my own code there is some core code doing data gathering and storage using mostly pre-allocated resources, and there is robust networking code that expects failures, so if the networking paths fail due to fragmentation then the core could schedule a restart in an orderly manner.
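The 'flag, don't trigger' policy can be sketched in a few lines: the allocator only raises a flag, and the application's main loop decides when acting on it is safe. The names are illustrative, not the project's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Set by the allocator (or its wrapper) when the heap is judged
 * hopelessly fragmented; never acted on from allocation context. */
static volatile bool restart_requested = false;

void request_restart(void)
{
    restart_requested = true;
}

/* Polled from the application's main/idle loop; returns true only
 * when a restart is pending AND nothing critical is in flight (e.g.
 * no HTTP response being served, no storage write in progress). */
bool should_restart_now(bool work_in_flight)
{
    return restart_requested && !work_in_flight;
}
```

When this returns true, the application can flush state and reboot deliberately, instead of malloc resetting the device mid-transfer the way a wdt would.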

If failure due to fragmentation is not acceptable then an application needs to manage memory well.

gpreviato commented 6 years ago

If it could help anyone, I'll be happy to share my code.

By the way, there is no way I can find _malloc_region_masked... neither by looking deep into the file you mentioned, nor by searching with grep...

Probably I'm missing something...

Anyway, I'll try to comply with the constraints you described. I just need to better understand what 'support allocation to the different regions' means. I think it means 'by application choice', but I have not understood how the choice is made now...

ourairquality commented 6 years ago

Might you be using the esp-open-rtos master branch rather than the malloc-regions branch, which has the wip support for multiple regions? See https://github.com/SuperHouse/esp-open-rtos/pull/584. The newlib source code and the lwip source code are now out of sync with the esp-open-rtos master branch, but it should not even have compiled with your changes if these were not in sync. This might have been frustrating your efforts, sorry. Another possible frustration is that I rebase PRs to keep the final PR clean, so it might take a few extra steps to check out the latest version.