GaryOderNichts closed this 2 years ago.
This has been tested in several applications and seems to work fine so far. Going to mark this as ready and wait for a code review now.
Just pushed a commit that uses a spin lock (`OSUninterruptibleSpinLock`) instead of a mutex, which improves speed even more:

| | Allocations | Free |
|---|---|---|
| malloc | 4066 µs | 1994 µs |
| memalign | 8536 µs | 2296 µs |

Total time: 16892 µs
This is now almost a 9x improvement in total time over the CafeOS heap (16892 µs vs. 150461 µs).
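To illustrate why swapping a mutex for a spin lock can pay off when the critical section is as short as a single malloc/free call, here is a minimal portable sketch using C11 atomics. It is not `OSUninterruptibleSpinLock` and does not model its interrupt handling; it only busy-waits on an atomic flag. All names (`spin_lock_t`, `spin_acquire`, `spin_try_acquire`, `spin_release`, `heap_lock`, `locked_malloc`, `locked_free`) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical portable spin lock. Unlike OSUninterruptibleSpinLock,
 * it does NOT disable interrupts; it only spins on an atomic bool. */
typedef struct {
    atomic_bool locked;
} spin_lock_t;

static void spin_lock_init(spin_lock_t *lock)
{
    atomic_init(&lock->locked, false);
}

/* Test-and-test-and-set: try to grab the lock, and while someone else
 * holds it, spin on a cheap relaxed load before retrying the exchange. */
static void spin_acquire(spin_lock_t *lock)
{
    for (;;) {
        if (!atomic_exchange_explicit(&lock->locked, true, memory_order_acquire))
            return;
        while (atomic_load_explicit(&lock->locked, memory_order_relaxed)) {
            /* Busy-wait: cheap when the holder only runs a short malloc/free. */
        }
    }
}

/* Non-blocking variant: returns true only if the lock was free. */
static bool spin_try_acquire(spin_lock_t *lock)
{
    return !atomic_exchange_explicit(&lock->locked, true, memory_order_acquire);
}

static void spin_release(spin_lock_t *lock)
{
    /* Releasing a lock that is not held is a bug in the caller. */
    assert(atomic_load_explicit(&lock->locked, memory_order_relaxed));
    atomic_store_explicit(&lock->locked, false, memory_order_release);
}

/* Hypothetical global lock serializing all heap operations.
 * Static storage is zero-initialized, so it starts unlocked. */
static spin_lock_t heap_lock;

static void *locked_malloc(size_t size)
{
    spin_acquire(&heap_lock);
    void *ptr = malloc(size);
    spin_release(&heap_lock);
    return ptr;
}

static void locked_free(void *ptr)
{
    spin_acquire(&heap_lock);
    free(ptr);
    spin_release(&heap_lock);
}
```

A mutex typically pays for bookkeeping and possibly a trip into the scheduler even when uncontended; a spin lock's uncontended path is a single atomic exchange, which matters when it wraps every one of thousands of allocations.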
Summary
wut currently uses wutmalloc as a wrapper around the default CafeOS heap functions (`MEMAllocFromDefaultHeap`/`MEMFreeToDefaultHeap`). This default heap is very slow when performing large numbers of allocations, which causes noticeable slowdowns. Many retail games use a fast custom heap to avoid this issue. This draft uses the malloc implementation in newlib instead and replaces the default heap functions with wrappers around the newlib functions (see wutdefaultheap). It is marked as a draft because it is a fairly major change, and there may be issues resulting from it that I haven't thought of.

RPX files
RPX files now implement and export a `__preinit_user` function, which is called before any allocations are made and allows replacing the `MEMAllocFromDefaultHeap`/`MEMFreeToDefaultHeap` functions (see memdefaultheap.h).

In the preinit call, wut allocates all of the available space in the MEM2 heap for sbrk. It then initializes wutdefaultheap, which replaces `MEMAllocFromDefaultHeap`/`MEMAllocFromDefaultHeapEx`/`MEMFreeToDefaultHeap` with wrappers around the newlib functions. As a result, CafeOS functions also allocate from the newlib heap instead of the default heap.

Overriding this behavior
The user can override this behavior by implementing their own `__preinit_user` function. This skips the sbrk and wutdefaultheap initialization, and `__init_wut_malloc` can then be called, which links in the old wrapper around the default heap. See this code for an example.

RPL files
Since RPL files don't support `__preinit_user` (and shouldn't mess with the default heap anyway), they simply use wutmalloc, so their allocations come from whichever heap the RPX has set up.

Speed comparisons
To test the speed, I wrote a simple tool that performs many heap allocations of various sizes, frees them, and displays how long they took. The tool is probably not ideal for precise timing, but it should be enough to show the performance increase.
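The benchmark tool itself isn't shown here. As a rough sketch of what such a tool might look like, assuming plain `malloc`/`free` and the standard C `clock()` function (which measures processor time at coarse resolution; a console build would presumably use a CafeOS time source instead), the following times an allocation phase and a free phase separately. The names `bench_result`, `run_alloc_bench`, `MAX_BLOCKS` and the size mix are hypothetical, and the sketch does not cover `memalign`.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical upper bound on blocks held at once. */
#define MAX_BLOCKS 4096

/* Timings for each phase, in microseconds. */
typedef struct {
    double alloc_us;
    double free_us;
} bench_result;

/* Convert a clock() delta to microseconds (processor time, coarse). */
static double clock_to_us(clock_t start, clock_t end)
{
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC;
}

/* Allocate `count` blocks of varying sizes, then free them all,
 * timing the allocation and free phases separately. */
static bench_result run_alloc_bench(size_t count)
{
    static void *blocks[MAX_BLOCKS];
    bench_result result = {0.0, 0.0};

    assert(count <= MAX_BLOCKS);

    clock_t start = clock();
    for (size_t i = 0; i < count; i++) {
        /* Hypothetical size mix: cycles from 16 bytes up to 4096 bytes. */
        size_t size = (size_t)16 << (i % 9);
        blocks[i] = malloc(size);
    }
    clock_t mid = clock();
    for (size_t i = 0; i < count; i++) {
        free(blocks[i]); /* free(NULL) is a no-op if an allocation failed */
    }
    clock_t end = clock();

    result.alloc_us = clock_to_us(start, mid);
    result.free_us = clock_to_us(mid, end);
    return result;
}
```

Running the same binary once against the default heap and once against the replacement heap, then comparing `alloc_us + free_us`, gives the kind of total-time comparison reported below.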
Using the default CafeOS heap:
Total time: 150461 µs
Using the custom newlib heap:
Total time: 28370 µs
This is roughly a 5x improvement in total time (150461 µs vs. 28370 µs).
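The improvement factors quoted in this thread follow directly from the reported totals. The short check below (variable and function names are illustrative only) confirms that the four cells of the spin-lock table sum exactly to the 16892 µs total, and that the speedups over the CafeOS heap come out to about 5.30x for the first newlib run and about 8.91x after the spin-lock commit.

```c
#include <assert.h>

/* Totals reported in this thread, in microseconds. */
static const double cafeos_total_us = 150461.0;
static const double newlib_mutex_total_us = 28370.0; /* first newlib run, before the spin-lock commit */

/* Per-phase timings from the spin-lock run. */
static const double malloc_alloc_us = 4066.0;
static const double malloc_free_us = 1994.0;
static const double memalign_alloc_us = 8536.0;
static const double memalign_free_us = 2296.0;

/* Total time is the sum of all four table cells. */
static double spinlock_total_us(void)
{
    return malloc_alloc_us + malloc_free_us + memalign_alloc_us + memalign_free_us;
}

/* How many times faster `new_us` is than `baseline_us`. */
static double speedup(double baseline_us, double new_us)
{
    assert(new_us > 0.0);
    return baseline_us / new_us;
}
```

For reference, 150461 / 28370 ≈ 5.30 and 150461 / 16892 ≈ 8.91, so the spin lock alone accounts for roughly a further 1.68x over the mutex-based version (28370 / 16892).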