emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

Missing mmap()/munmap()/mremap() features for in-place adjustment of anonymous mappings #21816

Open brandonpayton opened 7 months ago

brandonpayton commented 7 months ago

Version of emscripten/emsdk: 3.1.57

The WordPress Playground project uses Emscripten to compile the PHP runtime as Wasm, and it uses those builds to run PHP in the browser and on Node.js.

PHP's memory manager relies heavily on mmap() and munmap() for allocating and reallocating memory as anonymous mappings. There are two things PHP cannot do properly using Emscripten's mmap() and munmap() implementations:

  1. Allocating 2MB chunks of memory with 2MB alignment - It does this by mapping twice the amount of desired memory in order to find an aligned region within and freeing the excess at the head and tail using partial unmapping.
  2. In-place reallocation, both truncating and extending

For the PHP runtime, the primary thing that is missing from Emscripten is a munmap() implementation that supports partial unmapping of anonymous mappings. This would cover item 1 and half of item 2 (truncation). An example of aligned allocation with mmap()/munmap() can be found here with the actual munmap() call here. An example of truncation can be found here which leads to the same actual munmap() call.
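The over-map-and-trim technique from item 1 can be sketched as follows for a native POSIX system. This is an illustration, not PHP's actual code: map_aligned and CHUNK_ALIGNMENT are hypothetical names, the sketch over-maps by one alignment unit rather than exactly doubling the request, and it assumes size is a multiple of the page size.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define CHUNK_ALIGNMENT ((size_t)2 * 1024 * 1024) /* 2MB, as in the issue text */

/* Hypothetical helper: returns `size` bytes at an address aligned to
 * `alignment`, or NULL on failure. Assumes `alignment` is a power of two
 * and that `size` and `alignment` are multiples of the page size. */
static void *map_aligned(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Over-map so that an aligned start address must exist inside the region. */
    size_t padded = size + alignment;
    void *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) {
        return NULL;
    }

    uintptr_t start = (uintptr_t)raw;
    uintptr_t aligned = (start + alignment - 1) & ~((uintptr_t)alignment - 1);
    size_t head = (size_t)(aligned - start);
    size_t tail = padded - head - size;

    /* Partial unmapping of the excess head and tail. These are the calls
     * that depend on munmap() handling sub-ranges of a mapping. */
    if (head > 0) {
        munmap(raw, head);
    }
    if (tail > 0) {
        munmap((void *)(aligned + size), tail);
    }
    return (void *)aligned;
}
```

A caller would request a chunk with map_aligned(CHUNK_ALIGNMENT, CHUNK_ALIGNMENT) and later release it with munmap(). If the two trimming munmap() calls do not actually release their sub-ranges, the head and tail stay allocated, which is the kind of leak described in this issue.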

Secondly, PHP attempts to extend anonymous mappings in-place using either mremap() or mmap(). When using mremap(), no flags are provided. When using mmap(), PHP passes one of the following combinations of flags:

When PHP cannot extend memory in-place, it resorts to allocating another aligned chunk of memory and copying the contents of the current chunk into it. This increases memory requirements and leads to more frequent out of memory conditions than when memory can be extended in-place.
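For readers unfamiliar with the flagless mremap() call mentioned above, here is a minimal Linux-only sketch. resize_in_place is a hypothetical name, not a PHP or Emscripten function. Because MREMAP_MAYMOVE is not passed, the kernel either resizes the mapping at its current address or fails, which is what makes the caller fall back to allocate-and-copy.

```c
#define _GNU_SOURCE /* mremap() is a Linux extension */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 1 if the mapping at `addr` now spans
 * `new_size` bytes at the same address, 0 if it could not be resized
 * in place. On failure the original mapping is left untouched. */
static int resize_in_place(void *addr, size_t old_size, size_t new_size)
{
    assert(new_size > 0);

    /* flags == 0: without MREMAP_MAYMOVE the mapping may not be relocated. */
    void *result = mremap(addr, old_size, new_size, 0);
    return result != MAP_FAILED;
}
```

Whether growth succeeds depends on the address range immediately after the mapping being free, so a caller has to be prepared for either outcome.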

Without in-place adjustment of anonymous mappings

Without the ability to adjust anonymous mappings in-place, PHP mishandles memory and incorrectly assumes partial unmapping works, leading to large memory leaks in certain cases.

As a workaround, Playground currently uses a PHP extension to install alternate memory allocation handlers that can only allocate aligned memory and free it. No in-place adjustment is supported. Reallocation of aligned regions requires maintaining old and new memory regions at the same time while the old is copied to the new.

Could Emscripten be updated to support in-place adjustment of anonymous mappings?

cc @ThomasTheDane

sbc100 commented 7 months ago

My first comment here would be regarding the fact that "PHP's memory manager relies heavily on mmap() and munmap() for allocating and reallocating memory as anonymous mappings".

As you may be aware, emscripten (and wasm in general) does not support mmap at all, and simply attempts to fake some parts of it by calling malloc/free. While we can continue to improve our fake mmap support, it's normally best for serious applications to avoid using this fake mmap at all and instead fall back to malloc / realloc / posix_memalign etc., which more accurately reflect what is actually going on.

I'm certainly open to accepting improvements to our fake mmap support, but I would perhaps consider modifying zend_alloc.c to support systems without mmap at all. Right now it looks like it supports WIN32 and non-WIN32, but perhaps it could also have an #if !HAVE_MMAP flavor?

sbc100 commented 7 months ago

If you want to see how fake our mmap support is or if you want to try to improve it the code is here: https://github.com/emscripten-core/emscripten/blob/140a17c3b0e1cc0028e0d80a13d189511cad5633/system/lib/libc/emscripten_mmap.c#L114-L138

brandonpayton commented 6 months ago

Thank you for your thoughtful feedback, @sbc100.

TL;DR -- After writing everything out, I think you are correct that it would be better to use more direct memory management APIs. In addition, I don't know whether it really makes sense for PHP to be using mmap() at all.

In case you are interested, here are the details:

> As you may be aware, emscripten (and wasm in general) does not support mmap at all, and simply attempts to fake some parts of it by calling malloc/free. While we can continue to improve our fake mmap support, it's normally best for serious applications to avoid using this fake mmap at all and instead fall back to malloc / realloc / posix_memalign etc., which more accurately reflect what is actually going on.

I initially intended to pursue adding deeper support for manipulating anonymous mapped regions, but after taking the time to write out the tradeoffs, I agree that it is probably better to use more direct allocation methods.

On the downsides of mmap

I don't know the history or thought behind PHP's memory management implementation but will attempt to talk this out based on a reading of the implementation. One of the tricky things with PHP memory allocation is that it prefers allocating 2MB chunks aligned to 2MB. This alignment is not naturally guaranteed by malloc() and realloc().

There are trade-offs when choosing whether to allocate 2MB-aligned memory using posix_memalign() or anonymous mmap().

Initial aligned allocation

posix_memalign() lets us ask explicitly for aligned allocation. If N bytes are desired, we ask for exactly N bytes and receive N bytes.

In contrast, mmap() does not allow us to request a specific alignment. If PHP requests N bytes with mmap() and does not receive a 2MB-aligned address, it immediately frees the mapped region and requests 2 * N bytes in order to carve out a region of N bytes at a 2MB-aligned address. PHP uses munmap() to free the unwanted memory at the head and the tail.

👉 For initial allocation, posix_memalign() appears to be the most direct and efficient option for allocating 2MB-aligned regions.

Growing an existing aligned allocation

AFAIK, for memory allocated using posix_memalign(), there is not a good way to grow the region in place while guaranteeing 2MB-alignment. If we use realloc(), there is a chance it will move the data to a misaligned address. To grow and guarantee alignment, we need to temporarily maintain both the original region and a new larger region while data is copied to the new region. This requires N + (N + G) bytes where N is the size of the original allocation and G is the amount of growth.
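The allocate, copy, free sequence described above might look like the following sketch. aligned_grow and CHUNK_ALIGNMENT are hypothetical names chosen for illustration. Between the posix_memalign() call and the free() call, both regions are live, which is where the N + (N + G) peak comes from.

```c
#define _POSIX_C_SOURCE 200112L /* exposes posix_memalign() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_ALIGNMENT ((size_t)2 * 1024 * 1024) /* 2MB */

/* Hypothetical helper: grows a 2MB-aligned region from `old_size` to
 * `new_size` bytes by allocating a new aligned region and copying.
 * Passing old == NULL and old_size == 0 performs a plain aligned allocation.
 * Returns NULL on failure, in which case `old` is left untouched. */
static void *aligned_grow(void *old, size_t old_size, size_t new_size)
{
    assert(new_size >= old_size);

    void *fresh = NULL;
    if (posix_memalign(&fresh, CHUNK_ALIGNMENT, new_size) != 0) {
        return NULL;
    }

    /* Peak usage here: old_size (N) plus new_size (N + G). */
    if (old_size > 0) {
        memcpy(fresh, old, old_size);
    }
    free(old);
    return fresh;
}
```

Unlike realloc(), this never returns a misaligned address, at the price of always paying for the copy.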

mremap() allows PHP to attempt growing 2MB-aligned memory in place. With mremap(), there is a chance reallocation will only require N + G bytes. But if mremap() fails, the situation is potentially much worse than with posix_memalign(). We need to keep the old region while allocating a new larger region, but again mmap() does not allow requesting a specific alignment. Within PHP, either mmap() happens to give PHP N + G bytes at an aligned address, in which case both old and new regions total N + (N + G) bytes while data is copied to the new region, or PHP has to request 2 * (N + G) new bytes from mmap() to carve out a 2MB-aligned region, in which case both old and new regions temporarily total N + (2 * (N + G)) bytes.

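The known cost of growing 2MB-aligned memory with posix_memalign() is much better than the worst case of doing the same with mremap()/mmap():

  • posix_memalign() costs N + (N + G)
  • mremap() may cost N + G
  • mmap() may cost N + (N + G)
  • mmap() may cost N + (2 * (N + G))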

Perhaps mremap() would often succeed in practice in a WebAssembly environment if PHP is only ever allocating 2MB-aligned regions and if the selected memory allocator looks for available memory starting from the lowest available address. In that case, maybe the allocator would tend to find the next available memory at 2MB-aligned addresses so that the memory required for reallocation is lower with mremap(). I'm not sure and don't currently have more time to investigate.

👉 Given the above comparisons, simply allocating a larger 2MB-aligned region with posix_memalign() appears to offer a reasonable compromise between the best and worst cases of using mremap() and mmap().

On official support for posix_memalign() within PHP

> I'm certainly open to accepting improvements to our fake mmap support, but I would perhaps consider modifying zend_alloc.c to support systems without mmap at all. Right now it looks like it supports WIN32 and non-WIN32, but perhaps it could also have an #if !HAVE_MMAP flavor?

This seems like a good idea. It likely couldn't be done for older PHP versions, but it might be worth exploring for future PHP versions.

brandonpayton commented 6 months ago

Before closing this, let's double check in case I am missing something:

Are there any memory management APIs other than mremap() that could be used to attempt growing memory in place without the risk of data being moved to an address that is not aligned to a 2MB boundary?

I do not currently believe so and understand that Emscripten does not support mremap() at the moment. https://github.com/emscripten-core/emscripten/blob/7e7c057357c7c8704a8bd7fb82c86b04c320c53a/system/lib/libc/emscripten_syscall_stubs.c#L213-L216

sbc100 commented 6 months ago

> Before closing this, let's double check in case I am missing something:

> Are there any memory management APIs other than mremap() that could be used to attempt growing memory in place without the risk of data being moved to an address that is not aligned to a 2MB boundary?

No, I don't think so. Because we don't have virtual memory support, it's not really possible to grow without moving (unless you get lucky and there happens to be some free space after your allocation, i.e. the happy path of realloc).

sbc100 commented 6 months ago

For what it's worth, I believe @kg is currently looking at implementing a lower-level page allocator that sits underneath malloc: #21620. If this works out it should give you much better mmap support, but mremap will likely not be part of that either.

brandonpayton commented 6 months ago

Sorry, @sbc100. It turns out I was wrong about the worst mmap case in the above comment.

This is incorrect:

> mmap() may cost N + (2 * (N + G))

The potential worst case of growing a 2MB-aligned memory region with mmap within PHP is actually N + ((N + G) + A) where N is the size of the original region, G is the amount of growth, and A is the size of the alignment (2MB).

So the constant cost of "reallocating" memory with posix_memalign() is N + (N + G) because old must be copied to new.

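And the comparative costs of using mremap/mmap are:

  • mremap() succeeds in place: N + G
  • mmap() happens to return an aligned address: N + (N + G)
  • mmap() must over-map to find an aligned address: N + ((N + G) + A)
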
It's not such a dramatic difference as I made it seem above, but I'm not sure it is worth the work to sometimes obtain the lower N + G memory requirement. Will think a bit more about this.
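To make the comparison concrete, the formulas from this comment can be evaluated directly. The struct and function names below are hypothetical and exist only for this worked example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Peak bytes held while growing a region of n bytes by g bytes,
 * with alignment a, under each strategy discussed above. */
typedef struct {
    size_t memalign_copy;  /* posix_memalign() + copy: N + (N + G) */
    size_t remap_in_place; /* mremap() succeeds:       N + G */
    size_t mmap_aligned;   /* mmap() returns aligned:  N + (N + G) */
    size_t mmap_padded;    /* mmap() must over-map:    N + ((N + G) + A) */
} growth_costs;

static growth_costs peak_bytes(size_t n, size_t g, size_t a)
{
    /* Keep the sums below well inside the range of size_t. */
    assert(n < SIZE_MAX / 4 && g < SIZE_MAX / 4 && a < SIZE_MAX / 4);

    growth_costs c;
    c.memalign_copy = n + (n + g);
    c.remap_in_place = n + g;
    c.mmap_aligned = n + (n + g);
    c.mmap_padded = n + ((n + g) + a);
    return c;
}
```

For N = G = A = 2MB, these come to 6MB, 4MB, 6MB, and 8MB respectively, so the worst mmap() case exceeds the posix_memalign() cost by exactly one alignment unit.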

brandonpayton commented 6 months ago

> No, I don't think so. Because we don't have virtual memory support, it's not really possible to grow without moving (unless you get lucky and there happens to be some free space after your allocation, i.e. the happy path of realloc).

Thanks for confirming.

> For what it's worth, I believe @kg is currently looking at implementing a lower-level page allocator that sits underneath malloc: https://github.com/emscripten-core/emscripten/issues/21620. If this works out it should give you much better mmap support, but mremap will likely not be part of that either.

Thanks for the pointer. That sounds like really interesting work.

Initially, when considering making a PR, it seemed like improving mmap()/mremap()/munmap() support would require adding explicit support to one of the allocators and conditionally using that support depending on the selected allocator. Would that kind of allocator-specific change be considered or does the solution need to be more universal? (If so, it seems like starting with emmalloc would be simpler than dlmalloc)

kg commented 6 months ago

We've just landed our prototype mmap/munmap replacement to start testing it out in our runtime. If it shakes out well I may look into what it would take to expand its support to the point that we could consider upstreaming it into emscripten. It would be good to understand what features are actually necessary for it to be sufficient as a foundation for emscripten dlmalloc/mimalloc.

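Stuff that's missing right now:

  • mapping files (we do this through a separate path, so we just use emscripten's implementation directly)
  • mremap
  • multithreading
  • 64-bit support
  • map_fixed/addr hints
  • custom alignment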

sbc100 commented 6 months ago

> We've just landed our prototype mmap/munmap replacement to start testing it out in our runtime. If it shakes out well I may look into what it would take to expand its support to the point that we could consider upstreaming it into emscripten. It would be good to understand what features are actually necessary for it to be sufficient as a foundation for emscripten dlmalloc/mimalloc.

> Stuff that's missing right now:

>   • mapping files (we do this through a separate path, so we just use emscripten's implementation directly)
>   • mremap
>   • multithreading
>   • 64-bit support
>   • map_fixed/addr hints
>   • custom alignment

Great!

I think multithreading and 64-bit are both important. None of the other things matter to malloc implementations, I think.

Specifically, map_fixed seems basically impossible without an actual virtual address space, and I don't think custom alignment matters since mmap doesn't support that anyway. It should always be page aligned.