WordPress / wordpress-playground

Run WordPress in the browser via WebAssembly PHP
https://w.org/playground/
GNU General Public License v2.0

php-wasm: easier to hit memory limit without PHP using mmap() and munmap() #1278

Open brandonpayton opened 4 months ago

brandonpayton commented 4 months ago

TL;DR: Without mmap() and munmap(), PHP cannot adjust memory regions in place and has to compensate by keeping both an old and a new region allocated at the same time. This requires more memory and triggers OOM errors more often than in-place adjustment.

Details

We fixed a memory leak in #1128 by preventing PHP from using mmap() and munmap() to allocate memory. This was necessary because Emscripten's mmap() and munmap() implementations are incomplete and violate basic assumptions PHP makes about them.

Unfortunately, PHP relies on mmap() to extend and munmap() to truncate memory regions in place. When PHP cannot adjust a region in place, it has to hold onto the existing region while allocating a completely separate, new region with the updated size.

For the sake of discussion, let M be the size of the current memory region and C be the desired change in size. When adjusting a region in place, we require M + C bytes, but without the ability to adjust regions in place, we require M + (M + C) bytes to make that same adjustment.

This applies to both truncating and extending, but extending requires the most space.
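To make the difference concrete, here is a minimal Python sketch of the two formulas above. It is only a model: the helper names `required_bytes` and `largest_reachable` are hypothetical, the 1 MB growth step is an arbitrary assumption, and the model ignores everything else PHP keeps in memory.

```python
MB = 1024 * 1024


def required_bytes(current: int, change: int, in_place: bool) -> int:
    """Bytes needed to resize a region of `current` bytes by `change` bytes."""
    new_size = current + change
    if in_place:
        return new_size          # M + C
    return current + new_size    # M + (M + C): old and new regions coexist


def largest_reachable(limit: int, step: int, in_place: bool) -> int:
    """Grow a region by `step` until the next resize would exceed `limit`."""
    size = step  # assumption: the first allocation fits under the limit
    while required_bytes(size, step, in_place) <= limit:
        size += step
    return size


limit = 256 * MB
step = 1 * MB  # hypothetical growth step
print(largest_reachable(limit, step, in_place=True) // MB)   # 256
print(largest_reachable(limit, step, in_place=False) // MB)  # 128
```

Under these assumptions, a region that can be adjusted in place can grow to the full limit, while a region that must be copied stops growing at about half of it.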

Example

We can see this using a version of the use-all-memory plugin from @sejas. Let's compare use-all-memory logs between vanilla PHP and php-wasm when running with a memory limit of 256MB.

Vanilla PHP 8.3

This version is able to adjust memory regions in place using mmap() and munmap() and reaches OOM just as it tries to extend beyond the memory limit.

* memory_get_usage(false): 251.38530731201 MB
* memory_get_usage(true): 255.00390625 MB
PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted (tried to allocate 264241184 bytes) in /Users/brandonpayton/src/playground-temp/index.php on line 16

php-wasm with PHP 8.3

This version is not able to adjust memory regions in place and reaches OOM earlier, when the combined sizes of the old and new regions exceed the memory limit.

* memory_get_usage(false): 118.42160797119 MB
* memory_get_usage(true): 138.0625 MB
Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 124780568 bytes) in /wordpress/index.php on line 16
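As a rough sanity check of the php-wasm log, the numbers above can be added up in a few lines of Python. This assumes that memory_get_usage(true) approximates what PHP was holding when the failing allocation was attempted, which the logs do not state explicitly.

```python
MB = 1024 * 1024

limit = 268435456               # "Allowed memory size" in both logs
wasm_held = int(138.0625 * MB)  # memory_get_usage(true) under php-wasm
wasm_request = 124780568        # "tried to allocate" under php-wasm
vanilla_request = 264241184     # "tried to allocate" under vanilla PHP

wasm_total = wasm_held + wasm_request
print(f"php-wasm held + requested: {wasm_total / MB:.2f} MB")
print(f"over the limit by: {(wasm_total - limit) / MB:.2f} MB")
print(f"php-wasm request alone: {wasm_request / MB:.2f} MB")
print(f"vanilla request alone: {vanilla_request / MB:.2f} MB")
```

With these figures, the php-wasm request is only about 119 MB, less than half the limit, but held plus requested memory comes to roughly 257 MB, about 1 MB over the 256 MB limit. The vanilla PHP request is about 252 MB on its own.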

How to fix

To fix this, we need Emscripten to provide mmap() and munmap() implementations that can resize anonymous mmap() regions in-place.

The Emscripten team has said certain things are not possible with the platform, including partial munmap(), but I do not believe those statements should apply to anonymous mmap() regions which are basically just allocated memory (rather than other devices mapped into virtual memory space). I am planning to file an Emscripten issue to discuss this, and if we do not get any traction, we might at least make an Emscripten PR as an example of what can be done.

cc folks who were involved or interested in the recent memory leak fix: @adamziel @sejas @dmsnell @wojtekn

adamziel commented 4 months ago

@brandonpayton let's also file (or bump) an issue in the Emscripten repository, link here, and cc @thomasthedane.

brandonpayton commented 4 months ago

> let's also file (or bump) an issue in the Emscripten repository,

@adamziel yep, that was on my list to do next, except for pinging the specific person. Will ping them. Thank you!

brandonpayton commented 4 months ago

Filed an issue for Emscripten: "Missing mmap()/munmap()/mremap() features for in-place adjustment of anonymous mappings" https://github.com/emscripten-core/emscripten/issues/21816

brandonpayton commented 3 months ago

I added some notes to the Emscripten issue. The most recent is https://github.com/emscripten-core/emscripten/issues/21816#issuecomment-2105205310

So the constant cost of "reallocating" memory with posix_memalign() is N + (N + G) because old must be copied to new.

And the comparative costs of using mremap/mmap are:

* Better: N + G
* Equivalent: N + (N + G)
* Worse: N + ((N + G) + A)

where N is the size of the original region, G is the amount of growth, and A is the size of the alignment (2MB).
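For a sense of scale, the small Python sketch below evaluates these expressions for one example. The function name `reallocation_costs` and the sample sizes (N = 100 MB, G = 10 MB) are illustrative assumptions; only the formulas and the 2 MB alignment come from the notes above. This thread does not say when each mremap/mmap case applies, so the sketch only computes the costs.

```python
MB = 1024 * 1024
ALIGNMENT = 2 * MB  # A in the notes above


def reallocation_costs(n: int, g: int, a: int = ALIGNMENT) -> dict:
    """Return the cost in bytes of each case for a region of n bytes growing by g."""
    return {
        "posix_memalign": n + (n + g),       # constant cost: old copied to new
        "mremap_better": n + g,              # Better
        "mremap_equivalent": n + (n + g),    # Equivalent
        "mremap_worse": n + ((n + g) + a),   # Worse
    }


# Hypothetical sizes chosen only for illustration.
costs = reallocation_costs(n=100 * MB, g=10 * MB)
for name, cost in costs.items():
    print(f"{name}: {cost // MB} MB")
```

For these sample sizes, the best case needs 110 MB, the posix_memalign() and equivalent cases need 210 MB, and the worst case needs 212 MB.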

I'm not sure how important this is. It probably makes sense to wait and see how many issues there are with memory limits before investing further.

brandonpayton commented 3 months ago

There is some ongoing work that might eventually pave the way for deeper mmap support in Emscripten. See: https://github.com/emscripten-core/emscripten/issues/21816#issuecomment-2105878910

The work is being discussed in the following ticket, which looks like a very interesting read: https://github.com/emscripten-core/emscripten/issues/21620