WordPress / wordpress-playground

Run WordPress in the browser via WebAssembly PHP
https://w.org/playground/
GNU General Public License v2.0
1.63k stars 249 forks source link

PHP memory leak #1128

Closed adamziel closed 6 months ago

adamziel commented 6 months ago

The issue below was reported by @sejas. It negatively impacts user experience of Playground and wp-now. Let's make sure:

A few memory leaks were already patched in this repo, find old PRs for more context.

pm.max_requests int The number of requests each child process should execute before respawning. This can be useful to work around memory leaks in 3rd party libraries. For endless request processing specify '0'. Equivalent to PHP_FCGI_MAX_REQUESTS. Default value: 0.


What @sejas reported:

I've created a PHP function to make it much easier to benchmark the memory usage: I'm currently adding it to the index.php, but you can also try the plugin in Studio or wp-now.

function useAllMemory() {
    echo "Initial memory usage: " . (memory_get_usage()/(1024*1024)) . ' MB <br>';
    // ini_set('memory_limit', '1024MB'); // The php limit seems to be 128MB but it doesn't affect the results.
    $data = '';

    while (true) {
        $data .= str_repeat('a', 1024 * 1024); // Increase string size by 1MB in each iteration
        echo "* " . (memory_get_usage()/(1024*1024)) . ' MB <br>';
    }
}
useAllMemory();
die();

Here are my results:

Observe that the site screenshot displays almost 60MB of maximum memory. The next (2nd) page load displays 29MB The third page load 4.4MB and then 2.4 MB and 1.3 MB

Screenshot:

315332005-ef2eabc9-ac83-4762-b460-18f3cbd763cd

cc @brandonpayton

adamziel commented 6 months ago

Relevant unit test: https://github.com/WordPress/wordpress-playground/blob/06f27eb56ac72ba70bdca74f2ee30fb04597e93c/packages/php-wasm/node/src/test/php.spec.ts#L1158

A historical PR that added that test:

https://github.com/WordPress/wordpress-playground/pull/993

adamziel commented 6 months ago

I checked a few quick ideas to give the next person some entrypoints and potentially save a few hours of looking for these:

I just quickly tested whether increasing the memory_limit helps at all and it doesn't seem to:

http://localhost:5400/website-server/?url=/crash.php

diff --git a/packages/playground/remote/src/lib/worker-thread.ts b/packages/playground/remote/src/lib/worker-thread.ts
index cfc77fea2..06b78152d 100644
--- a/packages/playground/remote/src/lib/worker-thread.ts
+++ b/packages/playground/remote/src/lib/worker-thread.ts
@@ -241,6 +241,7 @@ const [setApiReady, setAPIError] = exposeAPI(

 try {
    php.initializeRuntime(await recreateRuntime());
+   php.setPhpIniEntry('memory_limit', '256M');

    if (startupOptions.sapiName) {
        await php.setSapiName(startupOptions.sapiName);
@@ -286,6 +287,20 @@ try {
        'playground-includes/wp_http_dummy.php': transportDummy,
        'playground-includes/wp_http_fetch.php': transportFetch,
    });
+   php.writeFile('/wordpress/crash.php', `<?php
+   function useAllMemory() {
+       echo "Initial memory usage: " . (memory_get_usage()/(1024*1024)) . ' MB <br>';
+       // ini_set('memory_limit', '1024MB'); // The php limit seems to be 128MB but it doesn't affect the results.
+       $data = '';
+   
+       while (true) {
+           $data .= str_repeat('a', 1024 * 1024); // Increase string size by 1MB in each iteration
+           echo "* " . (memory_get_usage()/(1024*1024)) . ' MB <br>';
+       }
+   }
+   useAllMemory();
+   die();`)
+

    if (virtualOpfsDir) {
        await bindOpfs({

Which is weird considering how:

https://github.com/WordPress/wordpress-playground/blob/06f27eb56ac72ba70bdca74f2ee30fb04597e93c/packages/php-wasm/node/public/php_8_0.js#L5121

and

https://github.com/WordPress/wordpress-playground/blob/06f27eb56ac72ba70bdca74f2ee30fb04597e93c/packages/php-wasm/compile/php/Dockerfile#L839-L840

I was, however, able to use a bit more memory before crashing by not extracting WordPress files into MEMFS – I think HEAP is used to store their content:

adamziel commented 6 months ago

I tried two more things:

php[__private__dont__use].malloc(1024 * 1024 * 512);

and

Neither seemed to have a significant effect on the size of the string PHP is able to allocate:

* 49.415100097656 MB
* 50.415100097656 MB
* 51.415100097656 MB
* 52.415100097656 MB
(crash)

HEAP size is not an issue here, then. It must be that PHP thinks it runs out of memory when it actually doesn't.

brandonpayton commented 6 months ago

@adamziel Thanks for doing all this work to help narrow the possibilities.

brandonpayton commented 6 months ago

I am looking at whether we can just build PHP with debug info and step through it in the browser. This page suggests it is possible at least sometimes: https://developer.chrome.com/docs/devtools/wasm

So far, I've asked emscripten to build with debug info via the -g flag, and php_8_0.wasm is now 21MB instead of ~5MB. So there is something more there. The OOM is still happening with this build which is good.

Now I'm looking at whether we can step through it in Chrome. The original source files may be required, and that might be a challenge since the build is done within Docker. We'll see.

brandonpayton commented 6 months ago

The original source files may be required, and that might be a challenge since the build is done within Docker. We'll see.

With Chrome Canary, after tweaking the Dockerfile to add the -g flag and disable optimization with -O0, I've gotten the dev tools listing PHP's source files, but it is indeed missing actual file content since it is referring to the php-src path within the Docker build.

Screenshot 2024-03-23 at 12 01 21 PM

With some tweaks we should be able to fix this. Then we can hopefully step through the debugger to observe where and why PHP thinks it is out of memory.

adamziel commented 6 months ago

@brandonpayton check -g3 and the SOURCE_MAPS option in the Dockerfile. Entry to the rabbit hole: https://github.com/WordPress/wordpress-playground/pull/639

brandonpayton commented 6 months ago

@adamziel thanks for the pointer. I only ran into source map things in the Dockerfile after beginning to add DWARF info. Then I saw something that suggested the DWARF route would offer richer info while debugging. That promise hasn't panned out yet, but I'm working on it.

Thankfully, to address the missing source files, the debugging extension has a setting to map paths, designed for cases like ours (They even mentioned Docker-based builds : ). https://developer.chrome.com/blog/wasm-debugging-2020/#path-mapping

Here's what I'm seeing so far with the crash. Screenshot 2024-03-23 at 1 31 46 PM

We get the call stack which may not be super useful. But it does help us identify where the "Allowed memory size" error is occurring (there are 3-4 places in PHP 8.0 where that error is reported). Unfortunately, access to variable values is infrequent and rare, so I'm working on that. It's impressive that the reflected call stack includes the JS functions that invoked the Wasm export.

brandonpayton commented 6 months ago

I tried using the latest emscripten version to see whether using a newer version might lead to more complete debug info, but the variable values are still largely unavailable. One interesting thing is that, after building PHP 8.0 with latest emcc, the error changed to simply "Out of memory", which comes a bit later than the error seen above. It seems to indicate that the limit checking said everything was OK but the actual allocation failed. This might just be due to changing emscripten options/behavior leading to different preprocessor behavior during build (i.e., maybe the memory limit checking didn't run at all).

Either way, we don't have to have var values in the debugger right now. As a next step, I plan to patch the PHP 8.0 to augment the error message, including whatever info we want from within zend_mm_alloc_huge.

adamziel commented 6 months ago

I know it probably doesn't matter here, but I was curious and found this page suggesting that variable names in the inspector are possible: https://developer.chrome.com/docs/devtools/wasm

Also: https://www.perplexity.ai/search/Debug-wasm-variable-0L7fbRR3T0iM2LqSMnt39g

brandonpayton commented 6 months ago

The first page you mentioned is the one that made me think variable names and values were possible. And I was seeing variable names in the Scope panel, but all said "undefined".

But oddly enough, the values seem to be present now too. I'd quit Chrome Canary and was restarting to take a screenshot of the undefined vars, but now values are present! Screenshot 2024-03-23 at 9 13 59 PM

I'll see what I can learn from the values and then play with the configuration to see whether var values are possible with our current emscripten version.

brandonpayton commented 6 months ago

I switched back to emscripten 3.1.43 and am thankfully still seeing var values in the debugger. It works best when I let the page load Playground and open the dev tools after. If dev tools are opened before loading Playground for the first time, no php-src source files are listed.

Setting memory limit to 1024MB via ini_set('memory_limit', '1024M') results in the heap structure having a limit that reflects 1024MB, but regardless, the test script still hits an OOM condition:

Fatal error: Out of memory (allocated 87097344) at /root/php-src/Zend/zend_string.h:241 (tried to allocate 62914584 bytes) in /wordpress/index.php on line 10

Planning to dig into this more on Monday.

brandonpayton commented 6 months ago

Even without repeated str_repeat() calls, there is an OOM. It seems like there is some mis-management involving string concat.

The following version of the test script:

<?php

ini_set('memory_limit', '64M');
function useAllMemory() {
    echo "Initial memory limit: " . ini_get( 'memory_limit' ) . '<br>';
    echo "Initial memory usage: " . (memory_get_usage()/(1024*1024)) . ' MB <br>';
    echo "<br>";

    $data = '';
    $counter = 0;
    $total_strlen = 0;
    $tail = str_repeat('a', 64 * 1024); // Increase string size by 64KB in each iteration
    for ($counter = 0; $counter < 1000; $counter++) {
      $data .= $tail;

      $usage = memory_get_usage();
      $strlen = strlen($data);
      $total_strlen += $strlen;
      echo "* iteration: $counter <br>";
      echo "* strlen(): " . ($strlen/(1024*1024)) . " MB <br>";
      echo "* memory_get_usage(): " . ($usage/(1024*1024)) . " MB <br>";
      echo "* aggregate strlen()'s: " . ($total_strlen/(1024*1024)) . " MB <br>";
      echo "<br>";
    }
}
useAllMemory();
echo "Completed without OOM<br>";
die();

This script printed the following results:

* iteration: 0
* strlen(): 0.0625 MB
* memory_get_usage(): 0.43998718261719 MB
* aggregate strlen()'s: 0.0625 MB

* iteration: 1
* strlen(): 0.125 MB
* memory_get_usage(): 0.56889343261719 MB
* aggregate strlen()'s: 0.1875 MB

* iteration: 2
* strlen(): 0.1875 MB
* memory_get_usage(): 0.63139343261719 MB
* aggregate strlen()'s: 0.375 MB

... iterations 3 - 220 ...

* iteration: 221
* strlen(): 13.875 MB
* memory_get_usage(): 14.377540588379 MB
* aggregate strlen()'s: 1547.0625 MB

* iteration: 222
* strlen(): 13.9375 MB
* memory_get_usage(): 14.440040588379 MB
* aggregate strlen()'s: 1561 MB

* iteration: 223
* strlen(): 14 MB
* memory_get_usage(): 14.502540588379 MB
* aggregate strlen()'s: 1575 MB

Fatal error: Out of memory (allocated 39911424) at /root/php-src/Zend/zend_string.h:241 (tried to allocate 14745624 bytes) in /wordpress/index.php on line 14

PHP reports just 14.5MB of memory used, yet somehow it is unable (or believes it is unable) to allocate more memory.

For next steps, I'm planning to:

Also, planning to read this article for ideas: Debugging memory leaks in WebAssembly using Emscripten

brandonpayton commented 6 months ago

Read how PHP string concat works within php-src

This appears to be the function PHP calls to make room for the concatenation. I've highlighted the line requesting allocation. It is part of the call stack where the OOM occurs, and it is the furthest up the stack I could get before getting into PHP opcode processing. The dev tools aren't showing which opcode, or I would share that here.

https://github.com/php/php-src/blob/ff2359b62e999c38598db16e60c3cc6818477224/Zend/zend_string.h#L271

brandonpayton commented 6 months ago

And this is where PHP asks the OS for memory and detects a failure: https://github.com/php/php-src/blob/ff2359b62e999c38598db16e60c3cc6818477224/Zend/zend_alloc.c#L530-L536

Now to see what Emscripten is doing with the mmap() call...

brandonpayton commented 6 months ago

And this is where PHP asks the OS for memory

NOTE: When OOM occurs, the debugger shows that the requested memory size is about 14MB.

brandonpayton commented 6 months ago

Memory allocated with mmap() is freed with munmap(). PHP mostly ignores failed munmap() calls, except that it logs to STDERR when ZEND_MM_ERROR is defined: https://github.com/php/php-src/blob/ff2359b62e999c38598db16e60c3cc6818477224/Zend/zend_alloc.c#L448-L450

I went to define ZEND_MM_ERROR so we could see munmap() failures in the debugger and found this, which implies there were such failures before we silenced them:

# Silence the errors "munmap() failed: [28] Invalid argument"
# @TODO: Identify the root cause behind these errors and fix them properly
RUN echo '#define ZEND_MM_ERROR 0' >> /root/php-src/main/php_config.h;

from https://github.com/WordPress/wordpress-playground/blob/trunk/packages/php-wasm/compile/php/Dockerfile#L260-L262

Next: Confirm that these failures are occurring for this OOM and, if so, try to find out why.

brandonpayton commented 6 months ago

Confirm that these failures are occurring for this OOM

Yep. This is what I see in the console after setting ZEND_MM_ERROR=1. Screenshot 2024-03-25 at 9 29 34 PM

Some progress on getting visibility into Emscripten library behavior: It turns out that mmap() and munmap() are implemented in C in the Emscripten codebase, and when building with debug info, you can step into those calls as well after setting path mappings in the extension options: Screenshot 2024-03-25 at 10 42 43 PM

Stepping into munmap(): Screenshot 2024-03-25 at 10 41 54 PM

I haven't read or understood the implementation yet, but at least we can take a look. Planning to continue with this tomorrow.

Other notes:

  1. I briefly looked at whether we could skip using mmap() within PHP. So far, it doesn't look like there is a preprocessor flag for it, but if we really wanted to, PHP does appear to support providing your own custom memory storage API.

  2. This comment on another emscripten issue about failed munmap() says:

    Is the code in question trying to partial unmapping of anonymous regions?

    We don't support that, and I don't think there is any easy way to fake it. We should really be asserting in that case. We also don't support mprotect at all. We should assert there too.

PHP does appear to use anonymous regions, but I'm not yet sure whether PHP attempts partial unmapping. Something to check.

For tomorrow:

  1. Check whether PHP attempt partial unmapping of anonymous regions. If so, see if there's anything we can do to avoid it.
  2. Step through emscripten mmap() and munmap() implementations in the debugger, building understanding and looking for clues.
adamziel commented 6 months ago

This is top-notch work @brandonpayton!

brandonpayton commented 6 months ago

❤️ thanks, @adamziel!

Check whether PHP attempt partial unmapping of anonymous regions.

I instrumented PHP 8.3 to print the address and size of each mmap() and munmap() attempt, and PHP does appear to attempt partial unmapping.

From one of the test runs (full log here), we can see the first munmap() failure occurs when attempting to unmap with a smaller size than previously mapped memory at the same address.

* iteration: 31
mmap, 0x6320000, 2162688
munmap, 0x6320000, 2162688
mmap, 0x6320000, 4194304
munmap, 0x6320000, 917504
munmap() failed: [28] Invalid argument
munmap, 0x6610000, 1114112
munmap() failed: [28] Invalid argument
* strlen(): 2 MB
* memory_get_usage(): 2.5128479003906 MB
* aggregate strlen()'s: 33 MB

Specifically, this is the first munmap() failure():

mmap, 0x6320000, 4194304
munmap, 0x6320000, 917504
munmap() failed: [28] Invalid argument

And the second munmap() failure is an attempt to partially unmap within the same memory space allocated earlier at address 0x6320000 and continuing until address 0x6720000.

munmap, 0x6610000, 1114112
munmap() failed: [28] Invalid argument

Note that 0x6610000 + 1114112 = 0x6720000.

PHP is requesting a partial unmapping, something the emscripten libs apparently don't support. And when it fails, it does nothing (but does log to stderr if ZEND_MM_ERROR is truthy).

I don't yet know what we can do about this. Here are some questions we can pursue. Do any others come to mind?

Tomorrow, I plan to start by walking through more of php-src to see if I missed other memory allocation options and to see whether providing custom memory storage is a possible solution.

adamziel commented 6 months ago

Is it that support cannot be reasonably implemented or that it simply hasn't been added yet?

I’ve had a similar experience with a bunch of socket reading options and they were relatively easy to add with a patch. If we’re lucky, this would also be the case here.

brandonpayton commented 6 months ago

I’ve had a similar experience with a bunch of socket reading options and they were relatively easy to add with a patch.

That's good to hear! OK, I'll take a look at the Emscripten libs first. It would be ideal not to have to customize PHP behavior but rather have the underlying platform behave as expected.

brandonpayton commented 6 months ago

Emscripten mmap() and mmap() rely upon a generic API supported by multiple allocators:

// Direct access to the system allocator.  Use these to access that underlying
// allocator when intercepting/wrapping the allocator API.  Works with with both
// dlmalloc and emmalloc.
void *emscripten_builtin_memalign(size_t alignment, size_t size);
void *emscripten_builtin_malloc(size_t size);
void emscripten_builtin_free(void *ptr);

And emscripten simulates anonymous mmap() and munmap() using emscripten_builtin_memalign() and emscripten_builtin_free() respectively. This works because Emscripten does not currently support partial munmap() of anonymous mapped regions.

But it seems like unmapping support could be added to an allocator. An allocator tracks allocated and free regions, and AFAICT, all a partial unmapping should do is take an allocated region and break it into a free region and one or more allocated regions.

Conceptually partial unmapping can do one of four things:

At a high-level, this seems doable.

Doing it sounds like a lot of fun, but a simpler solution would be better.

I've spent a fair amount of time thinking about this issue, re-reading php-src code, and testing various ideas (e.g., playing with declared alignments in an attempt to avoid reasons PHP munmaps). So far, adding partial unmapping support is my only idea.

If we go ahead with this...

Our current allocator is dlmalloc, a generic malloc implementation from elsewhere that was adopted by Emscripten, and it tracks chunks using a combination of circular doubly-linked lists and a form of trees (specifically, tries). In our current version of Emscripten, its source file is ~6400 lines.

But Emscripten also has it's own minimalistic allocator called "emmalloc". It's billed as a "Simple minimalistic but efficient sbrk()-based malloc/free that works in singlethreaded and multithreaded builds.". It "subdivides regions of free space into distinct circular doubly linked lists, where each linked list represents a range of free space blocks", and its source file is around 1400 lines.

If we're going to add support for partial unmapping of anonymous regions, it is probably best to try first with the simpler allocator. I'm not sure which is the better allocator for PHP/WP performance, but for a PoC, it seems better to pick the simpler starting point. In very brief testing, Emmalloc seems fine when rendering a WP home page, and if the performance is comparable for the average case, it's also the smaller implementation.

brandonpayton commented 6 months ago

I've spent a fair amount of time thinking about this issue, re-reading php-src code, and testing various ideas (e.g., playing with declared alignments in an attempt to avoid reasons PHP munmaps). So far, adding partial unmapping support is my only idea.

If anyone has other ideas, please let me know. Also, it might be helpful to let this sit in our minds for a day or two before proceeding.

Adding Emscripten support for partial unmapping sounds like a lot of fun, but I have a nagging feeling that there ought to be a simpler way.

bgrgicak commented 6 months ago

Two reports of out-of-memory errors, but I wasn't able to recreate these errors.

adamziel commented 6 months ago

CC @ThomasTheDane @fgmccabe on any insights about munmap() support in Emscripten – PHP.wasm and Playground is unable to do partial unmapping and can run out of memory rather quickly, leading to OOM crashes for its users. Is there a 'quick fix' we could use to solve this? Or an Emscripten PR/planned release to fix that?

dmsnell commented 6 months ago

failed-munmap-calls drawio

there's a lack of explanation for this diagram, but I drew out iteration: 31

adamziel commented 6 months ago

@brandonpayton would a munmap() patch to the effect of "pretend freeing the memory worked so that the memory region can be reused later" work? Where would that have to be done in the libc version used by Emscripten? Or, being a syscall, would that be a JavaScript patch somewhere here?

adamziel commented 6 months ago

What @dmsnell just figured out is that, being unable to free the memory, PHP allocates the next memory region for the next string concat – and so on until it fills all the available memory.

brandonpayton commented 6 months ago

What @dmsnell just figured out is that, being unable to free the memory, PHP allocates the next memory region for the next string concat – and so on until it fills all the available memory.

Yep, that's it. PHP incorrectly assumes it has freed the memory and keeps asking for more. I thought that was already our shared understanding and probably should have been clearer. 😅

brandonpayton commented 6 months ago

PHP incorrectly assumes it has freed the memory and keeps asking for more.

And because PHP incorrectly assumes it has partially unmapped a mapped memory region, it is set to ask for further partial unmapping of the remaining memory, and those attempts will also fail.

brandonpayton commented 6 months ago

there's a lack of explanation for this diagram, but I drew out iteration: 31

@dmsnell Thank you for the diagram. It's helpful to have a visual instead of asking people to visualize in their mind. :)

Out of curiosity, what did you use to make the diagram?

brandonpayton commented 6 months ago

it seems like unmapping support could be added to an allocator. ... Doing it sounds like a lot of fun, but a simpler solution would be better. ... So far, adding partial unmapping support is my only idea.

Today, I traced through zend_alloc's use of mmap and munmap by hand, and I think we can probably avoid this issue if we provide our own Zend storage handlers. I don't know how this is done yet but would guess this could be done via PHP extension. The custom memory storage API has been present since at least PHP 7.0 and remains present in PHP 8.3.

https://github.com/php/php-src/blob/PHP-7.0/Zend/zend_alloc.h#L288-L308

The API supports four operations:

And truncate and extend return booleans to indicate success and failure. Based on today's reading, it appears PHP is able to cope when truncate and extend fail. Performance would likely be better if we could support truncate and extend, but at the moment we cannot.

I am planning to work on this before considering updating Emscripten allocators to support partial munmapping.

adamziel commented 6 months ago

Performance would likely be better if we could support truncate and extend, but at the moment we cannot.

Why?

I thought that was already our shared understanding and probably should have been clearer. 😅

Sorry, I think that's what you wrote earlier, I just tried to summarize the discussion we've had at breakfast - it sounds like it didn't add that much :-)

dmsnell commented 6 months ago

Out of curiosity, what did you use to make the diagram?

it's in draw.io, and there should be a copy of the diagram embedded in the PNG

brandonpayton commented 6 months ago

there should be a copy of the diagram embedded in the PNG

That's amazing! Quite handy.

brandonpayton commented 6 months ago

Performance would likely be better if we could support truncate and extend, but at the moment we cannot.

Why?

Do you mean "Why might truncate/extend help performance?" or "Why can we not support truncate/extend?" I'll go ahead and answer both.

Truncate and extend ops can help performance by modifying memory allocations in place rather than requiring new allocations. PHP also names the fallback functions with suffixes like _slow. For example, when truncate or extend don't succeed here, the fallback function is zend_mm_realloc_slow().

I do not believe we can support truncate/extend ops here for custom PHP memory storage because:

I thought that was already our shared understanding and probably should have been clearer. 😅

Sorry, I think that's what you wrote earlier, I just tried to summarize the discussion we've had at breakfast - it sounds like it didn't add that much :-)

ah, ok. It was nice for you to state it so briefly and clearly. 🙇 Afterward, I was fascinated by our apparent mismatch in understandings and resolved to be more aware of unspoken assumptions. And that's a nice reminder, even if there wasn't a misunderstanding. 😄

brandonpayton commented 6 months ago

@adamziel Sorry, somehow I missed these questions from yesterday.

@brandonpayton would a munmap() patch to the effect of "pretend freeing the memory worked so that the memory region can be reused later" work? Where would that have to be done in the libc version used by Emscripten?

If Emscripten pretends, PHP will just continue to believe that it freed memory and will continue making partial munmap() requests. And in that case, Emscripten would have to do extra tracking to know when multiple partial munmap()'s have freed a whole mapped range so it can actually be freed.

When using the debugger, here is where I've traced the syscall within Emscripten libc: https://github.com/emscripten-core/emscripten/blob/ec26789737bef27b353bd4c89981233aa6b9c698/system/lib/libc/emscripten_mmap.c#L56

Or, being a syscall, would that be a JavaScript patch somewhere here?

That's interesting! I don't know how this works. Is this JS hit for every syscall regardless of whether the call was made from Wasm or JS?

adamziel commented 6 months ago

@brandonpayton oh that's good work, nice! I'm not sure how that JS function is called honestly, but I suspect this line inside the C function is what calls it:

https://github.com/emscripten-core/emscripten/blob/ec26789737bef27b353bd4c89981233aa6b9c698/system/lib/libc/emscripten_mmap.c#L76

brandonpayton commented 6 months ago

I created this basic PHP extension as a hack to avoid calls to mmap() and munmap(). https://gist.github.com/brandonpayton/92f73592e6c554eee5b0fbefe0e1bb68

It seems to address the memory leak... I haven't measured in detail, but the test script can complete successfully now. It's still possible for there to be calls to mmap() before the extension is loaded, but I haven't seen any runtime errors due to that so far. Still, if we end up using this workaround, we will probably want to modify php-src/main/internal_functions.c to load this extension first. IIUC, this file is created during the config/build process, and we could likely adjust it before building.

To get this to work, I also had to adjust an memory alignment assertion that seemed wrong to me. This assertion was adjusted from

ZEND_ASSERT(((zend_uintptr_t)((char*)ptr + (alignment-1)) & (alignment-1)) == (zend_uintptr_t)ptr);

to

ZEND_ASSERT(((zend_uintptr_t)((char*)ptr + (alignment-1)) & ~(alignment-1)) == (zend_uintptr_t)ptr);

(note the negation added in the second version)

I feel like I must be missing something that makes the original assertion correct. It seems unlikely that such a bug would remain undiscovered or unfixed for as long as the assertion has been there. Yet, in my understanding and in doing the bitwise ops by hand, it looks like the original assertion actually rejects all aligned addresses while the latter assertion only accepts aligned addresses.

It would be good to have a second set of eyes on this. Does the original assertion make sense somehow?

dmsnell commented 6 months ago

I feel like I must be missing something that makes the original assertion correct. It seems unlikely that such a bug would remain undiscovered or unfixed for as long as the assertion has been there. Yet, in my understanding and in doing the bitwise ops by hand, it looks like the original assertion actually rejects all aligned addresses while the latter assertion only accepts aligned addresses.

It would be good to have a second set of eyes on this. Does the original assertion make sense somehow?

That code hasn't been touched since 2014 when it was introduced apart from a change of type alias.

It's possible nobody noticed. Can you think of a test case? if so, it would be valuable to post on the PHP internals mailing list

I wonder if this could help explain some oddities in that diagram I made. I couldn't make sense of why it was allocating in the middle of the string, copying the string, and freeing up the prefix and suffix of the allocated region. I spent a lot of time trying to understand why this was happening and eventually gave up, having exhausted myself.

ZEND_MM_CHUNK_SIZE is 2mb (2 1024 1024, 0x200000), which lines up with the allocations in your report.

Let's experiment…

Test Size ptr alignment - 1 ptr + (alignment - 1) & (alignment - 1)
Aligned 4mb 0x15000000 0x1FFFFF 0x151FFFFF 0x1FFFFF
Unaligned 4mb 0x15100000 0x1FFFFF 0x152FFFFF 0x0FFFFF
Aligned + negated 4mb 0x15000000 0x1FFFFF 0x151FFFFF 0x15000000
Unaligned + negated 4mb 0x15100000 0x1FFFFF 0x152FFFFF 0x15200000

If we're not both mistaken, it looks like you found something. Both with and without negating, either check could work, but the result would have to compare like with like and this appears to be comparing ptr against alignment

When I try to parse the logic, it's something like this: subtracting one from the alignment turns it into a kind of mask. if we have an aligned allocation then all of the bits within this mask should be zero, so adding the "mask" turns them all to one. if any bits aren't zero in that range, we'll carry into the allocation above the alignment. Taking the AND with that mask should thus reveal if we had a carry: if we did, the AND will return some value that isn't what it started with.

The problem here is that we're comparing that AND not against the alignment mask, but the original pointer. Flipping the bits selects the other part of the pointer, which also would have had to stay the same or be changed by carry.

adamziel commented 6 months ago

It seems to address the memory leak... I haven't measured in detail, but the test script can complete successfully now.

Yay! That gist looks good, thank you so much for figuring this out! I left a tactical comment on that gist, but the overall concept and implementation look solid. It's pretty amazing that this seemingly faulty assertion was in there for so long. It would be a good bug to report to PHP maintainers.

if we end up using this workaround, we will probably want to modify php-src/main/internal_functions.c to load this extension first.

Could that be avoided by assigning the storage in the module initialization callback, not the request initialization one? If not, then yeah let's patch that file in the Dockerfile 👍


As for the speed – is there a measurable difference in speed between this and the "original" PHP allocator?

If it's <= 20% slower then let's just start a PR, confirm the unit tests pass, merge, create a low-priority issue to track munmap() support, and that will be it.

If it's > 20% slower, that may be a problem – in that case let's regroup .

brandonpayton commented 6 months ago

@dmsnell, thank you for the confirmation and for breaking everything down nicely.

It's possible nobody noticed. Can you think of a test case? if so, it would be valuable to post on the PHP internals mailing list

👍 There is not a quick way to represent this with a simple PHP test case since it requires assigning a custom memory storage provider. But it would be trivial to share some C that shows the bitwise expression fails to recognize an address of 2 1024 1024 as aligned with itself.

I wonder if this could help explain some oddities in that diagram I made. I couldn't make sense of why it was allocating in the middle of the string, copying the string, and freeing up the prefix and suffix of the allocated region. I spent a lot of time trying to understand why this was happening and eventually gave up, having exhausted myself.

Based purely on reading and not yet on testing, I think I see how this could be happening. To communicate the theory, here is an annotated sample of iteration 31 from the test logs mentioned above.

All of the linked lines are in zend_mm_chunk_alloc_int() which is the only thing zend_mm_chunk_alloc() does or calls if there isn't a custom memory storage provider (which short circuits the default allocation behavior anyway).

The problem here is that we're comparing that AND not against the alignment mask, but the original pointer. Flipping the bits selects the other part of the pointer, which also would have had to stay the same or be changed by carry.

I'm not 100% confident I understand you properly. Do you mean that the following tweaked version of the assertion is still incorrect?

ZEND_ASSERT(((zend_uintptr_t)((char*)ptr + (alignment-1)) & ~(alignment-1)) == (zend_uintptr_t)ptr);

There may be more direct and efficient ways to check, but AFAICT, having the result equal the original pointer indicates that a carry did not take place, which indicates ptr is a multiple of the alignment. If any sub-alignment bits were set, there would have been a carry.

-- For tomorrow, I plan to do more testing with this workaround and then create a PR to add the wasm_memory_storage extension and patch the errant ZEND_ASSERT(). Since it hasn't changed in so long, we may be able to use a single patch for all supported PHP versions (until maybe the latest is fixed :).

brandonpayton commented 6 months ago

Thanks, @adamziel!

It's pretty amazing that this seemingly faulty assertion was in there for so long.

I agree and guess custom memory storage may not be used very often.

if we end up using this workaround, we will probably want to modify php-src/main/internal_functions.c to load this extension first.

Could that be avoided by assigning the storage in the module initialization callback, not the request initialization one? If not, then yeah let's patch that file in the Dockerfile 👍

Sounds good.

As for the speed – is there a measurable difference in speed between this and the "original" PHP allocator? If it's <= 20% slower then let's just start a PR, confirm the unit tests pass, merge, create a low-priority issue to track munmap() support, and that will be it. If it's > 20% slower, that may be a problem – in that case let's regroup .

In my local test, Playground did not feel slower with the custom memory storage, but I will see what we can measure of performance w/ dev tools. Other than using dev tools, is there another way you typically test Playground performance?

I guess another thing is that I could make some measurements using the same or another test script.

dmsnell commented 6 months ago

Do you mean that the following tweaked version of the assertion is still incorrect?

I meant that I think the test can be used to check either against the original pointer or against the alignment. Without the negation the original code returns the bits representing the alignment portion. With the negation the modified code returns the bits representing the original allocation.

In either case I believe we could use either one; all we have to do is make sure we're matching the negation with the right number.

brandonpayton commented 6 months ago

@dmsnell, thanks, I understand now.

Related to the map/munmap dance we just discussed, I wonder if PHP would do better to just use posix_memalign() instead. It seems like it would be much faster to ask for exactly what is wanted than to whittle it out of a bigger chunk of memory. I'll make a note to test that, apart from this issue.

adamziel commented 6 months ago

@brandonpayton I wonder if PHP didn't use that function for Windows compatibility? Regardless, we're good to use it in WASM.

Performance-wise, measuring the time it takes to serve a thousand HTTP requests or so would be interesting – there's plenty of allocations involved and it's the exact code path we care about.

In any case, this is lovely – do you think starting a PR today is possible?

brandonpayton commented 6 months ago

@brandonpayton I wonder if PHP didn't use that function for Windows compatibility? Regardless, we're good to use it in WASM.

@adamziel Yeah, I think that is probably the case.

Performance-wise, measuring the time it takes to serve a thousand HTTP requests or so would be interesting – there's plenty of allocations involved and it's the exact code path we care about.

Cool. That sounds good and should be pretty straightforward to run in a dev tools console.

In any case, this is lovely – do you think starting a PR today is possible?

Yes, I think so. I'll do that before perf testing so others can run it too.

adamziel commented 6 months ago

I had to revert #1189 unfortunately (in https://github.com/WordPress/wordpress-playground/pull/1195) because of the memory access out of bounds, see https://github.com/WordPress/wordpress-playground/pull/1194 for more details.

brandonpayton commented 6 months ago

There are some updates on our continued investigation here and here.

The state of this issue is: