Closed adamziel closed 6 months ago
Relevant unit test: https://github.com/WordPress/wordpress-playground/blob/06f27eb56ac72ba70bdca74f2ee30fb04597e93c/packages/php-wasm/node/src/test/php.spec.ts#L1158
A historical PR that added that test:
I checked a few quick ideas to give the next person some entrypoints and potentially save a few hours of looking for these:
I just quickly tested whether increasing the memory_limit
helps at all and it doesn't seem to:
http://localhost:5400/website-server/?url=/crash.php
diff --git a/packages/playground/remote/src/lib/worker-thread.ts b/packages/playground/remote/src/lib/worker-thread.ts
index cfc77fea2..06b78152d 100644
--- a/packages/playground/remote/src/lib/worker-thread.ts
+++ b/packages/playground/remote/src/lib/worker-thread.ts
@@ -241,6 +241,7 @@ const [setApiReady, setAPIError] = exposeAPI(
try {
php.initializeRuntime(await recreateRuntime());
+ php.setPhpIniEntry('memory_limit', '256M');
if (startupOptions.sapiName) {
await php.setSapiName(startupOptions.sapiName);
@@ -286,6 +287,20 @@ try {
'playground-includes/wp_http_dummy.php': transportDummy,
'playground-includes/wp_http_fetch.php': transportFetch,
});
+ php.writeFile('/wordpress/crash.php', `<?php
+ function useAllMemory() {
+ echo "Initial memory usage: " . (memory_get_usage()/(1024*1024)) . ' MB <br>';
+ // ini_set('memory_limit', '1024MB'); // The php limit seems to be 128MB but it doesn't affect the results.
+ $data = '';
+
+ while (true) {
+ $data .= str_repeat('a', 1024 * 1024); // Increase string size by 1MB in each iteration
+ echo "* " . (memory_get_usage()/(1024*1024)) . ' MB <br>';
+ }
+ }
+ useAllMemory();
+ die();`)
+
if (virtualOpfsDir) {
await bindOpfs({
Which is weird considering how:
and
I was, however, able to use a bit more memory before crashing by not extracting WordPress files into MEMFS – I think HEAP is used to store their content:
I tried two more things:
php[__private__dont__use].malloc(1024 * 1024 * 512);
and
Neither seemed to have a significant effect on the size of the string PHP is able to allocate:
* 49.415100097656 MB
* 50.415100097656 MB
* 51.415100097656 MB
* 52.415100097656 MB
(crash)
HEAP size is not an issue here, then. It must be that PHP thinks it runs out of memory when it actually doesn't.
@adamziel Thanks for doing all this work to help narrow the possibilities.
I am looking at whether we can just build PHP with debug info and step through it in the browser. This page suggests it is possible at least sometimes: https://developer.chrome.com/docs/devtools/wasm
So far, I've asked emscripten to build with debug info via the -g
flag, and php_8_0.wasm is now 21MB instead of ~5MB. So there is something more there. The OOM is still happening with this build which is good.
Now I'm looking at whether we can step through it in Chrome. The original source files may be required, and that might be a challenge since the build is done within Docker. We'll see.
The original source files may be required, and that might be a challenge since the build is done within Docker. We'll see.
With Chrome Canary, after tweaking the Dockerfile to add the -g
flag and disable optimization with -O0
, I've gotten the dev tools listing PHP's source files, but it is indeed missing actual file content since it is referring to the php-src path within the Docker build.
With some tweaks we should be able to fix this. Then we can hopefully step through the debugger to observe where and why PHP thinks it is out of memory.
@brandonpayton check -g3
and the SOURCE_MAPS option in the Dockerfile. Entry to the rabbit hole: https://github.com/WordPress/wordpress-playground/pull/639
@adamziel thanks for the pointer. I only ran into source map things in the Dockerfile after beginning to add DWARF info. Then I saw something that suggested the DWARF route would offer richer info while debugging. That promise hasn't panned out yet, but I'm working on it.
Thankfully, to address the missing source files, the debugging extension has a setting to map paths, designed for cases like ours (They even mentioned Docker-based builds : ). https://developer.chrome.com/blog/wasm-debugging-2020/#path-mapping
Here's what I'm seeing so far with the crash.
We get the call stack which may not be super useful. But it does help us identify where the "Allowed memory size" error is occurring (there are 3-4 places in PHP 8.0 where that error is reported). Unfortunately, access to variable values is infrequent and rare, so I'm working on that. It's impressive that the reflected call stack includes the JS functions that invoked the Wasm export.
I tried using the latest emscripten version to see whether using a newer version might lead to more complete debug info, but the variable values are still largely unavailable. One interesting thing is that, after building PHP 8.0 with latest emcc, the error changed to simply "Out of memory", which comes a bit later than the error seen above. It seems to indicate that the limit checking said everything was OK but the actual allocation failed. This might just be due to changing emscripten options/behavior leading to different preprocessor behavior during build (i.e., maybe the memory limit checking didn't run at all).
Either way, we don't have to have var values in the debugger right now. As a next step, I plan to patch the PHP 8.0 to augment the error message, including whatever info we want from within zend_mm_alloc_huge
.
I know it probably doesn't matter here, but I was curious and found this page suggesting that variable names in the inspector are possible: https://developer.chrome.com/docs/devtools/wasm
Also: https://www.perplexity.ai/search/Debug-wasm-variable-0L7fbRR3T0iM2LqSMnt39g
The first page you mentioned is the one that made me think variable names and values were possible. And I was seeing variable names in the Scope panel, but all said "undefined".
But oddly enough, the values seem to be present now too. I'd quit Chrome Canary and was restarting to take a screenshot of the undefined vars, but now values are present!
I'll see what I can learn from the values and then play with the configuration to see whether var values are possible with our current emscripten version.
I switched back to emscripten 3.1.43 and am thankfully still seeing var values in the debugger. It works best when I let the page load Playground and open the dev tools after. If dev tools are opened before loading Playground for the first time, no php-src source files are listed.
Setting memory limit to 1024MB via ini_set('memory_limit', '1024M')
results in the heap structure having a limit that reflects 1024MB, but regardless, the test script still hits an OOM condition:
Fatal error: Out of memory (allocated 87097344) at /root/php-src/Zend/zend_string.h:241 (tried to allocate 62914584 bytes) in /wordpress/index.php on line 10
Planning to dig into this more on Monday.
Even without repeated str_repeat()
calls, there is an OOM. It seems like there is some mis-management involving string concat.
The following version of the test script:
$data
once for 1000 iterations.<?php
ini_set('memory_limit', '64M');
function useAllMemory() {
echo "Initial memory limit: " . ini_get( 'memory_limit' ) . '<br>';
echo "Initial memory usage: " . (memory_get_usage()/(1024*1024)) . ' MB <br>';
echo "<br>";
$data = '';
$counter = 0;
$total_strlen = 0;
$tail = str_repeat('a', 64 * 1024); // Increase string size by 64KB in each iteration
for ($counter = 0; $counter < 1000; $counter++) {
$data .= $tail;
$usage = memory_get_usage();
$strlen = strlen($data);
$total_strlen += $strlen;
echo "* iteration: $counter <br>";
echo "* strlen(): " . ($strlen/(1024*1024)) . " MB <br>";
echo "* memory_get_usage(): " . ($usage/(1024*1024)) . " MB <br>";
echo "* aggregate strlen()'s: " . ($total_strlen/(1024*1024)) . " MB <br>";
echo "<br>";
}
}
useAllMemory();
echo "Completed without OOM<br>";
die();
This script printed the following results:
* iteration: 0
* strlen(): 0.0625 MB
* memory_get_usage(): 0.43998718261719 MB
* aggregate strlen()'s: 0.0625 MB
* iteration: 1
* strlen(): 0.125 MB
* memory_get_usage(): 0.56889343261719 MB
* aggregate strlen()'s: 0.1875 MB
* iteration: 2
* strlen(): 0.1875 MB
* memory_get_usage(): 0.63139343261719 MB
* aggregate strlen()'s: 0.375 MB
... iterations 3 - 220 ...
* iteration: 221
* strlen(): 13.875 MB
* memory_get_usage(): 14.377540588379 MB
* aggregate strlen()'s: 1547.0625 MB
* iteration: 222
* strlen(): 13.9375 MB
* memory_get_usage(): 14.440040588379 MB
* aggregate strlen()'s: 1561 MB
* iteration: 223
* strlen(): 14 MB
* memory_get_usage(): 14.502540588379 MB
* aggregate strlen()'s: 1575 MB
Fatal error: Out of memory (allocated 39911424) at /root/php-src/Zend/zend_string.h:241 (tried to allocate 14745624 bytes) in /wordpress/index.php on line 14
PHP reports just 14.5MB of memory used, yet somehow it is unable (or believes it is unable) to allocate more memory.
For next steps, I'm planning to:
Also, planning to read this article for ideas: Debugging memory leaks in WebAssembly using Emscripten
Read how PHP string concat works within php-src
This appears to be the function PHP calls to make room for the concatenation. I've highlighted the line requesting allocation. It is part of the call stack where the OOM occurs, and it is the furthest up the stack I could get before getting into PHP opcode processing. The dev tools aren't showing which opcode, or I would share that here.
https://github.com/php/php-src/blob/ff2359b62e999c38598db16e60c3cc6818477224/Zend/zend_string.h#L271
And this is where PHP asks the OS for memory and detects a failure: https://github.com/php/php-src/blob/ff2359b62e999c38598db16e60c3cc6818477224/Zend/zend_alloc.c#L530-L536
Now to see what Emscripten is doing with the mmap() call...
And this is where PHP asks the OS for memory
NOTE: When OOM occurs, the debugger shows that the requested memory size is about 14MB.
Memory allocated with mmap() is freed with munmap(). PHP mostly ignores failed munmap() calls, except that it logs to STDERR when ZEND_MM_ERROR is defined: https://github.com/php/php-src/blob/ff2359b62e999c38598db16e60c3cc6818477224/Zend/zend_alloc.c#L448-L450
I went to define ZEND_MM_ERROR so we could see munmap() failures in the debugger and found this, which implies there were such failures before we silenced them:
# Silence the errors "munmap() failed: [28] Invalid argument"
# @TODO: Identify the root cause behind these errors and fix them properly
RUN echo '#define ZEND_MM_ERROR 0' >> /root/php-src/main/php_config.h;
Next: Confirm that these failures are occurring for this OOM and, if so, try to find out why.
Confirm that these failures are occurring for this OOM
Yep. This is what I see in the console after setting ZEND_MM_ERROR=1.
Some progress on getting visibility into Emscripten library behavior: It turns out that mmap() and munmap() are implemented in C in the Emscripten codebase, and when building with debug info, you can step into those calls as well after setting path mappings in the extension options:
Stepping into munmap():
I haven't read or understood the implementation yet, but at least we can take a look. Planning to continue with this tomorrow.
I briefly looked at whether we could skip using mmap() within PHP. So far, it doesn't look like there is a preprocessor flag for it, but if we really wanted to, PHP does appear to support providing your own custom memory storage API.
This comment on another emscripten issue about failed munmap() says:
Is the code in question trying to partial unmapping of anonymous regions?
We don't support that, and I don't think there is any easy way to fake it. We should really be asserting in that case. We also don't support mprotect at all. We should assert there too.
PHP does appear to use anonymous regions, but I'm not yet sure whether PHP attempts partial unmapping. Something to check.
This is top-notch work @brandonpayton!
❤️ thanks, @adamziel!
Check whether PHP attempt partial unmapping of anonymous regions.
I instrumented PHP 8.3 to print the address and size of each mmap() and munmap() attempt, and PHP does appear to attempt partial unmapping.
From one of the test runs (full log here), we can see the first munmap() failure occurs when attempting to unmap with a smaller size than previously mapped memory at the same address.
* iteration: 31
mmap, 0x6320000, 2162688
munmap, 0x6320000, 2162688
mmap, 0x6320000, 4194304
munmap, 0x6320000, 917504
munmap() failed: [28] Invalid argument
munmap, 0x6610000, 1114112
munmap() failed: [28] Invalid argument
* strlen(): 2 MB
* memory_get_usage(): 2.5128479003906 MB
* aggregate strlen()'s: 33 MB
Specifically, this is the first munmap() failure():
mmap, 0x6320000, 4194304
munmap, 0x6320000, 917504
munmap() failed: [28] Invalid argument
And the second munmap() failure is an attempt to partially unmap within the same memory space allocated earlier at address 0x6320000 and continuing until address 0x6720000.
munmap, 0x6610000, 1114112
munmap() failed: [28] Invalid argument
Note that 0x6610000 + 1114112 = 0x6720000
.
PHP is requesting a partial unmapping, something the emscripten libs apparently don't support. And when it fails, it does nothing (but does log to stderr if ZEND_MM_ERROR is truthy).
I don't yet know what we can do about this. Here are some questions we can pursue. Do any others come to mind?
Tomorrow, I plan to start by walking through more of php-src to see if I missed other memory allocation options and to see whether providing custom memory storage is a possible solution.
Is it that support cannot be reasonably implemented or that it simply hasn't been added yet?
I’ve had a similar experience with a bunch of socket reading options and they were relatively easy to add with a patch. If we’re lucky, this would also be the case here.
I’ve had a similar experience with a bunch of socket reading options and they were relatively easy to add with a patch.
That's good to hear! OK, I'll take a look at the Emscripten libs first. It would be ideal not to have to customize PHP behavior but rather have the underlying platform behave as expected.
Emscripten mmap() and mmap() rely upon a generic API supported by multiple allocators:
// Direct access to the system allocator. Use these to access that underlying
// allocator when intercepting/wrapping the allocator API. Works with with both
// dlmalloc and emmalloc.
void *emscripten_builtin_memalign(size_t alignment, size_t size);
void *emscripten_builtin_malloc(size_t size);
void emscripten_builtin_free(void *ptr);
And emscripten simulates anonymous mmap() and munmap() using emscripten_builtin_memalign()
and emscripten_builtin_free()
respectively. This works because Emscripten does not currently support partial munmap() of anonymous mapped regions.
But it seems like unmapping support could be added to an allocator. An allocator tracks allocated and free regions, and AFAICT, all a partial unmapping should do is take an allocated region and break it into a free region and one or more allocated regions.
Conceptually partial unmapping can do one of four things:
At a high-level, this seems doable.
Doing it sounds like a lot of fun, but a simpler solution would be better.
I've spent a fair amount of time thinking about this issue, re-reading php-src code, and testing various ideas (e.g., playing with declared alignments in an attempt to avoid reasons PHP munmaps). So far, adding partial unmapping support is my only idea.
If we go ahead with this...
Our current allocator is dlmalloc, a generic malloc implementation from elsewhere that was adopted by Emscripten, and it tracks chunks using a combination of circular doubly-linked lists and a form of trees (specifically, tries). In our current version of Emscripten, its source file is ~6400 lines.
But Emscripten also has it's own minimalistic allocator called "emmalloc". It's billed as a "Simple minimalistic but efficient sbrk()-based malloc/free that works in singlethreaded and multithreaded builds.". It "subdivides regions of free space into distinct circular doubly linked lists, where each linked list represents a range of free space blocks", and its source file is around 1400 lines.
If we're going to add support for partial unmapping of anonymous regions, it is probably best to try first with the simpler allocator. I'm not sure which is the better allocator for PHP/WP performance, but for a PoC, it seems better to pick the simpler starting point. In very brief testing, Emmalloc seems fine when rendering a WP home page, and if the performance is comparable for the average case, it's also the smaller implementation.
I've spent a fair amount of time thinking about this issue, re-reading php-src code, and testing various ideas (e.g., playing with declared alignments in an attempt to avoid reasons PHP munmaps). So far, adding partial unmapping support is my only idea.
If anyone has other ideas, please let me know. Also, it might be helpful to let this sit in our minds for a day or two before proceeding.
Adding Emscripten support for partial unmapping sounds like a lot of fun, but I have a nagging feeling that there ought to be a simpler way.
Two reports of out-of-memory errors, but I wasn't able to recreate these errors.
CC @ThomasTheDane @fgmccabe on any insights about munmap() support in Emscripten – PHP.wasm and Playground is unable to do partial unmapping and can run out of memory rather quickly, leading to OOM crashes for its users. Is there a 'quick fix' we could use to solve this? Or an Emscripten PR/planned release to fix that?
there's a lack of explanation for this diagram, but I drew out iteration: 31
@brandonpayton would a munmap()
patch to the effect of "pretend freeing the memory worked so that the memory region can be reused later" work? Where would that have to be done in the libc
version used by Emscripten? Or, being a syscall, would that be a JavaScript patch somewhere here?
What @dmsnell just figured out is that, being unable to free the memory, PHP allocates the next memory region for the next string concat – and so on until it fills all the available memory.
What @dmsnell just figured out is that, being unable to free the memory, PHP allocates the next memory region for the next string concat – and so on until it fills all the available memory.
Yep, that's it. PHP incorrectly assumes it has freed the memory and keeps asking for more. I thought that was already our shared understanding and probably should have been clearer. 😅
PHP incorrectly assumes it has freed the memory and keeps asking for more.
And because PHP incorrectly assumes it has partially unmapped a mapped memory region, it is set to ask for further partial unmapping of the remaining memory, and those attempts will also fail.
there's a lack of explanation for this diagram, but I drew out iteration: 31
@dmsnell Thank you for the diagram. It's helpful to have a visual instead of asking people to visualize in their mind. :)
Out of curiosity, what did you use to make the diagram?
it seems like unmapping support could be added to an allocator. ... Doing it sounds like a lot of fun, but a simpler solution would be better. ... So far, adding partial unmapping support is my only idea.
Today, I traced through zend_alloc's use of mmap and munmap by hand, and I think we can probably avoid this issue if we provide our own Zend storage handlers. I don't know how this is done yet but would guess this could be done via PHP extension. The custom memory storage API has been present since at least PHP 7.0 and remains present in PHP 8.3.
https://github.com/php/php-src/blob/PHP-7.0/Zend/zend_alloc.h#L288-L308
The API supports four operations:
And truncate and extend return booleans to indicate success and failure. Based on today's reading, it appears PHP is able to cope when truncate and extend fail. Performance would likely be better if we could support truncate and extend, but at the moment we cannot.
I am planning to work on this before considering updating Emscripten allocators to support partial munmapping.
Performance would likely be better if we could support truncate and extend, but at the moment we cannot.
Why?
I thought that was already our shared understanding and probably should have been clearer. 😅
Sorry, I think that's what you wrote earlier, I just tried to summarize the discussion we've had at breakfast - it sounds like it didn't add that much :-)
Out of curiosity, what did you use to make the diagram?
it's in draw.io, and there should be a copy of the diagram embedded in the PNG
there should be a copy of the diagram embedded in the PNG
That's amazing! Quite handy.
Performance would likely be better if we could support truncate and extend, but at the moment we cannot.
Why?
Do you mean "Why might truncate/extend help performance?" or "Why can we not support truncate/extend?" I'll go ahead and answer both.
Truncate and extend ops can help performance by modifying memory allocations in place rather than requiring new allocations. PHP also names the fallback functions with suffixes like _slow
. For example, when truncate or extend don't succeed here, the fallback function is zend_mm_realloc_slow()
.
I do not believe we can support truncate/extend ops here for custom PHP memory storage because:
I thought that was already our shared understanding and probably should have been clearer. 😅
Sorry, I think that's what you wrote earlier, I just tried to summarize the discussion we've had at breakfast - it sounds like it didn't add that much :-)
ah, ok. It was nice for you to state it so briefly and clearly. 🙇 Afterward, I was fascinated by our apparent mismatch in understandings and resolved to be more aware of unspoken assumptions. And that's a nice reminder, even if there wasn't a misunderstanding. 😄
@adamziel Sorry, somehow I missed these questions from yesterday.
@brandonpayton would a munmap() patch to the effect of "pretend freeing the memory worked so that the memory region can be reused later" work? Where would that have to be done in the libc version used by Emscripten?
If Emscripten pretends, PHP will just continue to believe that it freed memory and will continue making partial munmap() requests. And in that case, Emscripten would have to do extra tracking to know when multiple partial munmap()'s have freed a whole mapped range so it can actually be freed.
When using the debugger, here is where I've traced the syscall within Emscripten libc: https://github.com/emscripten-core/emscripten/blob/ec26789737bef27b353bd4c89981233aa6b9c698/system/lib/libc/emscripten_mmap.c#L56
Or, being a syscall, would that be a JavaScript patch somewhere here?
That's interesting! I don't know how this works. Is this JS hit for every syscall regardless of whether the call was made from Wasm or JS?
@brandonpayton oh that's good work, nice! I'm not sure how that JS function is called honestly, but I suspect this line inside the C function is what calls it:
I created this basic PHP extension as a hack to avoid calls to mmap() and munmap(). https://gist.github.com/brandonpayton/92f73592e6c554eee5b0fbefe0e1bb68
It seems to address the memory leak... I haven't measured in detail, but the test script can complete successfully now. It's still possible for there to be calls to mmap() before the extension is loaded, but I haven't seen any runtime errors due to that so far. Still, if we end up using this workaround, we will probably want to modify php-src/main/internal_functions.c
to load this extension first. IIUC, this file is created during the config/build process, and we could likely adjust it before building.
To get this to work, I also had to adjust an memory alignment assertion that seemed wrong to me. This assertion was adjusted from
ZEND_ASSERT(((zend_uintptr_t)((char*)ptr + (alignment-1)) & (alignment-1)) == (zend_uintptr_t)ptr);
to
ZEND_ASSERT(((zend_uintptr_t)((char*)ptr + (alignment-1)) & ~(alignment-1)) == (zend_uintptr_t)ptr);
(note the negation added in the second version)
I feel like I must be missing something that makes the original assertion correct. It seems unlikely that such a bug would remain undiscovered or unfixed for as long as the assertion has been there. Yet, in my understanding and in doing the bitwise ops by hand, it looks like the original assertion actually rejects all aligned addresses while the latter assertion only accepts aligned addresses.
It would be good to have a second set of eyes on this. Does the original assertion make sense somehow?
I feel like I must be missing something that makes the original assertion correct. It seems unlikely that such a bug would remain undiscovered or unfixed for as long as the assertion has been there. Yet, in my understanding and in doing the bitwise ops by hand, it looks like the original assertion actually rejects all aligned addresses while the latter assertion only accepts aligned addresses.
It would be good to have a second set of eyes on this. Does the original assertion make sense somehow?
That code hasn't been touched since 2014 when it was introduced apart from a change of type alias.
It's possible nobody noticed. Can you think of a test case? if so, it would be valuable to post on the PHP internals mailing list
I wonder if this could help explain some oddities in that diagram I made. I couldn't make sense of why it was allocating in the middle of the string, copying the string, and freeing up the prefix and suffix of the allocated region. I spent a lot of time trying to understand why this was happening and eventually gave up, having exhausted myself.
ZEND_MM_CHUNK_SIZE
is 2mb (2 1024 1024, 0x200000), which lines up with the allocations in your report.
Let's experiment…
Test | Size | ptr |
alignment - 1 |
ptr + (alignment - 1) |
& (alignment - 1) |
---|---|---|---|---|---|
Aligned | 4mb | 0x15000000 | 0x1FFFFF | 0x151FFFFF | 0x1FFFFF |
Unaligned | 4mb | 0x15100000 | 0x1FFFFF | 0x152FFFFF | 0x0FFFFF |
Aligned + negated | 4mb | 0x15000000 | 0x1FFFFF | 0x151FFFFF | 0x15000000 |
Unaligned + negated | 4mb | 0x15100000 | 0x1FFFFF | 0x152FFFFF | 0x15200000 |
If we're not both mistaken, it looks like you found something. Both with and without negating, either check could work, but the result would have to compare like with like and this appears to be comparing ptr
against alignment
When I try to parse the logic, it's something like this: subtracting one from the alignment turns it into a kind of mask. if we have an aligned allocation then all of the bits within this mask should be zero, so adding the "mask" turns them all to one. if any bits aren't zero in that range, we'll carry into the allocation above the alignment. Taking the AND with that mask should thus reveal if we had a carry: if we did, the AND will return some value that isn't what it started with.
The problem here is that we're comparing that AND not against the alignment mask, but the original pointer. Flipping the bits selects the other part of the pointer, which also would have had to stay the same or be changed by carry.
It seems to address the memory leak... I haven't measured in detail, but the test script can complete successfully now.
Yay! That gist looks good, thank you so much for figuring this out! I left a tactical comment on that gist, but the overall concept and implementation look solid. It's pretty amazing that this seemingly faulty assertion was in there for so long. It would be a good bug to report to PHP maintainers.
if we end up using this workaround, we will probably want to modify php-src/main/internal_functions.c to load this extension first.
Could that be avoided by assigning the storage in the module initialization callback, not the request initialization one? If not, then yeah let's patch that file in the Dockerfile 👍
As for the speed – is there a measurable difference in speed between this and the "original" PHP allocator?
If it's <= 20% slower then let's just start a PR, confirm the unit tests pass, merge, create a low-priority issue to track munmap()
support, and that will be it.
If it's > 20% slower, that may be a problem – in that case let's regroup .
@dmsnell, thank you for the confirmation and for breaking everything down nicely.
It's possible nobody noticed. Can you think of a test case? if so, it would be valuable to post on the PHP internals mailing list
👍 There is not a quick way to represent this with a simple PHP test case since it requires assigning a custom memory storage provider. But it would be trivial to share some C that shows the bitwise expression fails to recognize an address of 2 1024 1024 as aligned with itself.
I wonder if this could help explain some oddities in that diagram I made. I couldn't make sense of why it was allocating in the middle of the string, copying the string, and freeing up the prefix and suffix of the allocated region. I spent a lot of time trying to understand why this was happening and eventually gave up, having exhausted myself.
Based purely on reading and not yet on testing, I think I see how this could be happening. To communicate the theory, here is an annotated sample of iteration 31 from the test logs mentioned above.
size + alignment
bytes which is sure to contain an aligned address followed by at least size
allocated bytes. Here.All of the linked lines are in zend_mm_chunk_alloc_int() which is the only thing zend_mm_chunk_alloc() does or calls if there isn't a custom memory storage provider (which short circuits the default allocation behavior anyway).
The problem here is that we're comparing that AND not against the alignment mask, but the original pointer. Flipping the bits selects the other part of the pointer, which also would have had to stay the same or be changed by carry.
I'm not 100% confident I understand you properly. Do you mean that the following tweaked version of the assertion is still incorrect?
ZEND_ASSERT(((zend_uintptr_t)((char*)ptr + (alignment-1)) & ~(alignment-1)) == (zend_uintptr_t)ptr);
There may be more direct and efficient ways to check, but AFAICT, having the result equal the original pointer indicates that a carry did not take place, which indicates ptr is a multiple of the alignment. If any sub-alignment bits were set, there would have been a carry.
-- For tomorrow, I plan to do more testing with this workaround and then create a PR to add the wasm_memory_storage extension and patch the errant ZEND_ASSERT(). Since it hasn't changed in so long, we may be able to use a single patch for all supported PHP versions (until maybe the latest is fixed :).
Thanks, @adamziel!
It's pretty amazing that this seemingly faulty assertion was in there for so long.
I agree and guess custom memory storage may not be used very often.
if we end up using this workaround, we will probably want to modify php-src/main/internal_functions.c to load this extension first.
Could that be avoided by assigning the storage in the module initialization callback, not the request initialization one? If not, then yeah let's patch that file in the Dockerfile 👍
Sounds good.
As for the speed – is there a measurable difference in speed between this and the "original" PHP allocator? If it's <= 20% slower then let's just start a PR, confirm the unit tests pass, merge, create a low-priority issue to track
munmap()
support, and that will be it. If it's > 20% slower, that may be a problem – in that case let's regroup .
In my local test, Playground did not feel slower with the custom memory storage, but I will see what we can measure of performance w/ dev tools. Other than using dev tools, is there another way you typically test Playground performance?
I guess another thing is that I could make some measurements using the same or another test script.
Do you mean that the following tweaked version of the assertion is still incorrect?
I meant that I think the test can be used to check either against the original pointer or against the alignment. Without the negation the original code returns the bits representing the alignment portion. With the negation the modified code returns the bits representing the original allocation.
In either case I believe we could use either one; all we have to do is make sure we're matching the negation with the right number.
@dmsnell, thanks, I understand now.
Related to the map/munmap dance we just discussed, I wonder if PHP would do better to just use posix_memalign() instead. It seems like it would be much faster to ask for exactly what is wanted than to whittle it out of a bigger chunk of memory. I'll make a note to test that, apart from this issue.
@brandonpayton I wonder if PHP didn't use that function for Windows compatibility? Regardless, we're good to use it in WASM.
Performance-wise, measuring the time it takes to serve a thousand HTTP requests or so would be interesting – there's plenty of allocations involved and it's the exact code path we care about.
In any case, this is lovely – do you think starting a PR today is possible?
@brandonpayton I wonder if PHP didn't use that function for Windows compatibility? Regardless, we're good to use it in WASM.
@adamziel Yeah, I think that is probably the case.
Performance-wise, measuring the time it takes to serve a thousand HTTP requests or so would be interesting – there's plenty of allocations involved and it's the exact code path we care about.
Cool. That sounds good and should be pretty straightforward to run in a dev tools console.
In any case, this is lovely – do you think starting a PR today is possible?
Yes, I think so. I'll do that before perf testing so others can run it too.
I had to revert #1189 unfortunately (in https://github.com/WordPress/wordpress-playground/pull/1195) because of the memory access out of bounds, see https://github.com/WordPress/wordpress-playground/pull/1194 for more details.
There are some updates on our continued investigation here and here.
The state of this issue is:
The issue below was reported by @sejas. It negatively impacts user experience of Playground and
wp-now
. Let's make sure:php.ini
defaults to, say, 1GB memory.php_wasm.c
doesn't contribute to the memory leak problem. PHP itself suffers from one, see pm.max_requests. Is php_wasm.c making it worse?A few memory leaks were already patched in this repo, find old PRs for more context.
What @sejas reported:
I've created a PHP function to make it much easier to benchmark the memory usage: I'm currently adding it to the index.php, but you can also try the plugin in Studio or wp-now.
Here are my results:
Observe that the site screenshot displays almost 60MB of maximum memory. The next (2nd) page load displays 29MB The third page load 4.4MB and then 2.4 MB and 1.3 MB
Screenshot:
cc @brandonpayton