Closed adamziel closed 1 year ago
@gziolo captured this issue on the video:
https://user-images.githubusercontent.com/205419/191630406-8e30f3f0-b92d-454e-8ae5-338a737645b2.mov
I upgraded PHP to 8.0.24 in https://github.com/adamziel/wordpress-wasm/commit/45ba8cefc1c72422cfe69e07c68f8656b2d60da0 and https://github.com/adamziel/wordpress-wasm/commit/9d04a113a97f53a1303fefba642f65f1b5a5a5f6 in hopes it would solve this. No crashes so far, but it didn't have too much testing yet.
Here are Birgit's videos capturing the crash:
https://recordit.co/0gMsH3GDn1 https://recordit.co/HOc1D5IfuD
I just reproduced the crash :( It says error code 5, which seems to be a generic Chrome runtime error:
Interestingly, I can't get it to crash in Firefox or Safari. Perhaps we're hitting a Chrome bug?
Here's an idea: compile Chromium with debug=true, attach a debugger, and inspect the crash:
Conveniently, on Ubuntu the debug symbols are shipped as a part of the chromium-browser package. It doesn't seem to be the case for the Mac package, though.
Even better – instead of attaching the debugger, inspect the minidump file generated by chromium on crash:
I debugged Chromium today and added my findings to this issue's description: https://github.com/WordPress/wordpress-wasm/issues/1#issue-1354256184
CC @jsnajdr and @dmsnell – you might enjoy exploring this challenge with me. Also cc @swissspidy in case you know anyone in Chromium team who might be willing to take a look at this.
Chrome debugging aside, the php-wasm playground seems to just work and never break – I wonder why that is. One difference I see is that they don't use web workers and just load everything in the main thread.
I think I'm getting somewhere – when PHP is running in the main thread and not in a webworker, it never seems to crash. I wonder if this is related to wasm at all, or is it just some inter-process issue with web workers.
@adamziel I have no idea, but if it's related to a WebWorker then we might start looking for memory transfers that could involve non-transferable objects. are we pinned to running it in workers? or is it easy enough to have it built as a single-threaded beast?
are we pinned to running it in workers? or is it easy enough to have it built as a single-threaded beast?
@dmsnell I started with a single-threaded setup, but it was super slow: every request took forever to load AND your interactions with the page were blocked while it was being loaded. My hypothesis is that the speed penalty was due to forcing Chrome to switch contexts between rendering and handling WASM.
I'm exploring a minimal reproducible crash scenario. Turns out, a web worker like this is all it takes:
console.log( '[WebWorker] Spawned' );
importScripts( '/webworker-php.js' ); // Generated by emscripten
new PHP( {} ) // PHP is the generated module
.then( () => {
console.log( '[WebWorker] PHP initialized' );
} );
Note I'm not transferring anything between threads explicitly, although something could be happening implicitly.
It could be something super specific deep in emscripten's loading setup. I'll give it another hour or so, but if I can't pinpoint the issue then I'll explore emulating a web worker with an iframe.
The crash happens when instantiating WebAssembly in a web worker.
Booting WordPress WASM goes through the following emscripten-generated code path:
fetch( wasmBinaryFile, { credentials: 'same-origin' } ).then( function(
response,
) {
const result = WebAssembly.instantiateStreaming( response, info );
return result.then( receiveInstantiatedSource, function( reason ) {
err( 'wasm streaming compile failed: ' + reason );
err( 'falling back to ArrayBuffer instantiation' );
return instantiateArrayBuffer( receiveInstantiatedSource );
} );
} );
The crash still occurs after commenting receiveInstantiatedSource out. The culprit is the WebAssembly.instantiateStreaming() call:
// Crashes
WebAssembly.instantiateStreaming(
await fetch( wasmBinaryFile ),
info,
);
// Regular instantiate() crashes as well
fetch( wasmBinaryFile ).then( async ( response ) => {
WebAssembly.instantiate(
await response.arrayBuffer(),
info,
);
} );
Here's an exploratory branch where the crash is being boiled down to its essence. Feel free to clone and help with this one.
So how to fix it? Here are a few ideas:
WebAssembly.instantiate
WebAssembly.instantiate
According to this StackOverflow answer, most major browsers seem to run an iframe in a separate thread as long as it comes from a different domain. That's not a part of any spec, though. It's just an implementation detail that may change at any time.
If the best solution still incurs a speed penalty it could become a chrome-only fallback.
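A Chrome-only fallback could be as small as a user-agent check. The sketch below is only an illustration: `pickPhpRuntime` and the `'iframe'` / `'webworker'` labels are hypothetical names, and user-agent sniffing is an assumption rather than something this project is known to do.

```javascript
// Hypothetical helper: decide where PHP should run based on the UA string.
// Chromium-based desktop browsers advertise "Chrome/" or "Chromium/".
function pickPhpRuntime(userAgent) {
  const isChromium = /\b(Chrome|Chromium)\//.test(userAgent);
  // Assumption: only Chromium needs the slower-to-set-up iframe path.
  return isChromium ? 'iframe' : 'webworker';
}

const chromeUA =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36';
const firefoxUA =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:105.0) ' +
  'Gecko/20100101 Firefox/105.0';

console.log(pickPhpRuntime(chromeUA));
console.log(pickPhpRuntime(firefoxUA));
```

In a page this would be called with `navigator.userAgent`; feature detection would be preferable if a reliable signal for the crash existed.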
It must be specific to the php wasm build. I just built a minimal .wasm from the following C file using all the same emscripten options as for PHP, and it just won't crash:
#include <emscripten.h>
#include <stdlib.h>
int main() { return 0; }
int EMSCRIPTEN_KEEPALIVE test()
{
return 10;
}
A few ideas where to go from here:
emcc switches that will produce a stable wasm binary, e.g. -O0 to disable optimizations.
It would also be great to prepare an isolated branch with a minimal reproduction example and loop in the Chrome team.
Edit: I compiled just libphp without pib_eval and the crash is still happening.
Here's what else I just tried without success:
-O0 to disable optimizations for both pib_eval.o and the final WASM
--llvm-lto 2 (link time optimizations)
-s MAXIMUM_MEMORY=2048MB
-sASSERTIONS=1
./configure-ing PHP:
--enable-embed=static \
--with-layout=GNU \
--disable-cgi \
--disable-cli \
--disable-all \
--without-sqlite3 \
--disable-session \
--disable-filter \
--disable-calendar \
--disable-dom \
--disable-pdo \
--without-pdo-sqlite \
--disable-rpath \
--disable-phpdbg \
--without-pear \
--with-valgrind=no \
--without-pcre-jit \
--disable-bcmath \
--disable-json \
--disable-ctype \
--disable-mbstring \
--disable-mbregex \
--disable-tokenizer \
--disable-xml \
--disable-simplexml \
--without-gd
A combination of the following configure and emcc yielded a crashless wasm binary:
RUN cd php-src/ && PKG_CONFIG_PATH=$PKG_CONFIG_PATH emconfigure ./configure \
PKG_CONFIG_PATH=$PKG_CONFIG_PATH \
--enable-embed=static \
--with-layout=GNU \
--disable-all \
--without-sqlite3 \
--without-zlib \
--disable-session \
--disable-filter \
--disable-calendar \
--disable-dom \
--disable-pdo \
--without-pdo-sqlite \
--without-tsrm-pthreads \
--disable-rpath \
--disable-phpdbg \
--without-pear \
--without-pcre-jit \
--disable-bcmath \
--disable-shared \
--disable-libgcc \
--disable-rpath \
--disable-static \
--without-gnu-ld \
--disable-cli \
--disable-cgi \
--disable-phpdbg \
--without-servlet \
--disable-json \
--disable-ctype \
--disable-mbstring \
--disable-mbregex \
--disable-tokenizer \
--disable-xml \
--disable-simplexml \
--without-gd
docker run \
-v `pwd`/preload:/preload \
-v `pwd`/docker-output:/output \
wasm-wordpress-php-builder:latest \
emcc \
-o /output/webworker-php.js \
-s EXPORTED_FUNCTIONS='["_zend_eval_string"]' \
-s MAXIMUM_MEMORY=-1 \
-s INITIAL_MEMORY=1024MB \
-s ALLOW_MEMORY_GROWTH=1 \
-s ASSERTIONS=1 \
-s ERROR_ON_UNDEFINED_SYMBOLS=0 \
-s EXPORT_NAME="'PHP'" \
-s MODULARIZE=1 \
-s INVOKE_RUN=0 \
/root/lib/libphp7.a \
-s ENVIRONMENT=worker
Importantly, it's only 711KB instead of 10MB – most of the symbols must have been optimized away. That's good progress, though: there is now one build process that crashes and one that doesn't, so it should be possible to zero in on the cause.
produce a stable wasm binary, e.g. -O0 to disable optimizations
I don't think optimization is going to impact the stability of the build. in fact, if you try -Os it could have a positive impact if you suspect memory limits are in play.
there's another --enable-debug flag you can use when configuring PHP that could be worth a try, though I doubt it will produce any more meaningful errors or output when this crashes.
what about trying to run this in node with node:worker_threads? maybe if we can get it to crash there it would send a better error?
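A rough sketch of what that experiment could look like in Node.js. It instantiates a module inside a `node:worker_threads` worker and reports the outcome to the parent. The `instantiateInWorkerThread` helper is hypothetical, and the bytes used here are the smallest valid module (just the header), standing in for the real PHP binary and its import object.

```javascript
// Smallest valid WebAssembly module: the "\0asm" magic followed by version 1.
// Assumption: in a real test these bytes would be the PHP .wasm file instead.
const EMPTY_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

async function instantiateInWorkerThread(wasmBytes) {
  const { Worker } = await import('node:worker_threads');
  // The worker only instantiates the module; it never calls into it.
  const workerSource = `
    const { parentPort, workerData } = require('node:worker_threads');
    WebAssembly.instantiate(workerData, {})
      .then(() => parentPort.postMessage('instantiated'))
      .catch((error) => parentPort.postMessage('failed: ' + error.message));
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: wasmBytes });
    worker.once('message', resolve);
    worker.once('error', reject);
    // Only matters if the worker dies before posting a message.
    worker.once('exit', (code) => reject(new Error('worker exited with code ' + code)));
  });
}

instantiateInWorkerThread(EMPTY_MODULE).then((outcome) => console.log(outcome));
```

If the worker were to die during compilation, the promise would reject through the `error` or `exit` handler rather than hang.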
The crash is caused by the wasm-ized PHP function called $_lex_scan!
I played with the PHP C code for a day, but couldn't identify the root cause of the crash. I did however reduce the crashing binary size from 10 MB to 600 KB. From there, I was able to convert it to a WAT text format using wasm2wat and remove parts of the assembly using an ad-hoc python script. I am currently ripping out instructions from a "minimal" 122KB WASM file to see at which point it will stop crashing.
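The general shape of that kind of reduction can be sketched as a greedy chunk-removal loop. This is not the ad-hoc Python script mentioned above, just an illustration in JavaScript. `stillCrashes` is a hypothetical predicate that, in practice, would write the candidate lines out, rebuild the .wasm, and report whether the browser crashed.

```javascript
// Repeatedly try to drop chunks of lines, keeping every removal after which
// the (hypothetical) predicate still reports a crash. Chunks shrink over time.
function reduceLines(lines, stillCrashes) {
  let current = lines.slice();
  let chunkSize = Math.ceil(current.length / 2);
  while (chunkSize >= 1) {
    let removedSomething = false;
    for (let start = 0; start < current.length; ) {
      const candidate = current
        .slice(0, start)
        .concat(current.slice(start + chunkSize));
      if (stillCrashes(candidate)) {
        current = candidate; // smaller input still crashes: keep it
        removedSomething = true;
      } else {
        start += chunkSize; // this chunk is needed: move on
      }
    }
    if (chunkSize > 1) {
      chunkSize = Math.ceil(chunkSize / 2);
    } else if (!removedSomething) {
      break; // single-line pass removed nothing: done
    }
  }
  return current;
}

// Toy predicate: the "crash" needs both marker lines to be present.
const toyInput = ['a', 'b', 'CRASH1', 'c', 'd', 'CRASH2', 'e'];
const needsBoth = (ls) => ls.includes('CRASH1') && ls.includes('CRASH2');
console.log(reduceLines(toyInput, needsBoth));
```

The loop assumes a deterministic predicate; with an intermittent crash each candidate would have to be tried several times before being judged.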
what about trying to run this in node with node:worker_threads? maybe if we can get it to crash there it would send a better error?
@dmsnell unfortunately I couldn't cause the crash with node:worker_threads – it seems to be specific to Chrome :-(
Oh I forgot to mention – that "minimal" 122KB file is literally just the $_lex_scan function and some globals – I removed everything else from it. I will upload it here later.
this is good work; I hope it leads to fixes if there's a problem with the WASM runtime
Here's the .wat file I promised earlier:
All you have to do to cause the crash is compile it to wasm via wat2wasm and run the snippet below in a web worker:
(() => {
// src/web/web-worker.js
console.log("[WebWorker] Spawned");
var wasmTable = new WebAssembly.Table({
initial: 1090,
maximum: 1090,
element: "anyfunc"
});
var WASM_PAGE_SIZE = 65536;
var INITIAL_INITIAL_MEMORY = 1073741824;
var wasmMemory = new WebAssembly.Memory({
initial: INITIAL_INITIAL_MEMORY / WASM_PAGE_SIZE
});
var info = {
env: {
_zend_empty_array2: 1,
tempDoublePtr: 2303696,
"__memory_base": 1024,
__table_base: 0,
memory: wasmMemory,
table: wasmTable
},
global: { NaN: NaN, Infinity: Infinity },
asm2wasm: {
"f64-rem"() {
}
}
};
fetch("updated.wasm").then(async (response) => {
WebAssembly.instantiate(
await response.arrayBuffer(),
info
).then(() => {
console.log("Instantiated!");
});
console.log("Called instantiate");
});
console.log("Called fetch", { info });
})();
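For reference, the `initial` value passed to `WebAssembly.Memory` in the snippet above is a page count, not a byte count. A small worked example of that arithmetic follows (`bytesToWasmPages` is a hypothetical helper name):

```javascript
const WASM_PAGE_SIZE = 65536; // one WebAssembly page is 64 KiB

// Convert a byte size into whole WebAssembly pages.
function bytesToWasmPages(bytes) {
  if (bytes % WASM_PAGE_SIZE !== 0) {
    throw new RangeError('size must be a multiple of the 64 KiB page size');
  }
  return bytes / WASM_PAGE_SIZE;
}

// 1073741824 bytes (1 GiB) from the snippet, expressed in pages.
const initialPages = bytesToWasmPages(1073741824);
console.log(initialPages); // 16384

// Sanity check on a tiny memory so nothing large is allocated here.
const onePage = new WebAssembly.Memory({ initial: 1 });
console.log(onePage.buffer.byteLength); // 65536
```

So the reproduction asks for 16384 pages up front.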
In the meantime, I keep reducing it further to isolate the specific part causing the problem.
I filed a Chromium issue (it seems to be private). Minidump/crash ID: dfe74270-7a4e-4145-9859-e77979a8145d
I got it down to 2.6 M, see minimal_repro.zip. To try, host it locally e.g. via php -S localhost:8080 and navigate to index.html with devtools open and refresh a few dozen times.
Reducing the .wat file won't move this issue any further. The more lines I remove, the less likely Chrome is to crash, and the 12 MB bundle crashes much more often than the small one. I hoped this crash would be caused by some specific code block, but now I'm thinking this is a complex systemic problem that can be solved by simplifying any part of a complex system.
How complex? Well, this is how that function starts:
Here's the current situation:
I don't know how else to zero in on the root cause of the crash, so I'll try to work around it without understanding it well. I can only think of two ways:
A .wasm file that doesn't crash. I've already tried that without knowing which function triggers the crash, but now I can at least compare the output. I will try -Oz to reduce the code size at the expense of speed and --proxy-to-worker hoping this would affect how the .wasm file is built. Edit: Neither has helped.
Running WebAssembly inside of an iframe sourced from another domain is decently fast and does not crash Chrome!
The downside is a multi-domain setup which takes much more work than just adding a <script src=""> to head.
I will update trunk to use iframes instead of webworkers and leave solving the crash to the Chrome team. See my previous comment for the details of the related bugs.chromium.org issue.
For posterity, this seems to be an Out of Memory problem in v8 – it's weird it never occurs in node.js. The Chromium team shared the following stack trace generated on crash:
Magic Signature >> [Out of Memory] v8::internal::Zone::NewExpand
Stack Trace >>
Thread 26 ThreadPoolForegroundWorker (id: 0x005aad74) crashed MAGIC SIGNATURE THREAD
0x00000001211dbf58(Google Chrome Framework -oom.cc:58)partition_alloc::internal::OnNoMemoryInternal(unsigned long)
0x00000001211dbf68(Google Chrome Framework -oom.cc:65)partition_alloc::TerminateBecauseOutOfMemory(unsigned long)
0x00000001211dbf85(Google Chrome Framework -oom.cc:75)partition_alloc::internal::OnNoMemory(unsigned long)
0x00000001246e95b2(Google Chrome Framework -partitions.cc:323)WTF::PartitionsOutOfMemoryUsing512M(unsigned long)
0x00000001246e948c(Google Chrome Framework -partitions.cc:448)WTF::Partitions::HandleOutOfMemory(unsigned long)
0x00000001211dd8b3(Google Chrome Framework -partition_root.cc:619)partition_alloc::PartitionRoot<true>::OutOfMemory(unsigned long)
0x00000001211dca8a(Google Chrome Framework -partition_bucket.cc:48)void partition_alloc::internal::(anonymous namespace)::PartitionOutOfMemoryMappingFailure<true>(partition_alloc::PartitionRoot<true>*, unsigned long)
0x000000011dfa61de(Google Chrome Framework -partition_bucket.cc:691)partition_alloc::internal::PartitionBucket<true>::SlowPathAlloc(partition_alloc::PartitionRoot<true>*, unsigned int, unsigned long, unsigned long, bool*)
0x000000011dfae2cc(Google Chrome Framework -partition_root.h:1072)base::AllocNonQuarantinable(unsigned long)
0x000000011df36397(Google Chrome Framework -allocation.cc:141)v8::internal::Zone::NewExpand(unsigned long)
0x0000000120a961d4(Google Chrome Framework + 0x0000000002b851d4)std::Cr::vector<v8::internal::compiler::Node*, v8::internal::ZoneAllocator<v8::internal::compiler::Node*>>::vector(std::Cr::vector<v8::internal::compiler::Node*, v8::internal::ZoneAllocator<v8::internal::compiler::Node*>> const&)
0x0000000122d2a138(Google Chrome Framework + 0x0000000004e19138)v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface::Split(v8::internal::Zone*, v8::internal::wasm::(anonymous namespace)::SsaEnv*)
0x0000000122d2bfc5(Google Chrome Framework + 0x0000000004e1afc5)v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface::BrOrRet(v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>*, unsigned int, unsigned int)
0x0000000122d1efc4(Google Chrome Framework + 0x0000000004e0dfc4)v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>::DecodeBrTable(v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>*, v8::internal::wasm::WasmOpcode)
0x0000000122d1b301(Google Chrome Framework + 0x0000000004e0a301)v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>::Decode()
0x0000000122d1ab36(Google Chrome Framework + 0x0000000004e09b36)v8::internal::wasm::BuildTFGraph(v8::internal::AccountingAllocator*, v8::internal::wasm::WasmFeatures const&, v8::internal::wasm::WasmModule const*, v8::internal::compiler::WasmGraphBuilder*, v8::internal::wasm::WasmFeatures*, v8::internal::wasm::FunctionBody const&, std::Cr::vector<v8::internal::compiler::WasmLoopInfo, std::Cr::allocator<v8::internal::compiler::WasmLoopInfo>>*, v8::internal::compiler::NodeOriginTable*, int, v8::internal::wasm::InlinedStatus)
0x0000000122ee6b35(Google Chrome Framework + 0x0000000004fd5b35)v8::internal::compiler::ExecuteTurbofanWasmCompilation(v8::internal::wasm::CompilationEnv*, v8::internal::wasm::WireBytesStorage const*, v8::internal::wasm::FunctionBody const&, int, v8::internal::Counters*, v8::internal::wasm::AssemblerBufferCache*, v8::internal::wasm::WasmFeatures*)
0x000000012082f8b6(Google Chrome Framework + 0x000000000291e8b6)v8::internal::wasm::WasmCompilationUnit::ExecuteCompilation(v8::internal::wasm::CompilationEnv*, v8::internal::wasm::WireBytesStorage const*, v8::internal::Counters*, v8::internal::wasm::AssemblerBufferCache*, v8::internal::wasm::WasmFeatures*)
0x00000001207fc41d(Google Chrome Framework + 0x00000000028eb41d)v8::internal::wasm::(anonymous namespace)::ExecuteCompilationUnits(std::Cr::weak_ptr<v8::internal::wasm::NativeModule>, v8::internal::Counters*, v8::JobDelegate*, v8::internal::wasm::(anonymous namespace)::CompileBaselineOnly)
0x0000000120a96633(Google Chrome Framework + 0x0000000002b85633)v8::internal::wasm::(anonymous namespace)::BackgroundCompileJob::Run(v8::JobDelegate*) (.886d8138751ea58144f90ddffe92ca79)
0x0000000125c6e110(Google Chrome Framework -v8_platform.cc:458)base::internal::Invoker<base::internal::BindState<gin::V8Platform::CreateJob(v8::TaskPriority, std::Cr::unique_ptr<v8::JobTask, std::Cr::default_delete<v8::JobTask>>)::$_0, std::Cr::unique_ptr<v8::JobTask, std::Cr::default_delete<v8::JobTask>>>, void (base::JobDelegate*)>::Run(base::internal::BindStateBase*, base::JobDelegate*)
0x000000012549ec24(Google Chrome Framework -callback.h:263)base::internal::Invoker<base::internal::BindState<base::internal::JobTaskSource::JobTaskSource(base::Location const&, base::TaskTraits const&, base::RepeatingCallback<void (base::JobDelegate*)>, base::RepeatingCallback<unsigned long (unsigned long)>, base::internal::PooledTaskRunnerDelegate*)::$_0, base::internal::UnretainedWrapper<base::internal::JobTaskSource>>, void ()>::Run(base::internal::BindStateBase*)
0x000000011e3e3a06(Google Chrome Framework -callback.h:145)base::internal::TaskTracker::RunSkipOnShutdown(base::internal::Task&, base::TaskTraits const&, base::internal::TaskSource*, base::SequenceToken const&)
0x000000011e58a828(Google Chrome Framework -task_tracker.cc:724)base::internal::TaskTracker::RunAndPopNextTask(base::internal::RegisteredTaskSource)
0x000000011e6e8d9d(Google Chrome Framework -worker_thread.cc:448)base::internal::WorkerThread::RunWorker()
0x00000001238739dc(Google Chrome Framework -worker_thread.cc:335)base::internal::WorkerThread::RunPooledWorker()
0x000000011f091c56(Google Chrome Framework -worker_thread.cc:315)base::internal::WorkerThread::ThreadMain()
0x000000011ee21522(Google Chrome Framework -platform_thread_posix.cc:101)base::(anonymous namespace)::ThreadFunc(void*)
0x00007ff81a49b4e0(libsystem_pthread.dylib + 0x000064e0)
0x00007ff81a496f6a(libsystem_pthread.dylib + 0x00001f6a)
I'm attaching a reproduction for posterity.
It consists of two HTML files: breaks_here.html and works_here.html. The first one demonstrates the problem in the worker while the second one proves the issue does not occur in the main thread.
I also played with compiling PHP into a WASM file with fewer nested blocks. The lex_scan function that trips Chrome up with ~100 levels of nested blocks is a parser generated with a tool called re2c from a grammar file.
The grammar file lives in Zend/zend_language_scanner.l and the generated parser lives in Zend/zend_language_scanner.c. Upon investigation I found it consists of many goto instructions with a degree of nesting, e.g.:
yy11:
YYDEBUG(11, *YYCURSOR);
YYCURSOR = YYMARKER;
goto yy7;
yy12:
YYDEBUG(12, *YYCURSOR);
yych = *++YYCURSOR;
if (yych == 'P') goto yy13;
if (yych != 'p') goto yy11;
yy13:
YYDEBUG(13, *YYCURSOR);
yych = *++YYCURSOR;
if (yych <= '\f') {
if (yych <= 0x08) goto yy11;
if (yych >= '\v') goto yy11;
} else {
if (yych <= '\r') goto yy16;
if (yych != ' ') goto yy11;
I thought there should be a way to generate a loop-based code without any gotos. The version 1.1.1 of re2c preinstalled in this repo can't do that, but the version 3.0.0 has a --loop-switch option that does just that. Yay!
I generated a loop-based parser, compiled PHP to wasm, and inspected the output. The good news is it didn't have the same level of nesting. The bad news is it was much worse. The loop-based parser made for ~800 levels of nested blocks instead of ~100:
I'm now starting to think there's no reducing the nesting in that .wasm file. There's nothing wrong with block and br_table - they're just how you express flow control in WebAssembly and you need advanced flow control to parse PHP.
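Nesting levels like the ~100 and ~800 figures above can be estimated from the text form of the module. The sketch below is a naive counter, not the method actually used in this thread. It assumes the flat instruction format with one instruction per line, as wasm2wat prints by default, and `maxBlockDepth` is a hypothetical helper name.

```javascript
// Track how deeply block/loop/if constructs nest in flat-format WAT text.
function maxBlockDepth(watText) {
  let depth = 0;
  let max = 0;
  for (const rawLine of watText.split('\n')) {
    const line = rawLine.trim();
    if (/^(block|loop|if)\b/.test(line)) {
      depth += 1;
      max = Math.max(max, depth);
    } else if (/^end\b/.test(line)) {
      depth = Math.max(0, depth - 1); // "end" closes the innermost construct
    }
  }
  return max;
}

// Tiny hand-written sample in the same shape as wasm2wat output.
const sampleWat = `
(module
  (func (param i32)
    block  ;; label = @1
      block  ;; label = @2
        local.get 0
        br_table 0 (;@2;) 1 (;@1;)
      end
    end))
`;
console.log(maxBlockDepth(sampleWat)); // 2
```

Folded (s-expression) output would need a real parser instead of this line scan.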
To summarize:
php.wasm is likely impossible
I don't think there's anything else we can do here other than wait for a bugfix from the Chrome team. Until that happens, this project will use iframes instead of web workers to offload WASM processing.
Oh and one last note – it also crashes in a SharedWorker so we can't use that as a fallback.
I just got a report that the crash is still happening even in an iframe. Snap!
Edit: It must have been a problem with the Origin-Agent-Cluster: ?1 header that tells the browser to spin another process. The crash doesn't happen with that header in place.
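For anyone reproducing the iframe setup, a minimal sketch of a Node.js request handler that sends this header with the iframe document. `serveIframeDocument` and the `/iframe-worker.js` path are hypothetical names, not the ones used in this repository, and the header is only a hint that the browser may or may not honor.

```javascript
// Hypothetical handler for the document that gets loaded inside the iframe.
function serveIframeDocument(request, response) {
  // "?1" is the structured-header boolean "true".
  response.setHeader('Origin-Agent-Cluster', '?1');
  response.setHeader('Content-Type', 'text/html; charset=utf-8');
  // Assumption: /iframe-worker.js is the script that boots PHP.
  response.end('<!doctype html><script src="/iframe-worker.js"></script>');
}

// Minimal stand-in for a server response, to show what the handler emits.
const recorded = { headers: {}, body: '' };
serveIframeDocument({}, {
  setHeader: (name, value) => { recorded.headers[name] = value; },
  end: (body) => { recorded.body = body; },
});
console.log(recorded.headers['Origin-Agent-Cluster']); // ?1
```

With `node:http` the handler could be passed directly to `http.createServer(serveIframeDocument)`.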
Google Chrome team have been very responsive and released a series of patches since I first reported this problem (private issue 1372621 for posterity). In my latest round of testing I haven't been able to crash the browser. Woohooo!
I'm closing this issue for now. Let's reopen if the problem ever returns.
What is this issue about?
The WASM PHP crashes in chrome. It does not crash in Firefox, Safari, and node.js.
See the minimal reproduction in bug-reproduction.zip. It consists of two HTML files: breaks_here.html and works_here.html. The first one demonstrates the problem in the worker and the second one crashes too, although less frequently.
The issue is most apparent inside of a web worker, but it also exists when WASM is initialized in the main browser thread. The code below is enough to trigger the crash. Note we don't even run any wasm code, just instantiate the module:
Chromium debugging findings
The Chromium team shared the following stack trace proving this is an out of memory problem:
Other Chromium findings
I did some debugging before they shared that stack trace. The list below is less relevant than the specific details in the stack trace above, but I'm still posting it here for posterity:
TERMINATION_STATUS_PROCESS_CRASHED.
valgrind would help, but it won't run on Mac.
about:crash and other crashes.
This means you need to manually symbolize it to extract any useful information and I didn't get there yet. Older Chromium used breakpad where manual symbolization was needed. Modern Chromium uses crashpad which can be symbolized as follows:
Chromium debugging resources
I built Chromium on Mac like this:
Then, I created a new empty Xcode project and used Debug > Attach to > Chromium from the top level menu. Finally, I paused the process and set a breakpoint on the error page handler like this:
It didn't yield much information so I looked for scraps of information and set further breakpoints:
GetTerminationStatus – breakpoint wasn't triggered
V8_Fatal – breakpoint wasn't triggered
PerformShutdownOnWorkerThread – breakpoint wasn't triggered
See more information at: