WordPress / wordpress-playground

Run WordPress in the browser via WebAssembly PHP
https://w.org/playground/
GNU General Public License v2.0

WASM file crashes Google Chrome #1

Closed adamziel closed 1 year ago

adamziel commented 2 years ago

What is this issue about?

The WASM PHP crashes in Chrome. It does not crash in Firefox, Safari, or Node.js.

See the minimal reproduction in bug-reproduction.zip. It consists of two HTML files: breaks_here.html and works_here.html. The first one demonstrates the problem in a web worker, and the second one runs the same code in the main thread. The main thread crashes too, although less frequently.

The issue is most apparent inside a web worker, but it also occurs when WASM is initialized in the main browser thread. The code below is enough to trigger the crash. Note that we don't even run any wasm code; we only instantiate the module:

(() => {
  // src/web/web-worker.js
  console.log("[WebWorker] Spawned");
  // Function table with a fixed size of 1090 entries
  // ("anyfunc" is the legacy name for "funcref").
  var wasmTable = new WebAssembly.Table({
    initial: 1090,
    maximum: 1090,
    element: "anyfunc"
  });
  // 1 GiB of linear memory: 1073741824 / 65536 = 16384 pages.
  var WASM_PAGE_SIZE = 65536;
  var INITIAL_INITIAL_MEMORY = 1073741824;
  var wasmMemory = new WebAssembly.Memory({
    initial: INITIAL_INITIAL_MEMORY / WASM_PAGE_SIZE
  });
  // Imports in the shape emitted by the older (asm2wasm-era) emscripten glue.
  var info = {
    env: {
      _zend_empty_array2: 1,
      tempDoublePtr: 2303696,
      "__memory_base": 1024,
      __table_base: 0,
      memory: wasmMemory,
      table: wasmTable
    },
    global: { NaN: NaN, Infinity: Infinity },
    asm2wasm: {
      // Empty stub: it only has to exist for instantiation to succeed.
      "f64-rem"() {
      }
    }
  };
  fetch("updated.wasm").then(async (response) => {
    // Compile and instantiate only; no exported wasm function is ever called.
    WebAssembly.instantiate(
      await response.arrayBuffer(),
      info
    ).then(() => {
      console.log("Instantiated!");
    });
    console.log("Called instantiate");
  });
  console.log("Called fetch", { info });
})();
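The snippet sizes linear memory in bytes, but `WebAssembly.Memory` takes its `initial` size in 64 KiB pages, so `INITIAL_INITIAL_MEMORY / WASM_PAGE_SIZE` requests 1073741824 / 65536 = 16384 pages, i.e. 1 GiB reserved before any compilation starts. A small helper that makes the conversion explicit and rejects sizes that are not a whole number of pages could look like this; `bytesToWasmPages` and `createMemory` are hypothetical names, not part of emscripten's output.

```javascript
// 64 KiB: the page size fixed by the WebAssembly spec.
const WASM_PAGE_SIZE = 65536;

// Hypothetical helper: convert a byte count into WebAssembly pages,
// refusing sizes that are not a whole number of pages.
function bytesToWasmPages(bytes) {
  if (!Number.isInteger(bytes) || bytes < 0 || bytes % WASM_PAGE_SIZE !== 0) {
    throw new RangeError(
      `${bytes} is not a whole number of ${WASM_PAGE_SIZE}-byte pages`
    );
  }
  return bytes / WASM_PAGE_SIZE;
}

// Hypothetical helper: allocate linear memory sized in bytes.
function createMemory(bytes) {
  return new WebAssembly.Memory({ initial: bytesToWasmPages(bytes) });
}

console.log(bytesToWasmPages(1073741824)); // 16384 pages = 1 GiB
```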

Chromium debugging findings

The Chromium team shared the following stack trace, which shows that this is an out-of-memory problem:

Magic Signature >> [Out of Memory] v8::internal::Zone::NewExpand

Stack Trace >>
Thread 26 ThreadPoolForegroundWorker (id: 0x005aad74) crashed [MAGIC SIGNATURE THREAD]
0x00000001211dbf58(Google Chrome Framework -oom.cc:58)partition_alloc::internal::OnNoMemoryInternal(unsigned long)
0x00000001211dbf68(Google Chrome Framework -oom.cc:65)partition_alloc::TerminateBecauseOutOfMemory(unsigned long)
0x00000001211dbf85(Google Chrome Framework -oom.cc:75)partition_alloc::internal::OnNoMemory(unsigned long)
0x00000001246e95b2(Google Chrome Framework -partitions.cc:323)WTF::PartitionsOutOfMemoryUsing512M(unsigned long)
0x00000001246e948c(Google Chrome Framework -partitions.cc:448)WTF::Partitions::HandleOutOfMemory(unsigned long)
0x00000001211dd8b3(Google Chrome Framework -partition_root.cc:619)partition_alloc::PartitionRoot<true>::OutOfMemory(unsigned long)
0x00000001211dca8a(Google Chrome Framework -partition_bucket.cc:48)void partition_alloc::internal::(anonymous namespace)::PartitionOutOfMemoryMappingFailure<true>(partition_alloc::PartitionRoot<true>*, unsigned long)
0x000000011dfa61de(Google Chrome Framework -partition_bucket.cc:691)partition_alloc::internal::PartitionBucket<true>::SlowPathAlloc(partition_alloc::PartitionRoot<true>*, unsigned int, unsigned long, unsigned long, bool*)
0x000000011dfae2cc(Google Chrome Framework -partition_root.h:1072)base::AllocNonQuarantinable(unsigned long)
0x000000011df36397(Google Chrome Framework -allocation.cc:141)v8::internal::Zone::NewExpand(unsigned long)
0x0000000120a961d4(Google Chrome Framework + 0x0000000002b851d4)std::Cr::vector<v8::internal::compiler::Node*, v8::internal::ZoneAllocator<v8::internal::compiler::Node*>>::vector(std::Cr::vector<v8::internal::compiler::Node*, v8::internal::ZoneAllocator<v8::internal::compiler::Node*>> const&)
0x0000000122d2a138(Google Chrome Framework + 0x0000000004e19138)v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface::Split(v8::internal::Zone*, v8::internal::wasm::(anonymous namespace)::SsaEnv*)
0x0000000122d2bfc5(Google Chrome Framework + 0x0000000004e1afc5)v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface::BrOrRet(v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>*, unsigned int, unsigned int)
0x0000000122d1efc4(Google Chrome Framework + 0x0000000004e0dfc4)v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>::DecodeBrTable(v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>*, v8::internal::wasm::WasmOpcode)
0x0000000122d1b301(Google Chrome Framework + 0x0000000004e0a301)v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>::Decode()
0x0000000122d1ab36(Google Chrome Framework + 0x0000000004e09b36)v8::internal::wasm::BuildTFGraph(v8::internal::AccountingAllocator*, v8::internal::wasm::WasmFeatures const&, v8::internal::wasm::WasmModule const*, v8::internal::compiler::WasmGraphBuilder*, v8::internal::wasm::WasmFeatures*, v8::internal::wasm::FunctionBody const&, std::Cr::vector<v8::internal::compiler::WasmLoopInfo, std::Cr::allocator<v8::internal::compiler::WasmLoopInfo>>*, v8::internal::compiler::NodeOriginTable*, int, v8::internal::wasm::InlinedStatus)
0x0000000122ee6b35(Google Chrome Framework + 0x0000000004fd5b35)v8::internal::compiler::ExecuteTurbofanWasmCompilation(v8::internal::wasm::CompilationEnv*, v8::internal::wasm::WireBytesStorage const*, v8::internal::wasm::FunctionBody const&, int, v8::internal::Counters*, v8::internal::wasm::AssemblerBufferCache*, v8::internal::wasm::WasmFeatures*)
0x000000012082f8b6(Google Chrome Framework + 0x000000000291e8b6)v8::internal::wasm::WasmCompilationUnit::ExecuteCompilation(v8::internal::wasm::CompilationEnv*, v8::internal::wasm::WireBytesStorage const*, v8::internal::Counters*, v8::internal::wasm::AssemblerBufferCache*, v8::internal::wasm::WasmFeatures*)
0x00000001207fc41d(Google Chrome Framework + 0x00000000028eb41d)v8::internal::wasm::(anonymous namespace)::ExecuteCompilationUnits(std::Cr::weak_ptr<v8::internal::wasm::NativeModule>, v8::internal::Counters*, v8::JobDelegate*, v8::internal::wasm::(anonymous namespace)::CompileBaselineOnly)
0x0000000120a96633(Google Chrome Framework + 0x0000000002b85633)v8::internal::wasm::(anonymous namespace)::BackgroundCompileJob::Run(v8::JobDelegate*) (.886d8138751ea58144f90ddffe92ca79)
0x0000000125c6e110(Google Chrome Framework -v8_platform.cc:458)base::internal::Invoker<base::internal::BindState<gin::V8Platform::CreateJob(v8::TaskPriority, std::Cr::unique_ptr<v8::JobTask, std::Cr::default_delete<v8::JobTask>>)::$_0, std::Cr::unique_ptr<v8::JobTask, std::Cr::default_delete<v8::JobTask>>>, void (base::JobDelegate*)>::Run(base::internal::BindStateBase*, base::JobDelegate*)
0x000000012549ec24(Google Chrome Framework -callback.h:263)base::internal::Invoker<base::internal::BindState<base::internal::JobTaskSource::JobTaskSource(base::Location const&, base::TaskTraits const&, base::RepeatingCallback<void (base::JobDelegate*)>, base::RepeatingCallback<unsigned long (unsigned long)>, base::internal::PooledTaskRunnerDelegate*)::$_0, base::internal::UnretainedWrapper<base::internal::JobTaskSource>>, void ()>::Run(base::internal::BindStateBase*)
0x000000011e3e3a06(Google Chrome Framework -callback.h:145)base::internal::TaskTracker::RunSkipOnShutdown(base::internal::Task&, base::TaskTraits const&, base::internal::TaskSource*, base::SequenceToken const&)
0x000000011e58a828(Google Chrome Framework -task_tracker.cc:724)base::internal::TaskTracker::RunAndPopNextTask(base::internal::RegisteredTaskSource)
0x000000011e6e8d9d(Google Chrome Framework -worker_thread.cc:448)base::internal::WorkerThread::RunWorker()
0x00000001238739dc(Google Chrome Framework -worker_thread.cc:335)base::internal::WorkerThread::RunPooledWorker()
0x000000011f091c56(Google Chrome Framework -worker_thread.cc:315)base::internal::WorkerThread::ThreadMain()
0x000000011ee21522(Google Chrome Framework -platform_thread_posix.cc:101)base::(anonymous namespace)::ThreadFunc(void*)
0x00007ff81a49b4e0(libsystem_pthread.dylib + 0x000064e0)
0x00007ff81a496f6a(libsystem_pthread.dylib + 0x00001f6a)
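Read bottom-up, the crashing thread is a background wasm compile job (`BackgroundCompileJob::Run` → `ExecuteTurbofanWasmCompilation` → `BuildTFGraph`), and the failing allocation happens while decoding a `br_table` instruction: `DecodeBrTable` → `BrOrRet` → `Split`, which copies a vector of graph nodes into V8's compilation zone (`Zone::NewExpand`). One plausible reading, which is an assumption rather than a confirmed diagnosis, is that each `br_table` target gets its own copy of the function's SSA environment (roughly one node pointer per local), so zone memory grows with targets × locals for a large jump table in a function with many locals. The back-of-the-envelope model below uses hypothetical names and assumes 8-byte pointers; it is not V8's real memory accounting.

```javascript
// Rough model (an assumption, NOT V8's real accounting): every br_table
// target, plus the default target, copies the SSA environment, i.e. one
// node pointer per local, `splitsPerTarget` times.
function estimateSsaCopyBytes(brTableSizes, numLocals, options = {}) {
  const { splitsPerTarget = 1, pointerBytes = 8 } = options;
  let total = 0;
  for (const explicitTargets of brTableSizes) {
    const targets = explicitTargets + 1; // +1 for the default target
    total += targets * splitsPerTarget * numLocals * pointerBytes;
  }
  return total;
}

// Illustrative inputs only (not measured from the PHP binary):
// one 1000-way jump table in a function with 2000 locals.
const bytes = estimateSsaCopyBytes([999], 2000);
console.log(`${(bytes / 2 ** 20).toFixed(1)} MiB`); // prints "15.3 MiB"
```

Under this model, doubling both the table size and the number of locals quadruples the estimate, which would be consistent with smaller binaries crashing less often.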

Other Chromium findings

I did some debugging before they shared that stack trace. The findings below are less relevant than the details in the stack trace above, but I'm still posting them here for posterity:

[Screenshot, 2022-09-29]
#
# Fatal error in ../../v8/src/debug/debug-interface.cc, line 352
# Debug check failed: !isolate->is_execution_terminating().
#
#
#
#FailureMessage Object: 0x700009b7bc60
[33312:259:0929/223909.968925:VERBOSE1:node.cc(1175)] OnUpdatePreviousPeer port: E64B3F19A9C61113.8955D49085231446 changing to AA7F7063054BDC96.DF5C9FB08811A500, port: E2387E855C7DDE33.6435B60F182B2A50 => BFD53B2279161D17.D245A7E2302529BE
[33609:18179:0929/223909.969410:VERBOSE1:node.cc(1175)] OnUpdatePreviousPeer port: BFD53B2279161D17.D245A7E2302529BE changing to 6975933DD5F27952.C20103EDCDFD9098, port: E2387E855C7DDE33.6435B60F182B2A50 => E64B3F19A9C61113.8955D49085231446
[33614:259:0929/223909.971940:VERBOSE1:paint_controller.cc(709)] PaintController::FinishCycle() completed
0   libbase.dylib                       0x000000010d65f21c base::debug::CollectStackTrace(void**, unsigned long) + 44
1   libbase.dylib                       0x000000010d2f6978 base::debug::StackTrace::StackTrace(unsigned long) + 72
2   libbase.dylib                       0x000000010d2f69fd base::debug::StackTrace::StackTrace(unsigned long) + 29
3   libbase.dylib                       0x000000010d2f69d5 base::debug::StackTrace::StackTrace() + 37
4   libgin.dylib                        0x00000001a7b79d1b gin::(anonymous namespace)::PrintStackTrace() + 59
5   libv8_libbase.dylib                 0x0000000119cd80f1 V8_Fatal(char const*, int, char const*, ...) + 337
6   libv8_libbase.dylib                 0x0000000119cd78e5 v8::base::(anonymous namespace)::DefaultDcheckHandler(char const*, int, char const*) + 21
7   libv8.dylib                         0x00000001d4423933 v8::debug::SetBreakPointsActive(v8::Isolate*, bool) + 291
8   libv8.dylib                         0x00000001d5558df3 v8_inspector::V8DebuggerAgentImpl::disable() + 419
9   libv8.dylib                         0x00000001d5587bca v8_inspector::V8InspectorSessionImpl::~V8InspectorSessionImpl() + 346
10  libv8.dylib                         0x00000001d5587d9e v8_inspector::V8InspectorSessionImpl::~V8InspectorSessionImpl() + 14
11  libblink_core.dylib                 0x00000001df137d4c std::Cr::default_delete<v8_inspector::V8InspectorSession>::operator()[abi:v16000](v8_inspector::V8InspectorSession*) const + 44
12  libblink_core.dylib                 0x00000001df12792a std::Cr::unique_ptr<v8_inspector::V8InspectorSession, std::Cr::default_delete<v8_inspector::V8InspectorSession>>::reset[abi:v16000](v8_inspector::V8InspectorSession*) + 106
13  libblink_core.dylib                 0x00000001df126c08 blink::DevToolsSession::Detach() + 1288
[33312:259:0929/223909.985578:VERBOSE1:node.cc(1175)] OnUpdatePreviousPeer port: 49CA70FD9D9D7B94.3536F4AFEA56CA2C changing to AA7F7063054BDC96.DF5C9FB08811A500, port: 64987F6ACCED8335.8529510F6C84227 => 12FA36F038112775.2F840886748B53FC
14  libblink_core.dylib                 0x00000001df10b64f blink::DevToolsAgent::Dispose() + 527
15  libblink_core.dylib                 0x00000001df508b77 blink::WorkerInspectorController::Dispose() + 183
16  libblink_core.dylib                 0x00000001e08501ce blink::WorkerThread::PerformShutdownOnWorkerThread() + 526
17  libblink_core.dylib                 0x00000001e085726a void base::internal::FunctorTraits<void (blink::WorkerThread::*)(), void>::Invoke<void (blink::WorkerThread::*)(), blink::WorkerThread*>(void (blink::WorkerThread::*)(), blink::WorkerThread*&&) + 122
18  libblink_core.dylib                 0x00000001e08571e4 void base::internal::InvokeHelper<false, void>::MakeItSo<void (blink::WorkerThread::*)(), blink::WorkerThread*>(void (blink::WorkerThread::*&&)(), blink::WorkerThread*&&) + 52
19  libblink_core.dylib                 0x00000001e0857188 void base::internal::Invoker<base::internal::BindState<void (blink::WorkerThread::*)(), WTF::CrossThreadUnretainedWrapper<blink::WorkerThread>>, void ()>::RunImpl<void (blink::WorkerThread::*)(), std::Cr::tuple<WTF::CrossThreadUnretainedWrapper<blink::WorkerThread>>, 0ul>(void (blink::WorkerThread::*&&)(), std::Cr::tuple<WTF::CrossThreadUnretainedWrapper<blink::WorkerThread>>&&, std::Cr::integer_sequence<unsigned long, 0ul>) + 72
20  libblink_core.dylib                 0x00000001e08570e7 base::internal::Invoker<base::internal::BindState<void (blink::WorkerThread::*)(), WTF::CrossThreadUnretainedWrapper<blink::WorkerThread>>, void ()>::RunOnce(base::internal::BindStateBase*) + 55
21  libbase.dylib                       0x000000010d2a2887 base::OnceCallback<void ()>::Run() && + 103
22  libbase.dylib                       0x000000010d504492 base::TaskAnnotator::RunTaskImpl(base::PendingTask&) + 418
23  libbase.dylib                       0x000000010d56feae _ZN4base13TaskAnnotator7RunTaskIJZNS_16sequence_manager8internal35ThreadControllerWithMessagePumpImpl10DoWorkImplEPNS_7LazyNowEE3$_0EEEvN8perfetto12StaticStringERNS_11PendingTaskEDpOT_ + 126
24  libbase.dylib                       0x000000010d56f9fa base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl(base::LazyNow*) + 2362
25  libbase.dylib                       0x000000010d56ebb6 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() + 246
26  libbase.dylib                       0x000000010d56fd13 non-virtual thunk to base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() + 35
27  libbase.dylib                       0x000000010d389497 base::MessagePumpDefault::Run(base::MessagePump::Delegate*) + 151
28  libbase.dylib                       0x000000010d570661 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run(bool, base::TimeDelta) + 705
29  libbase.dylib                       0x000000010d46b483 base::RunLoop::Run(base::Location const&) + 755
30  libblink_platform.dylib             0x00000001f0ab9b88 blink::scheduler::NonMainThreadImpl::SimpleThreadImpl::Run() + 568
31  libbase.dylib                       0x000000010d5fcf0a base::SimpleThread::ThreadMain() + 74
32  libbase.dylib                       0x000000010d68b2f2 base::(anonymous namespace)::ThreadFunc(void*) + 226
33  libsystem_pthread.dylib             0x00007ff81a49b4e1 _pthread_start + 125
34  libsystem_pthread.dylib             0x00007ff81a496f6b thread_start + 15

Chromium debugging resources

I built Chromium on Mac like this:

git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
export PATH="$PATH:$(pwd)/depot_tools"
cd ~/ && mkdir chromium && cd chromium
caffeinate fetch --no-history chromium
cd src
gn gen out/Default
caffeinate autoninja -C out/Default chrome
./out/Default/Chromium.app/Contents/MacOS/Chromium --enable-logging --v=1

Then, I created a new empty Xcode project and used Debug > Attach to Process > Chromium from the top-level menu. Finally, I paused the process and set a breakpoint on the error page handler like this:

(lldb) b SadTab

It didn't yield much, so I looked for other clues and set further breakpoints:

See more information at:

adamziel commented 2 years ago

@gziolo captured this issue on video:

https://user-images.githubusercontent.com/205419/191630406-8e30f3f0-b92d-454e-8ae5-338a737645b2.mov

adamziel commented 2 years ago

I upgraded PHP to 8.0.24 in https://github.com/adamziel/wordpress-wasm/commit/45ba8cefc1c72422cfe69e07c68f8656b2d60da0 and https://github.com/adamziel/wordpress-wasm/commit/9d04a113a97f53a1303fefba642f65f1b5a5a5f6 in hopes it would solve this. No crashes so far, but it hasn't had much testing yet.

adamziel commented 2 years ago

Here are Birgit's videos capturing the crash:

https://recordit.co/0gMsH3GDn1 https://recordit.co/HOc1D5IfuD

[Screenshot of the crash, 2022-09-23]
adamziel commented 2 years ago

I just reproduced the crash :( It shows error code 5, which seems to be a generic Chrome runtime error:

[Screenshot: Chrome crash page with error code 5, 2022-09-23]
adamziel commented 2 years ago

Interestingly, I can't get it to crash in Firefox or Safari. Perhaps we're hitting a Chrome bug?

adamziel commented 2 years ago

Here's an idea: compile Chromium with is_debug=true, attach a debugger, and inspect the crash:

Conveniently, on Ubuntu the debug symbols are shipped as a part of the chromium-browser package. It doesn't seem to be the case for the Mac package, though.

Even better – instead of attaching the debugger, inspect the minidump file generated by Chromium on crash:

adamziel commented 2 years ago

I debugged Chromium today and added my findings to this issue's description: https://github.com/WordPress/wordpress-wasm/issues/1#issue-1354256184

CC @jsnajdr and @dmsnell – you might enjoy exploring this challenge with me. Also cc @swissspidy in case you know anyone in Chromium team who might be willing to take a look at this.

adamziel commented 2 years ago

Chrome debugging aside, the php-wasm playground seems to just work and never break – I wonder why that is. One difference I see is that it doesn't use web workers and just loads everything in the main thread.

adamziel commented 2 years ago

I think I'm getting somewhere – when PHP runs in the main thread rather than in a web worker, it never seems to crash. I wonder whether this is related to wasm at all, or whether it's just some inter-process issue with web workers.

dmsnell commented 2 years ago

@adamziel I have no idea, but if it's related to a WebWorker then we might start looking for memory transfers that could involve non-transferable objects. are we pinned to running it in workers? or is it easy enough to have it built as a single-threaded beast?

adamziel commented 2 years ago

are we pinned to running it in workers? or is it easy enough to have it built as a single-threaded beast?

@dmsnell I started with a single-threaded setup, but it was super slow: every request took forever to load AND your interactions with the page were blocked while it was being loaded. My hypothesis is that the speed penalty was due to forcing Chrome to switch contexts between rendering and handling WASM.

I'm exploring a minimal reproducible crash scenario. Turns out, a web worker like this is all it takes:

console.log( '[WebWorker] Spawned' );
importScripts( '/webworker-php.js' ); // Generated by emscripten

new PHP( {} ) // PHP is the generated module
    .then( () => {
        console.log( '[WebWorker] PHP initialized' );
    } );

Note I'm not transferring anything between threads explicitly, although something could be happening implicitly.

It could be something super specific deep in emscripten's loading setup. I'll give it an hour more or so, but if I can't pinpoint the issue then I'll explore emulating a web worker with an iframe.

adamziel commented 2 years ago

The crash happens when instantiating WebAssembly in a web worker.

Booting WordPress WASM goes through the following emscripten-generated code path:

fetch( wasmBinaryFile, { credentials: 'same-origin' } ).then( function(
    response,
) {
    const result = WebAssembly.instantiateStreaming( response, info );
    return result.then( receiveInstantiatedSource, function( reason ) {
        err( 'wasm streaming compile failed: ' + reason );
        err( 'falling back to ArrayBuffer instantiation' );
        return instantiateArrayBuffer( receiveInstantiatedSource );
    } );
} );

The crash still occurs after commenting out receiveInstantiatedSource. The culprit is WebAssembly.instantiateStreaming():

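// Crashes (wrapped in an async function because classic,
// non-module worker scripts don't support top-level await)
( async () => {
    await WebAssembly.instantiateStreaming(
        await fetch( wasmBinaryFile ),
        info,
    );
} )();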

// Regular instantiate() crashes as well
fetch( wasmBinaryFile ).then( async ( response ) => {
    WebAssembly.instantiate(
        await response.arrayBuffer(),
        info,
    );
} );

Here's an exploratory branch where the crash is being boiled down to its essence. Feel free to clone and help with this one.

So how to fix it? Here are a few ideas:

According to this StackOverflow answer, most major browsers seem to run iframes in a separate thread as long as they come from a different domain. That's not part of any spec, though; it's just an implementation detail that may change at any time.

If the best solution still incurs a speed penalty, it could become a Chrome-only fallback.

adamziel commented 2 years ago

It must be specific to the PHP wasm build. I just built a minimal .wasm from the following C file using the same emscripten options as for PHP, and it just won't crash:

#include <emscripten.h>
#include <stdlib.h>

int main() { return 0; }

int EMSCRIPTEN_KEEPALIVE test()
{
    return 10;
}

A few ideas where to go from here:

It would also be great to prepare an isolated branch with a minimal reproduction example and loop in the Chrome team.

Edit: I compiled just libphp without pib_eval and the crash is still happening.

Here's what else I just tried without success:

    --enable-embed=static \
    --with-layout=GNU  \
    --disable-cgi      \
    --disable-cli      \
    --disable-all      \
    --without-sqlite3     \
    --disable-session   \
    --disable-filter    \
    --disable-calendar  \
    --disable-dom       \
    --disable-pdo       \
    --without-pdo-sqlite  \
    --disable-rpath    \
    --disable-phpdbg   \
    --without-pear     \
    --with-valgrind=no \
    --without-pcre-jit \
    --disable-bcmath    \
    --disable-json      \
    --disable-ctype     \
    --disable-mbstring  \
    --disable-mbregex  \
    --disable-tokenizer \
    --disable-xml       \
    --disable-simplexml \
    --without-gd
adamziel commented 2 years ago

A combination of the following configure and emcc invocations yielded a crash-free wasm binary:


RUN cd php-src/ && PKG_CONFIG_PATH=$PKG_CONFIG_PATH emconfigure ./configure \
    PKG_CONFIG_PATH=$PKG_CONFIG_PATH \
    --enable-embed=static \
    --with-layout=GNU  \
    --disable-all      \
    --without-sqlite3     \
    --without-zlib     \
    --disable-session   \
    --disable-filter    \
    --disable-calendar  \
    --disable-dom       \
    --disable-pdo       \
    --without-pdo-sqlite  \
    --without-tsrm-pthreads  \
    --disable-rpath    \
    --disable-phpdbg   \
    --without-pear     \
    --without-pcre-jit \
    --disable-bcmath    \
    --disable-shared    \
    --disable-libgcc    \
    --disable-rpath    \
    --disable-static    \
    --without-gnu-ld    \
    --disable-cli    \
    --disable-cgi    \
    --disable-phpdbg    \
    --without-servlet    \
    --disable-json      \
    --disable-ctype     \
    --disable-mbstring  \
    --disable-mbregex  \
    --disable-tokenizer \
    --disable-xml       \
    --disable-simplexml \
    --without-gd

docker run \
        -v `pwd`/preload:/preload \
        -v `pwd`/docker-output:/output \
        wasm-wordpress-php-builder:latest \
        emcc \
        -o /output/webworker-php.js \
        -s EXPORTED_FUNCTIONS='["_zend_eval_string"]' \
        -s MAXIMUM_MEMORY=-1             \
        -s INITIAL_MEMORY=1024MB \
        -s ALLOW_MEMORY_GROWTH=1         \
        -s ASSERTIONS=1                  \
        -s ERROR_ON_UNDEFINED_SYMBOLS=0  \
        -s EXPORT_NAME="'PHP'"           \
        -s MODULARIZE=1                  \
        -s INVOKE_RUN=0                  \
                /root/lib/libphp7.a \
        -s ENVIRONMENT=worker

Importantly, it's only 711KB instead of 10MB – most of the symbols must have been optimized away. That's still good progress, though: with one build that crashes and one that doesn't, it should be possible to zero in on the cause.

dmsnell commented 2 years ago

produce a stable wasm binary, e.g. -O0 to disable optimizations

I don't think optimization is going to impact the stability of the build. in fact, if you try -Os it could have a positive impact if you suspect memory limits are in play.

there's another --enable-debug flag you can use when configuring PHP that could be worth a try, though I doubt it will produce any more meaningful errors or output when this crashes.

what about trying to run this in node with node:worker_threads? maybe if we can get it to crash there it would send a better error?
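A minimal sketch of that experiment, assuming Node.js and its built-in `node:worker_threads` module: the worker instantiates whatever bytes it receives through `workerData` and reports the export names back, so a compile failure shows up as `ok: false` and an abnormal worker exit as a rejected promise. `instantiateInWorker` and the reply shape are hypothetical. A module with imports, like the PHP build, would also need its `env`/`global`/`asm2wasm` import object constructed inside the worker source, since functions cannot be passed through `workerData`.

```javascript
const { Worker } = require("node:worker_threads");

// Source for the worker thread: instantiate the received bytes with an
// empty import object and report the result to the parent.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require("node:worker_threads");
  WebAssembly.instantiate(workerData.bytes, {})
    .then(({ instance }) => {
      parentPort.postMessage({ ok: true, exports: Object.keys(instance.exports) });
    })
    .catch((err) => {
      parentPort.postMessage({ ok: false, error: String(err) });
    });
`;

// Hypothetical helper: instantiate `bytes` (a Uint8Array) in a fresh worker.
function instantiateInWorker(bytes) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, { eval: true, workerData: { bytes } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      // A non-zero exit before any reply means the worker died abnormally.
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```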

adamziel commented 2 years ago

The crash is caused by the wasm-ized PHP function called $_lex_scan!

I played with the PHP C code for a day, but couldn't identify the root cause of the crash. I did, however, reduce the crashing binary size from 10 MB to 600 KB. From there, I was able to convert it to the WAT text format using wasm2wat and remove parts of the assembly using an ad-hoc Python script. I am currently ripping out instructions from a "minimal" 122KB WASM file to see at which point it stops crashing.
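When shrinking a binary like this, it can help to see where the bytes are before reaching for wasm2wat. The sketch below is a hypothetical helper, not a tool used in this thread: it walks the module's top-level sections (a one-byte id followed by an unsigned LEB128 size, per the WebAssembly binary format) and then lists the largest function bodies in the code section. Indices are positions within the code section, so imported functions are not counted; mapping them to names such as `$_lex_scan` would require parsing the `name` custom section, which this sketch skips.

```javascript
// Decode an unsigned LEB128 integer starting at `offset`.
function readU32Leb(bytes, offset) {
  let value = 0;
  let shift = 0;
  let pos = offset;
  for (;;) {
    if (pos >= bytes.length || shift > 28) throw new RangeError("Bad LEB128");
    const byte = bytes[pos++];
    value += (byte & 0x7f) * 2 ** shift;
    if ((byte & 0x80) === 0) return { value, next: pos };
    shift += 7;
  }
}

const SECTION_NAMES = [
  "custom", "type", "import", "function", "table", "memory",
  "global", "export", "start", "element", "code", "data", "datacount",
];

// List top-level sections with their payload offset and size in bytes.
function listSections(bytes) {
  const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  if (!magic.every((b, i) => bytes[i] === b)) throw new Error("Not a wasm binary");
  const sections = [];
  let pos = 8; // skip the 4-byte magic and the 4-byte version
  while (pos < bytes.length) {
    const id = bytes[pos];
    const { value: size, next } = readU32Leb(bytes, pos + 1);
    sections.push({ id, name: SECTION_NAMES[id] ?? `unknown(${id})`, offset: next, size });
    pos = next + size;
  }
  return sections;
}

// Return the `limit` largest function bodies, biggest first.
function largestFunctionBodies(bytes, limit = 10) {
  const code = listSections(bytes).find((s) => s.id === 10);
  if (!code) return [];
  const header = readU32Leb(bytes, code.offset); // number of bodies
  let pos = header.next;
  const bodies = [];
  for (let index = 0; index < header.value; index++) {
    const { value: size, next } = readU32Leb(bytes, pos);
    bodies.push({ index, size });
    pos = next + size;
  }
  return bodies.sort((a, b) => b.size - a.size).slice(0, limit);
}
```

In Node.js, `largestFunctionBodies(fs.readFileSync("php.wasm"))` would list the candidates (`php.wasm` is a placeholder path), and comparing `listSections` output for the 10 MB and 711 KB builds shows which sections shrank.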

what about trying to run this in node with node:worker_threads? maybe if we can get it to crash there it would send a better error?

@dmsnell unfortunately I couldn't cause the crash with node:worker_threads – it seems to be specific to Chrome :-(

adamziel commented 2 years ago

Oh I forgot to mention – that "minimal" 122KB file is literally just the $_lex_scan function and some globals – I removed everything else from it. I will upload it here later.

dmsnell commented 2 years ago

this is good work; I hope it leads to fixes if there's a problem with the WASM runtime

adamziel commented 2 years ago

Here's the .wat file I promised earlier:

updated.wat.zip

All you have to do to cause the crash is compile it to wasm via wat2wasm and run the repro snippet from the issue description (the one that fetches updated.wasm) in a web worker.

In the meantime, I keep reducing it further to isolate the specific part causing the problem.
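As the .wat shrinks, its import list changes too, and a mismatch with the hand-written `info` object makes instantiation fail with a `TypeError` or `LinkError` instead of reproducing the crash. A quick check is to compare `info` against `WebAssembly.Module.imports()`. `missingImports` is a hypothetical helper, and it only checks that each import name exists, not that its type matches.

```javascript
// Hypothetical helper: list the imports a compiled module declares that
// `importObject` does not provide (checked by name only; types are ignored).
function missingImports(module, importObject) {
  return WebAssembly.Module.imports(module)
    .filter(({ module: ns, name }) => {
      const namespace = importObject[ns];
      const isObject =
        namespace !== null &&
        (typeof namespace === "object" || typeof namespace === "function");
      return !isObject || !(name in namespace);
    })
    .map(({ module: ns, name, kind }) => `${ns}.${name} (${kind})`);
}

// Usage in the worker, assuming `info` from the repro snippet:
//   const module = await WebAssembly.compile(await response.arrayBuffer());
//   console.log(missingImports(module, info));
```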

adamziel commented 2 years ago

I filed a Chromium issue (it seems to be private). Minidump/crash ID: dfe74270-7a4e-4145-9859-e77979a8145d

adamziel commented 2 years ago

I got it down to 2.6 MB, see minimal_repro.zip. To try it, host it locally, e.g. via php -S localhost:8080, then navigate to index.html with devtools open and refresh a few dozen times.

Reducing the .wat file won't move this issue any further. The more lines I remove, the less likely Chrome is to crash, and the 12 MB bundle crashes much more often than the small one. I hoped this crash would be caused by some specific code block, but now I think it's a systemic issue in a complex system that can be avoided by simplifying any part of it.

How complex? Well, this is how that function starts:

[Screenshot (2022-10-07): the start of that function]
adamziel commented 2 years ago

Here's the current situation:

I don't know how else to zero in on the root cause of the crash, so I'll try to work around it without fully understanding it. I can only think of two ways:

adamziel commented 2 years ago

Running WebAssembly inside an iframe sourced from another domain is decently fast and does not crash Chrome!

The downside is a multi-domain setup which takes much more work than just adding a <script src=""> to head.

I will update trunk to use iframes instead of webworkers and leave solving the crash to the Chrome team. See my previous comment for the details of the related bugs.chromium.org issue.
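Part of the extra work is replacing direct calls with message passing, since a cross-origin iframe can only be reached through `postMessage`. A minimal request/response sketch, assuming the parent transfers one end of a `MessageChannel` to the iframe: each request carries an id so replies can be matched to pending promises. `createRpcClient`, `serveRpc`, and the `{ id, method, args }` message shape are hypothetical and not necessarily what the playground ended up using.

```javascript
// Hypothetical message shapes:
//   request: { id, method, args }   reply: { id, result } or { id, error }

// Caller side (e.g. the top-level page).
function createRpcClient(port) {
  let nextId = 0;
  const pending = new Map(); // id -> { resolve, reject }
  port.addEventListener("message", (event) => {
    const { id, result, error } = event.data;
    const entry = pending.get(id);
    if (!entry) return; // not a reply to one of our requests
    pending.delete(id);
    if (error !== undefined) entry.reject(new Error(error));
    else entry.resolve(result);
  });
  port.start(); // required when listening via addEventListener
  return (method, ...args) =>
    new Promise((resolve, reject) => {
      const id = nextId++;
      pending.set(id, { resolve, reject });
      port.postMessage({ id, method, args });
    });
}

// Callee side (e.g. the cross-origin iframe that runs PHP).
function serveRpc(port, handlers) {
  port.addEventListener("message", async (event) => {
    const { id, method, args } = event.data;
    try {
      if (!Object.prototype.hasOwnProperty.call(handlers, method)) {
        throw new Error(`Unknown method: ${method}`);
      }
      port.postMessage({ id, result: await handlers[method](...args) });
    } catch (err) {
      port.postMessage({ id, error: err instanceof Error ? err.message : String(err) });
    }
  });
  port.start();
}
```

In the browser, the parent would create `const channel = new MessageChannel()`, hand `channel.port2` to the iframe with `iframe.contentWindow.postMessage("connect", iframeOrigin, [channel.port2])`, and call `createRpcClient(channel.port1)`; the iframe would check `event.origin` and pass `event.ports[0]` to `serveRpc`.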

adamziel commented 2 years ago

#28 works around the issue by handling the WASM in an iframe served from a different domain.

adamziel commented 2 years ago

For posterity, this seems to be an out-of-memory problem in V8 – it's odd that it never occurs in Node.js. The Chromium team shared the following stack trace generated on crash:

Magic Signature >> [Out of Memory] v8::internal::Zone::NewExpand

Stack Trace >>
Thread 26 ThreadPoolForegroundWorker (id: 0x005aad74) crashed [MAGIC SIGNATURE THREAD]
0x00000001211dbf58(Google Chrome Framework -oom.cc:58)partition_alloc::internal::OnNoMemoryInternal(unsigned long)
0x00000001211dbf68(Google Chrome Framework -oom.cc:65)partition_alloc::TerminateBecauseOutOfMemory(unsigned long)
0x00000001211dbf85(Google Chrome Framework -oom.cc:75)partition_alloc::internal::OnNoMemory(unsigned long)
0x00000001246e95b2(Google Chrome Framework -partitions.cc:323)WTF::PartitionsOutOfMemoryUsing512M(unsigned long)
0x00000001246e948c(Google Chrome Framework -partitions.cc:448)WTF::Partitions::HandleOutOfMemory(unsigned long)
0x00000001211dd8b3(Google Chrome Framework -partition_root.cc:619)partition_alloc::PartitionRoot<true>::OutOfMemory(unsigned long)
0x00000001211dca8a(Google Chrome Framework -partition_bucket.cc:48)void partition_alloc::internal::(anonymous namespace)::PartitionOutOfMemoryMappingFailure<true>(partition_alloc::PartitionRoot<true>*, unsigned long)
0x000000011dfa61de(Google Chrome Framework -partition_bucket.cc:691)partition_alloc::internal::PartitionBucket<true>::SlowPathAlloc(partition_alloc::PartitionRoot<true>*, unsigned int, unsigned long, unsigned long, bool*)
0x000000011dfae2cc(Google Chrome Framework -partition_root.h:1072)base::AllocNonQuarantinable(unsigned long)
0x000000011df36397(Google Chrome Framework -allocation.cc:141)v8::internal::Zone::NewExpand(unsigned long)
0x0000000120a961d4(Google Chrome Framework + 0x0000000002b851d4)std::Cr::vector<v8::internal::compiler::Node*, v8::internal::ZoneAllocator<v8::internal::compiler::Node*>>::vector(std::Cr::vector<v8::internal::compiler::Node*, v8::internal::ZoneAllocator<v8::internal::compiler::Node*>> const&)
0x0000000122d2a138(Google Chrome Framework + 0x0000000004e19138)v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface::Split(v8::internal::Zone*, v8::internal::wasm::(anonymous namespace)::SsaEnv*)
0x0000000122d2bfc5(Google Chrome Framework + 0x0000000004e1afc5)v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface::BrOrRet(v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>*, unsigned int, unsigned int)
0x0000000122d1efc4(Google Chrome Framework + 0x0000000004e0dfc4)v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>::DecodeBrTable(v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>*, v8::internal::wasm::WasmOpcode)
0x0000000122d1b301(Google Chrome Framework + 0x0000000004e0a301)v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)2, v8::internal::wasm::(anonymous namespace)::WasmGraphBuildingInterface, (v8::internal::wasm::DecodingMode)0>::Decode()
0x0000000122d1ab36(Google Chrome Framework + 0x0000000004e09b36)v8::internal::wasm::BuildTFGraph(v8::internal::AccountingAllocator*, v8::internal::wasm::WasmFeatures const&, v8::internal::wasm::WasmModule const*, v8::internal::compiler::WasmGraphBuilder*, v8::internal::wasm::WasmFeatures*, v8::internal::wasm::FunctionBody const&, std::Cr::vector<v8::internal::compiler::WasmLoopInfo, std::Cr::allocator<v8::internal::compiler::WasmLoopInfo>>*, v8::internal::compiler::NodeOriginTable*, int, v8::internal::wasm::InlinedStatus)
0x0000000122ee6b35(Google Chrome Framework + 0x0000000004fd5b35)v8::internal::compiler::ExecuteTurbofanWasmCompilation(v8::internal::wasm::CompilationEnv*, v8::internal::wasm::WireBytesStorage const*, v8::internal::wasm::FunctionBody const&, int, v8::internal::Counters*, v8::internal::wasm::AssemblerBufferCache*, v8::internal::wasm::WasmFeatures*)
0x000000012082f8b6(Google Chrome Framework + 0x000000000291e8b6)v8::internal::wasm::WasmCompilationUnit::ExecuteCompilation(v8::internal::wasm::CompilationEnv*, v8::internal::wasm::WireBytesStorage const*, v8::internal::Counters*, v8::internal::wasm::AssemblerBufferCache*, v8::internal::wasm::WasmFeatures*)
0x00000001207fc41d(Google Chrome Framework + 0x00000000028eb41d)v8::internal::wasm::(anonymous namespace)::ExecuteCompilationUnits(std::Cr::weak_ptr<v8::internal::wasm::NativeModule>, v8::internal::Counters*, v8::JobDelegate*, v8::internal::wasm::(anonymous namespace)::CompileBaselineOnly)
0x0000000120a96633(Google Chrome Framework + 0x0000000002b85633)v8::internal::wasm::(anonymous namespace)::BackgroundCompileJob::Run(v8::JobDelegate*) (.886d8138751ea58144f90ddffe92ca79)
0x0000000125c6e110(Google Chrome Framework -v8_platform.cc:458)base::internal::Invoker<base::internal::BindState<gin::V8Platform::CreateJob(v8::TaskPriority, std::Cr::unique_ptr<v8::JobTask, std::Cr::default_delete<v8::JobTask>>)::$_0, std::Cr::unique_ptr<v8::JobTask, std::Cr::default_delete<v8::JobTask>>>, void (base::JobDelegate*)>::Run(base::internal::BindStateBase*, base::JobDelegate*)
0x000000012549ec24(Google Chrome Framework -callback.h:263)base::internal::Invoker<base::internal::BindState<base::internal::JobTaskSource::JobTaskSource(base::Location const&, base::TaskTraits const&, base::RepeatingCallback<void (base::JobDelegate*)>, base::RepeatingCallback<unsigned long (unsigned long)>, base::internal::PooledTaskRunnerDelegate*)::$_0, base::internal::UnretainedWrapper<base::internal::JobTaskSource>>, void ()>::Run(base::internal::BindStateBase*)
0x000000011e3e3a06(Google Chrome Framework -callback.h:145)base::internal::TaskTracker::RunSkipOnShutdown(base::internal::Task&, base::TaskTraits const&, base::internal::TaskSource*, base::SequenceToken const&)
0x000000011e58a828(Google Chrome Framework -task_tracker.cc:724)base::internal::TaskTracker::RunAndPopNextTask(base::internal::RegisteredTaskSource)
0x000000011e6e8d9d(Google Chrome Framework -worker_thread.cc:448)base::internal::WorkerThread::RunWorker()
0x00000001238739dc(Google Chrome Framework -worker_thread.cc:335)base::internal::WorkerThread::RunPooledWorker()
0x000000011f091c56(Google Chrome Framework -worker_thread.cc:315)base::internal::WorkerThread::ThreadMain()
0x000000011ee21522(Google Chrome Framework -platform_thread_posix.cc:101)base::(anonymous namespace)::ThreadFunc(void*)
0x00007ff81a49b4e0(libsystem_pthread.dylib + 0x000064e0)
0x00007ff81a496f6a(libsystem_pthread.dylib + 0x00001f6a)
adamziel commented 2 years ago

I'm attaching a reproduction for posterity.

It consists of two HTML files: breaks_here.html and works_here.html. The first one demonstrates the problem in the worker while the second one proves the issue does not occur in the main thread.

bug-reproduction.zip

adamziel commented 2 years ago

I also played with compiling PHP into a WASM file with fewer nested blocks. The lex_scan function, which trips up Chrome with ~100 levels of nested blocks, is a scanner (lexer) generated from a grammar file by a tool called re2c.

The grammar file lives in Zend/zend_language_scanner.l and the generated scanner lives in Zend/zend_language_scanner.c. On inspection, I found it consists of many goto statements with some degree of nesting, e.g.:

yy11:
    YYDEBUG(11, *YYCURSOR);
    YYCURSOR = YYMARKER;
    goto yy7;
yy12:
    YYDEBUG(12, *YYCURSOR);
    yych = *++YYCURSOR;
    if (yych == 'P') goto yy13;
    if (yych != 'p') goto yy11;
yy13:
    YYDEBUG(13, *YYCURSOR);
    yych = *++YYCURSOR;
    if (yych <= '\f') {
        if (yych <= 0x08) goto yy11;
        if (yych >= '\v') goto yy11;
    } else {
        if (yych <= '\r') goto yy16;
        if (yych != ' ') goto yy11;

I thought there should be a way to generate loop-based code without any gotos. re2c 1.1.1, the version preinstalled in this repo, can't do that, but re2c 3.0.0 has a --loop-switch option that does just that. Yay!
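For readers unfamiliar with the loop-switch style, here is a hand-written JavaScript sketch of its general shape (not actual re2c output): each goto label becomes a case of a switch on a state variable inside an endless loop, and each "goto yyN" becomes an assignment followed by continue. The states are loosely modeled on yy11/yy12/yy13 from the excerpt above; the whitespace set and the accept state 16 are simplified, and matchPThenSpace is a hypothetical name.

```javascript
// Hypothetical scanner fragment in loop-switch form: each former goto label
// (yy11, yy12, yy13, yy16) is now a case, and "goto yyN" is "state = N; continue".
function matchPThenSpace(input, start = 0) {
  let cursor = start;
  let state = 12;
  for (;;) {
    switch (state) {
      case 12: {
        // Like yy12: accept "P" or "p", otherwise reject.
        const ch = input[cursor++];
        state = ch === "P" || ch === "p" ? 13 : 11;
        continue;
      }
      case 13: {
        // Like yy13 (simplified): accept one of tab, newline, carriage return, space.
        const ch = input[cursor++];
        state = ch === "\t" || ch === "\n" || ch === "\r" || ch === " " ? 16 : 11;
        continue;
      }
      case 16:
        return cursor; // accept: index just past the matched whitespace
      case 11:
        return -1; // reject (the real yy11 restores YYMARKER and jumps to yy7)
      default:
        throw new Error(`unknown state ${state}`);
    }
  }
}
```

Every state ends up as one case of a single large switch. When such a switch is compiled to WebAssembly, each case typically becomes its own block targeted by a br_table, which is one plausible reason the loop-based build nests more deeply.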

I generated a loop-based scanner, compiled PHP to wasm, and inspected the output. The good news is that the nesting changed. The bad news is that it got much worse: the loop-based scanner made for ~800 levels of nested blocks instead of ~100:

[Screenshot, 2022-10-11: nested blocks in the compiled loop-based scanner]
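One way to estimate nesting depth without eyeballing it is to scan the text format. A minimal sketch, assuming the module was disassembled with WABT's wasm2wat into its default flat style (one instruction per line); maxNestingDepth is a hypothetical helper, not the tool used in this thread. It only tracks block, loop and if, and ignores folded S-expressions and exception-handling instructions.

```javascript
// Returns the deepest block/loop/if nesting found in flat WAT text.
function maxNestingDepth(watText) {
  const openers = new Set(["block", "loop", "if"]);
  let depth = 0;
  let max = 0;
  for (const rawLine of watText.split("\n")) {
    // Drop ";;" line comments such as ";; label = @1".
    const line = rawLine.split(";;")[0].trim();
    // wasm2wat may close the function on the last instruction line ("end)").
    const op = line.split(/\s+/)[0].replace(/\)+$/, "");
    if (openers.has(op)) {
      depth += 1;
      max = Math.max(max, depth);
    } else if (op === "end") {
      depth -= 1; // "else" is neither an opener nor a closer, so it is ignored
    }
  }
  return max;
}
```

Usage: maxNestingDepth(require("fs").readFileSync("php.wat", "utf8")), where php.wat is a hypothetical wasm2wat dump. Depth returns to zero at the end of each function, so the result is the maximum over all functions.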

I'm now starting to think there's no reducing the nesting in that .wasm file. There's nothing wrong with block and br_table: they're just how you express flow control in WebAssembly, and you need advanced flow control to parse PHP.
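To show why a large switch turns into deep nesting, here is a sketch that emits the nested-blocks-plus-br_table shape commonly used to lower a dense switch into WebAssembly. The generator switchToWat and the exported dispatch function are hypothetical, and real compiler output (for example from LLVM via Emscripten) varies; the point is that depth grows by one block per case. The stack trace above, with frames in DecodeBrTable and WasmGraphBuildingInterface::Split, is consistent with (though not proof of) br_table-heavy code being the expensive part.

```javascript
// Emits a WAT module whose "dispatch" function switches on $x over numCases
// cases plus a default, using one nested block per case and a single br_table.
function switchToWat(numCases) {
  const out = [];
  // Indent by two extra levels so the body sits inside (module (func ...)).
  const emit = (depth, text) => out.push("  ".repeat(depth + 2) + text);

  out.push("(module");
  out.push('  (func (export "dispatch") (param $x i32) (result i32) (local $r i32)');
  emit(0, "block ;; exit");
  emit(1, "block ;; default");
  // One block per case: case numCases-1 outermost, case 0 innermost.
  for (let i = numCases - 1; i >= 0; i--) {
    emit(2 + (numCases - 1 - i), `block ;; case ${i}`);
  }
  // br_table label k leaves the block of case k; the last label is the default.
  const inner = numCases + 2;
  emit(inner, "local.get $x");
  emit(inner, `br_table ${[...Array(numCases).keys(), numCases].join(" ")}`);
  // After each case block ends, that case's body runs and branches to "exit".
  for (let k = 0; k < numCases; k++) {
    const depth = numCases + 1 - k;
    emit(depth, "end");
    emit(depth, `i32.const ${100 + k}`);
    emit(depth, "local.set $r");
    emit(depth, `br ${numCases - k}`);
  }
  emit(1, "end ;; default");
  emit(1, "i32.const -1");
  emit(1, "local.set $r");
  emit(0, "end ;; exit");
  emit(0, "local.get $r");
  out.push("  )");
  out.push(")");
  return out.join("\n");
}
```

For example, console.log(switchToWat(3)) prints a module with five nested blocks that can be assembled with WABT's wat2wasm; a switch with hundreds of cases would nest hundreds of blocks deep in the same way.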

To summarize:

I don't think there's anything else we can do here other than wait for a bugfix from the Chrome team. Until that happens, this project will use iframes instead of web workers to offload WASM processing.
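A minimal sketch of the iframe approach, assuming a hypothetical page (for example php-runner.html) that loads the PHP WASM runtime and answers each {id, payload} message with {id, result} via window.parent.postMessage. The message shape and the function names createRpcClient and startIframeRunner are illustrative, not the project's actual API. The request/response bookkeeping is kept free of DOM calls so it can be tested on its own.

```javascript
// Correlates postMessage requests with replies by a numeric id.
function createRpcClient(post) {
  let nextId = 0;
  const pending = new Map();
  return {
    call(payload) {
      const id = nextId++;
      return new Promise((resolve) => {
        pending.set(id, resolve);
        post({ id, payload });
      });
    },
    // Returns true if the message answered a pending request.
    handleMessage(data) {
      const resolve = pending.get(data && data.id);
      if (!resolve) return false; // not a reply we are waiting for
      pending.delete(data.id);
      resolve(data.result);
      return true;
    },
  };
}

// Browser-only: creates a hidden iframe and resolves with a client once it loads.
function startIframeRunner(src, targetOrigin) {
  const iframe = document.createElement("iframe");
  iframe.src = src;
  iframe.style.display = "none";
  document.body.appendChild(iframe);
  const client = createRpcClient((msg) =>
    iframe.contentWindow.postMessage(msg, targetOrigin)
  );
  window.addEventListener("message", (event) => {
    if (event.origin !== targetOrigin) return; // ignore other senders
    client.handleMessage(event.data);
  });
  return new Promise((resolve) => {
    iframe.addEventListener("load", () => resolve(client));
  });
}
```

Usage in the page: const php = await startIframeRunner("https://runner.example/php-runner.html", "https://runner.example"); const out = await php.call("<?php echo 1;"); where both URLs are placeholders.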

adamziel commented 2 years ago

Oh and one last note – it also crashes in a SharedWorker so we can't use that as a fallback.

adamziel commented 2 years ago

I just got a report that the crash is still happening even in an iframe. Snap!

Edit: It must have been a problem with the Origin-Agent-Cluster: ?1 header, which asks the browser to isolate the origin, potentially in a separate process. The crash doesn't happen with that header in place.
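For reference, a minimal Node.js sketch of serving the iframe document with that header. RUNNER_HTML, php-runner.js and the port in the usage line are hypothetical. Two assumptions worth stating: the iframe should live on a different origin than the parent page (the header keys agent clusters by origin, so a same-origin iframe shares the parent's cluster), and the header typically needs to be sent consistently on every document from that origin. Whether an origin-keyed agent cluster actually gets its own process is up to the browser.

```javascript
const http = require("node:http");

// Hypothetical page that boots the PHP WASM runtime inside the iframe.
const RUNNER_HTML =
  '<!doctype html><meta charset="utf-8"><script src="php-runner.js"></script>';

// Response headers for the iframe document. "?1" is the structured-header
// boolean "true", requesting an origin-keyed agent cluster.
function runnerHeaders() {
  return {
    "Content-Type": "text/html; charset=utf-8",
    "Origin-Agent-Cluster": "?1",
  };
}

// Serves RUNNER_HTML with the headers above. Not started automatically.
function createRunnerServer() {
  return http.createServer((req, res) => {
    res.writeHead(200, runnerHeaders());
    res.end(RUNNER_HTML);
  });
}
```

Usage: createRunnerServer().listen(8081), then point the iframe at that origin (a different port counts as a different origin).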

adamziel commented 1 year ago

The Google Chrome team has been very responsive and has released a series of patches since I first reported this problem (private issue 1372621, for posterity). In my latest round of testing I haven't been able to crash the browser. Woohooo!

I'm closing this issue for now. Let's reopen if the problem ever returns.