emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

WasmFS: Reading from fetch backend in a thread doesn't work #17908

Open kd935 opened 1 year ago

kd935 commented 1 year ago

The fetch backend runs in a separate thread, so a read from any other thread has to communicate with that backend thread. That communication is relayed through the main thread, so if the main thread is busy (in the example below it is blocked in t.join()), the other threads cannot reach each other and the program just hangs. I'm not sure whether this can be considered a bug.

Version of emscripten/emsdk:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.23-git (84ad56fdb8acbaf07178998bdbb412ecb70e8267)
clang version 16.0.0 (https://github.com/llvm/llvm-project 9879261d7a061cc1df18f6fe164f42044fb4cff1)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: D:\emsdk\3.1.20\upstream\bin

Failing command line in full:

#include <fstream>
#include <filesystem>
#include <cassert>
#include <iostream>
#include <unistd.h>
#include <sys/stat.h> // S_IRWXU, S_IRWXG, S_IRWXO
#include <emscripten/wasmfs.h>
#include <string>
#include <thread>

int main()
{
  // File contents - "hello"
  const char *filePath = "/file.txt";
  const char *url = "http://localhost:8000/file.txt";

  // Create a file backed by the fetch backend.
  backend_t backend = wasmfs_create_fetch_backend(url);
  int fd = wasmfs_create_file(filePath, S_IRWXU | S_IRWXG | S_IRWXO, backend);
  assert(fd >= 0); // a negative value means the file could not be created

  // Read the file from a background thread; this read never completes.
  std::thread t([=]()
  {
    std::ifstream input(filePath);
    std::string text;
    input >> text;
    std::cout << text << std::endl;
  });

  // The main thread blocks here while the reader waits on the backend thread.
  t.join();

  assert(close(fd) == 0);
  assert(std::filesystem::remove(filePath));

  std::cout << "Finished!" << std::endl;

  return 0;
}

Full link command and output with -v appended:

```
D:\emscripten>em++ -v -std=c++17 D:/build/test.cpp -o D:/build/test.html -s EXIT_RUNTIME=1 -s WASMFS=1 -s USE_PTHREADS=1 -sPTHREAD_POOL_SIZE=2
 "D:/emsdk/3.1.20/upstream/bin\clang.exe" --version
 "D:/emsdk/3.1.20/upstream/bin\clang++.exe" -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -D__EMSCRIPTEN_SHARED_MEMORY__=1 -DEMSCRIPTEN -ID:\emscripten\cache\sysroot\include\SDL --sysroot=D:\emscripten\cache\sysroot -Xclang -iwithsysroot/include\compat -v -std=c++17 -pthread D:/build/test.cpp -c -o C:\Users\kdots\AppData\Local\Temp\emscripten_temp_rxxb846o\test_0.o
clang version 16.0.0 (https://github.com/llvm/llvm-project 9879261d7a061cc1df18f6fe164f42044fb4cff1)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: D:\emsdk\3.1.20\upstream\bin
 (in-process)
 "D:\\emsdk\\3.1.20\\upstream\\bin\\clang++.exe" -cc1 -triple wasm32-unknown-emscripten -emit-obj -mrelax-all --mrelax-relocations -disable-free -clear-ast-before-backend -main-file-name test.cpp -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-feature +atomics -target-feature +bulk-memory -target-feature +mutable-globals -target-feature +sign-ext -target-cpu generic -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -v "-fcoverage-compilation-dir=D:\\emscripten" -resource-dir "D:\\emsdk\\3.1.20\\upstream\\lib\\clang\\16.0.0" -D __EMSCRIPTEN_SHARED_MEMORY__=1 -D EMSCRIPTEN -I "D:\\emscripten\\cache\\sysroot\\include\\SDL" -isysroot "D:\\emscripten\\cache\\sysroot" -internal-isystem "D:\\emscripten\\cache\\sysroot/include/wasm32-emscripten/c++/v1" -internal-isystem "D:\\emscripten\\cache\\sysroot/include/c++/v1" -internal-isystem "D:\\emsdk\\3.1.20\\upstream\\lib\\clang\\16.0.0\\include" -internal-isystem "D:\\emscripten\\cache\\sysroot/include/wasm32-emscripten" -internal-isystem "D:\\emscripten\\cache\\sysroot/include" -std=c++17 -fdeprecated-macro "-fdebug-compilation-dir=D:\\emscripten" -ferror-limit 19 -fmessage-length=176 -fvisibility=default -pthread -fgnuc-version=4.2.1 -fcxx-exceptions -fignore-exceptions -fexceptions -fcolor-diagnostics "-iwithsysroot/include\\compat" -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o "C:\\Users\\kdots\\AppData\\Local\\Temp\\emscripten_temp_rxxb846o\\test_0.o" -x c++ D:/build/test.cpp
clang -cc1 version 16.0.0 based upon LLVM 16.0.0git default target x86_64-pc-windows-msvc
ignoring nonexistent directory "D:\emscripten\cache\sysroot/include/wasm32-emscripten/c++/v1"
ignoring nonexistent directory "D:\emscripten\cache\sysroot/include/wasm32-emscripten"
#include "..." search starts here:
#include <...> search starts here:
 D:\emscripten\cache\sysroot\include\SDL
 D:\emscripten\cache\sysroot/include\compat
 D:\emscripten\cache\sysroot/include/c++/v1
 D:\emsdk\3.1.20\upstream\lib\clang\16.0.0\include
 D:\emscripten\cache\sysroot/include
End of search list.
 "D:/emsdk/3.1.20/upstream/bin\wasm-ld.exe" -o D:/build/test.wasm C:\Users\kdots\AppData\Local\Temp\emscripten_temp_rxxb846o\test_0.o -LD:\emscripten\cache\sysroot\lib\wasm32-emscripten D:\emscripten\cache\sysroot\lib\wasm32-emscripten\crtbegin.o --whole-archive -lwasmfs-mt-debug --no-whole-archive -lGL-mt -lal -lhtml5 -lstubs-debug -lc-mt-debug -ldlmalloc-mt -lcompiler_rt-mt -lc++-mt-noexcept -lc++abi-mt-noexcept -lsockets-mt -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --import-undefined --import-memory --shared-memory --strip-debug --export-if-defined=main --export-if-defined=_emscripten_thread_init --export-if-defined=_emscripten_thread_exit --export-if-defined=_emscripten_thread_crashed --export-if-defined=_emscripten_tls_init --export-if-defined=pthread_self --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=__main_argc_argv --export-if-defined=fflush --export=emscripten_stack_get_end --export=emscripten_stack_get_free --export=emscripten_stack_get_base --export=emscripten_stack_init --export=_wasmfs_read_file --export=stackSave --export=stackRestore --export=stackAlloc --export=__wasm_call_ctors --export=__errno_location --export=emscripten_dispatch_to_thread_ --export=_emscripten_thread_free_data --export=emscripten_main_browser_thread_id --export=emscripten_main_thread_process_queued_calls --export=emscripten_run_in_main_runtime_thread_js --export=emscripten_stack_set_limits --export=emscripten_proxy_finish --export=__get_temp_ret --export=__set_temp_ret --export=__funcs_on_exit --export=malloc --export=free --export=__cxa_is_pointer_type --export-table -z stack-size=5242880 --initial-memory=16777216 --no-entry --max-memory=16777216 --global-base=1024
 "D:/emsdk/3.1.20/upstream\bin\wasm-emscripten-finalize" --dyncalls-i64 --pass-arg=legalize-js-interface-exported-helpers D:/build/test.wasm -o D:/build/test.wasm --detect-features
 "D:/emsdk/3.1.20/node/14.18.2_64bit/bin/node.exe" D:\emscripten\src\compiler.js C:\Users\kdots\AppData\Local\Temp\tmp5kas3wnu.json
 "D:/emsdk/3.1.20/upstream/bin\llvm-objcopy.exe" D:/build/test.wasm D:/build/test.wasm --remove-section=.debug* --remove-section=producers
 "D:/emsdk/3.1.20/node/14.18.2_64bit/bin/node.exe" D:\emscripten\tools\preprocessor.js C:\Users\kdots\AppData\Local\Temp\emscripten_temp_rxxb846o\settings.js worker.js --expandMacros
 "D:/emsdk/3.1.20/node/14.18.2_64bit/bin/node.exe" D:\emscripten\tools\preprocessor.js C:\Users\kdots\AppData\Local\Temp\emscripten_temp_rxxb846o\settings.js shell.html
```

Expected output:

hello
Finished!

Actual output:

(Nothing is printed; the program hangs.)

mere-human commented 1 year ago

So, all FS backends that use ProxyWorker call emscripten_proxy_sync_with_ctx(): https://github.com/emscripten-core/emscripten/blob/06a8d7ad0b1a473b3368aea87b9bda134706774f/system/lib/pthread/proxying.c#L457 It first calls emscripten_proxy_async() and then waits for completion via pthread_cond_wait(). _emscripten_notify_task_queue() is used to submit the async task from a user background thread to the ProxyWorker thread. The problem is that this notification goes through the main thread via postMessage, and the main thread is busy: https://github.com/emscripten-core/emscripten/blob/06a8d7ad0b1a473b3368aea87b9bda134706774f/src/library_pthread.js#L1015 If the two threads could communicate directly, this wouldn't be a problem.
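For illustration, here is a toy model of that flow in portable C++. It is a sketch under stated assumptions, not Emscripten's implementation: `Relay` and `read_via_relay` are hypothetical names, the backend's work is collapsed into the relayed callback, and the wait uses a timeout (unlike the real pthread_cond_wait()) so that the model terminates instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for the main thread's message queue
// (the target of postMessage in the real runtime).
struct Relay {
  std::mutex m;
  std::queue<std::function<void()>> pending;

  void post(std::function<void()> f) {
    std::lock_guard<std::mutex> lock(m);
    pending.push(std::move(f));
  }

  // Delivers queued messages; only runs when the "main thread" calls it.
  void drain() {
    for (;;) {
      std::function<void()> f;
      {
        std::lock_guard<std::mutex> lock(m);
        if (pending.empty()) return;
        f = std::move(pending.front());
        pending.pop();
      }
      f();
    }
  }
};

// Returns true if the reader thread received its reply within `timeout`.
bool read_via_relay(bool main_is_busy, std::chrono::milliseconds timeout) {
  Relay relay;
  std::mutex m;
  std::condition_variable cv;
  bool done = false;
  bool got_reply = false;
  std::atomic<bool> reader_finished{false};

  std::thread reader([&] {
    // Step 1: queue the work; delivery depends on the relay being serviced.
    relay.post([&] {
      std::lock_guard<std::mutex> lock(m);
      done = true;
      cv.notify_one();
    });
    // Step 2: block until completion is signalled (or the timeout expires).
    {
      std::unique_lock<std::mutex> lock(m);
      got_reply = cv.wait_for(lock, timeout, [&] { return done; });
    }
    reader_finished = true;
  });

  if (!main_is_busy) {
    // A free main thread keeps servicing its queue.
    while (!reader_finished) {
      relay.drain();
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }
  // A busy main thread goes straight here, like t.join() in the report.
  reader.join();
  return got_reply;
}
```

Calling `read_via_relay(false, ...)` completes because the main thread drains its queue, while `read_via_relay(true, ...)` times out because nothing ever delivers the message. Without the timeout, the second case would be the hang described in this issue.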

kripken commented 1 year ago

@tlively is working on this right now. In the meantime, there is a hack here that you can use: https://github.com/emscripten-core/emscripten/pull/17869 Beyond that, we have some ideas for a proper fix using waitAsync and/or MessageChannels, which would avoid sending the postMessage through the main loop.

kripken commented 1 year ago

(Yes, the key thing is to get direct thread communication. Good summary, @mere-human!)
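To make the "direct thread communication" idea concrete, here is a minimal sketch in portable C++ in which the reader wakes the backend thread directly, so a blocked main thread no longer matters. Everything here is an assumption for illustration: `Mailbox` and `read_direct` are hypothetical names, the string "hello" stands in for the fetched file contents, and a condition variable plays the role that waitAsync or a MessageChannel might play in the browser. It does not show how the actual fix is implemented.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical mailbox owned by the backend thread; senders wake it directly.
class Mailbox {
 public:
  void post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(m_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();  // direct wake-up, no relay through another thread
  }

  void close() {
    {
      std::lock_guard<std::mutex> lock(m_);
      closed_ = true;
    }
    cv_.notify_one();
  }

  // Runs on the backend thread until close() is called and the queue is empty.
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // closed and fully drained
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool closed_ = false;
};

// Synchronous read from a reader thread while the caller is blocked in join().
std::string read_direct() {
  Mailbox backend_mailbox;
  std::thread backend([&] { backend_mailbox.run(); });

  std::mutex m;
  std::condition_variable cv;
  bool done = false;
  std::string reply;
  std::string result;

  std::thread reader([&] {
    // Hand the request straight to the backend thread.
    backend_mailbox.post([&] {
      std::lock_guard<std::mutex> lock(m);
      reply = "hello";  // placeholder for the fetched contents
      done = true;
      cv.notify_one();
    });
    // Block until the backend signals completion.
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [&] { return done; });
    result = reply;
  });

  reader.join();  // the caller blocks here, as main does in the report
  backend_mailbox.close();
  backend.join();
  return result;
}
```

Here `read_direct()` returns even though the calling thread does nothing but wait in `join()`, because no message has to pass through it.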