bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0
15.09k stars 1.26k forks source link

Cranelift: timing module for `cranelift-codegen` uses too much TLS space #3654

Open kdy1 opened 2 years ago

kdy1 commented 2 years ago

.clif Test Case

I think this is not relevant because it's an issue about thread-local usage.

https://github.com/bytecodealliance/wasmtime/blob/8043c1f919a77905255eded33e4e51a6fbfd1de1/cranelift/codegen/src/timing.rs#L178-L181

This is the issue

Steps to Reproduce

  1. git clone https://github.com/swc-project/swc.git
  2. cd swc
  3. git checkout d901b6222f8a77beed64968fcb4764bbacf8c755
  4. yarn build
  5. yarn test

Expected Results

Tests should success

Actual Results

node.js fails to load the swc node binding, because the node binding is using too much TLS space.

/swc/swc.linux-x64-gnu.node: cannot allocate memory in static TLS block

Versions and Environment

Cranelift version or commit: Current master (https://github.com/bytecodealliance/wasmtime/commit/8043c1f919a77905255eded33e4e51a6fbfd1de1)

Operating system: linux with glibc

Architecture: gnu

Extra Info

Is it fine to add a cargo feature like disable-timing?

bjorn3 commented 2 years ago

Can you run llvm-objdump -C -t /path/to/swc.node and then grep for .tdata or .tbss? This should make it clear which thread local variables are responsible for the excessive TLS usage. The timing module uses less than the maximum of 2048 bytes.

https://fasterthanli.me/articles/a-dynamic-linker-murder-mystery

kdy1 commented 2 years ago

Of course! Sorry, I thought I posted it.


$ readelf -Wl swc.linux-x64-gnu.node | grep -E 'PhysAddr|TLS'

  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  TLS            0x2028980 0x0000000002029980 0x0000000002029980 0x0000b0 0x00084d R   0x10

$ llvm-objdump -C -t swc.linux-x64-gnu.node | grep -F '.tdata'

0000000002029980 l    d  .tdata 00000000 .tdata
0000000000000078 l     O .tdata 00000028 tracing_core::dispatcher::CURRENT_STATE::__getit::__KEY::h1e739189ed69442f
0000000000000060 l     O .tdata 00000018 sharded_slab::tid::REGISTRATION::__getit::__KEY::h361b7be9b3025982
00000000000000a0 l     O .tdata 00000001 backtrace::lock::LOCK_HELD::__getit::__KEY::h3cc357ac3fd8018a (.0.0)
00000000000000a1 l     O .tdata 00000001 cranelift_codegen::timing::details::CURRENT_PASS::__getit::__KEY::h06255f6d1e918b18 (.0.0)
0000000000000000 l     O .tdata 00000030 parking_lot_core::parking_lot::with_thread_data::THREAD_DATA::__getit::__KEY::h03e0eec2789cad6f
0000000000000030 l     O .tdata 00000028 std::sys_common::thread_info::THREAD_INFO::__getit::VAL::h7220d9cfd6f7a9c4
00000000000000a8 l     O .tdata 00000008 _mi_heap_default

$ llvm-objdump -C -t swc.linux-x64-gnu.node | grep -F '.tbss'

0000000002029a30 l    d  .tbss  00000000 .tbss
0000000000000660 l     O .tbss  00000018 swc_common::syntax_pos::GLOBALS::FOO::__getit::__KEY::h9429b619b9c3b9a4
0000000000000778 l     O .tbss  00000018 rayon_core::registry::WORKER_THREAD_STATE::__getit::__KEY::h858fe9d7bafa379a
0000000000000790 l     O .tbss  00000028 rayon_core::registry::Registry::in_worker_cold::LOCK_LATCH::__getit::__KEY::h697c1168d92c27b8
0000000000000750 l     O .tbss  00000028 tracing_subscriber::filter::layer_filters::FILTERING::__getit::__KEY::hfcf018e992ac8c34
0000000000000698 l     O .tbss  00000030 thread_local::thread_id::THREAD_HOLDER::__getit::__KEY::h4ca3716321184c00
00000000000006d0 l     O .tbss  00000030 tracing_subscriber::filter::env::SCOPE::__getit::__KEY::h7643333f5d409469
0000000000000700 l     O .tbss  00000018 tracing_subscriber::registry::sharded::CLOSE_COUNT::__getit::__KEY::h6de4dd54e98564a1
00000000000005b8 l     O .tbss  00000018 regex::pool::THREAD_ID::__getit::__KEY::h15dd10060d91dde1
0000000000000680 l     O .tbss  00000018 swc_ecma_transforms_base::helpers::HELPERS::FOO::__getit::__KEY::h8bfa7f8e1f9e3276
0000000000000720 l     O .tbss  00000030 _$LT$tracing_subscriber..fmt..fmt_layer..Layer$LT$S$C$N$C$E$C$W$GT$$u20$as$u20$tracing_subscriber..layer..Layer$LT$S$GT$$GT$::on_event::BUF::__getit::__KEY::h95b197147e77bb07
00000000000007b8 l     O .tbss  00000018 crossbeam_epoch::default::HANDLE::__getit::__KEY::h0c1b98d9dd2a21b8
0000000000000820 l     O .tbss  00000020 std::collections::hash::map::RandomState::new::KEYS::__getit::__KEY::h8c747f1f3fba03b2
0000000000000640 l     O .tbss  00000018 swc_common::errors::HANDLER::FOO::__getit::__KEY::hdbd2b92021199add
00000000000000b0 l     O .tbss  000004b8 cranelift_codegen::timing::details::PASS_TIME::__getit::__KEY::hf5cf71b784ccc3ac
0000000000000570 l     O .tbss  00000048 napi::bindgen_runtime::module_register::REGISTERED_CLASSES::__getit::__KEY::hb4af2239b740d668
00000000000005d0 l     O .tbss  00000018 std::io::stdio::OUTPUT_CAPTURE::__getit::__KEY::hf77b7ab11412dca1
0000000000000608 l     O .tbss  00000001 std::sys_common::thread_info::THREAD_INFO::__getit::STATE::h8d316da519a744b0 (.0.0)
00000000000005f0 l     O .tbss  00000018 std::panicking::panic_count::LOCAL_PANIC_COUNT::__getit::__KEY::h00a600e6cc6ae43d
0000000000000610 l     O .tbss  00000028 swc_common::errors::TRACK_DIAGNOSTICS::__getit::__KEY::h47e578a9e812e486
0000000000000840 l     O .tbss  0000000c wast::resolve::gensym::NEXT::__getit::__KEY::h427fbcdc6e5f05e2
00000000000007d0 l     O .tbss  00000020 wasmer_vm::trap::traphandlers::tls::raw::PTR::__getit::__KEY::hd09811897ad75770
00000000000007f0 l     O .tbss  00000030 wasmer_vm::trap::traphandlers::lazy_per_thread_init::TLS::__getit::__KEY::h620135b9ef0504b6
000000000000084c l     O .tbss  00000001 recurse
bjorn3 commented 2 years ago

I see. Cranelift indeed uses a lot of tls space. Still it feels like this should be fixed in nodejs or napi and not in cranelift. Swc is compiled as dynamic library that is dlopened, so it should use dynamic tls and not static tls. Static tls is only allowed for the main executable and the dynamic libraries declared as dependencies of the main executable. The fact that it works at all in some cases is because glibc reserved another 1024 bytes just in case someone dlopens a dynamic library compiled with static tls instead of dynamic tls.

After I wrote the above text I did some digging and it seems like zig cc which is used by napi passes -ftls-model=initial-exec: https://cs.github.com/ziglang/zig/blob/5c228765f1094d30e64d13c0077c67b2867ecd6a/src/glibc.zig?q=tls-model%20language%3AZig#L325 Patching this away may fix the issue for you. Edit: or is that just for building a static copy of glibc itself?

kdy1 commented 2 years ago

How do you think about a cargo feature to strip out timing facilities? Actually, I don't need it at all.

bjorn3 commented 2 years ago

I personally think that is fine to add. I just think the tls problem should be solved at it's root to prevent anyone else from being caught by this or it regression again in the future due to some other crate or changes in the standard library or optimizations.