bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0
15.36k stars 1.3k forks source link

The `__deregister_frame` is slow on macos #6541

Open TimonPost opened 1 year ago

TimonPost commented 1 year ago

We have a precompiled wasm module of about 337,2MB that if dropped the UnwindRegistration::drop takes around ~22 secs (can quite vary) to finish. Sometimes it takes over 100 seconds. I did some benchmarking and we have 266664 unwind registers and the time per frame deregister is for me around 0.0843227 ms.

This dropping happens when a wasmtime::Module drops.

Three runs:

Deregister 266664 frames in 22485 ms (84.3227 microsec/frame)
Deregister 266664 frames in 22448 ms (84.1817 microsec/frame)
Deregister 266664 frames in 22232 ms (83.3742 microsec/frame)

A fast deregister/register time is vital to us. When hot-reloading after cargo watch detects a change when a single code line changes it used to take 2 secs, but somewhere in last few months, this has somehow gotten very slow. Is there an easy solution here or a solution?

This can be very easy reproduced:

#[test]
fn deserialize_test_memory() {
    let engine = Engine::new(&Config::new().epoch_interruption(true)).unwrap();

    let path = "path-to-precomipled-bytes";

    unsafe {
        let deserialized: wasmtime::Module = Module::deserialize_file(&engine, path).unwrap();

        let time = Instant::now();
        println!("Dropping module...");
        drop(deserialized);
        println!("Dropped module {:?}", time.elapsed());
    }
}
fitzgen commented 1 year ago

Unfortunately there is nothing we can do to speed up the implementation of system libraries.

You can avoid the need to interact with libunwind by disabling (optional but enabled-by-default) native unwind info: https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#method.native_unwind_info

This configuration option only exists to help third-party stack capturing mechanisms, such as the system’s unwinder or the backtrace crate, determine how to unwind through Wasm frames. It does not affect whether Wasmtime can capture Wasm backtraces or not.

TimonPost commented 1 year ago

Can the behavior of frame deregistration be decoupled from the Drop trait and be optionally triggered in a controlled manner? For example, what if wasmtime allows one to manually fetch 'unwind frames' and allow manually disposing of them. Or what would happen if the frames were pushed to a separate thread for deregistration? Are there any potential downsides or considerations to keep in mind when pursuing this approach?

While I understand the typical approach of performing cleanup on drop, having more control over the process would be really nice to have control over this. As at the moment it is very implicit behavior and it took me a while to pinpoint this exact function deep down the code stack. And also, disabling the behavior will work but has its own downsides that we loose some unwinding information, which ideally we would want to preserve.

bjorn3 commented 1 year ago

Deregistration of unwind info needs to happen before the memory of the module is deallocated to prevent dangling pointers. You could move the Module to a background thread before dropping it to avoid blocking on deregistration of unwind info. This has the risk of runaway memory usage if you load new modules faster than they can be dropped however.

jameysharp commented 1 year ago

I may not have understood the details here, but if I did, I think @bjorn3's suggestion is the best you can do if you must use the platform-native unwinding support. To avoid unbounded memory usage I think you could use sync_channel to create a bounded queue that blocks the main threads if the background thread falls behind on dropping unused modules.

TimonPost commented 1 year ago

When i search for this native_unwind_infoconfiguration option it seems to never be used, only validated. It looks like the PR that introduced this option removed this line https://github.com/bytecodealliance/wasmtime/pull/4643/files#diff-908a422cef87524478865a23918af79040c970e02a3cc6862f6eb508710606f9L1436, which registered the option in cranelift compiler. When I add this manually to my config I do get 0 unwind registers for macos, if i don't do that, I still get all my unwind frames being inserted.

TimonPost commented 1 year ago

Opened A PR #6547, let me know if this assumption is correct, that at the moment it isn't possible to disable unwind frames. From my local debugging with this branch, it does look like this drastically improves unloading from +20 secs to a few micro secs.