containers / krunvm

Create microVMs from OCI images
Apache License 2.0
UnexpectedEof exception #12

Closed monken closed 2 years ago

monken commented 3 years ago

On a new Ubuntu VM, I'm running nvm install 14, which compiles Node.js from source. After a while, the VM crashes with:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: DecodeMessage(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })', src/devices/src/virtio/fs/device.rs:176:18
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: devices::virtio::fs::device::Fs::process_queue
   4: devices::virtio::fs::event_handler::<impl polly::event_manager::Subscriber for devices::virtio::fs::device::Fs>::process
   5: _krun_start_enter
   6: krunvm::start::start
   7: krunvm::main
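
For context, the "failed to fill whole buffer" text is the message Rust's standard `Read::read_exact` reports when a reader ends before the requested number of bytes arrives, and the panic comes from calling `unwrap()` on that error. A minimal sketch of the failure mode follows, assuming a hypothetical `read_header` helper and an 8-byte header; neither is taken from libkrun's code, and this does not show the root cause of the short read.

```rust
use std::io::{Cursor, ErrorKind, Read};

/// Hypothetical helper (not from libkrun): read a fixed-size 8-byte header.
fn read_header<R: Read>(reader: &mut R) -> std::io::Result<[u8; 8]> {
    let mut buf = [0u8; 8];
    // read_exact fails with ErrorKind::UnexpectedEof if fewer than 8 bytes remain.
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() {
    // Only 5 of the 8 expected bytes are available.
    let mut short = Cursor::new(vec![1u8, 2, 3, 4, 5]);

    // Matching on the error instead of calling unwrap() avoids the panic.
    match read_header(&mut short) {
        Ok(header) => println!("header: {:?}", header),
        Err(e) if e.kind() == ErrorKind::UnexpectedEof => {
            eprintln!("short read, request skipped: {}", e);
        }
        Err(e) => eprintln!("other I/O error: {}", e),
    }
}
```

Handling the error this way only keeps the process alive; it would not explain why the buffer was short in the first place.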

slp commented 3 years ago

I'll try to reproduce the issue here. Was this on macOS or Linux?

monken commented 3 years ago

This was on macOS. It happens intermittently and at different stages of the compilation. If I had to guess, I’d say it has to do with very long commands, maybe?

tjfontaine commented 2 years ago

I just hit this on macOS (Ventura) as well, with krunvm 0.2.2:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: DecodeMessage(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })', src/devices/src/virtio/fs/device.rs:183:18
stack backtrace:
   0:        0x104b6dcac - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h9e1f971b6c458057
   1:        0x104a853f4 - core::fmt::write::hff50bf5ab34a8e88
   2:        0x104b5d244 - std::io::Write::write_fmt::h4b4cac536910ac0e
   3:        0x104b5eca4 - std::panicking::default_hook::{{closure}}::h6afa39fd0b64edb8
   4:        0x104b5e978 - std::panicking::default_hook::h0ab5dc1706bc4227
   5:        0x104b5f4dc - std::panicking::rust_panic_with_hook::h4f2beaf7e17e9f84
   6:        0x104b6e3bc - std::panicking::begin_panic_handler::{{closure}}::h7bfd5963e591c8e8
   7:        0x104b6e334 - std::sys_common::backtrace::__rust_end_short_backtrace::ha8dd13728d5f8e85
   8:        0x104b5f0cc - _rust_begin_unwind
   9:        0x104b9a1c0 - core::panicking::panic_fmt::hd0caa445cceef50a
  10:        0x104b9a2b0 - core::result::unwrap_failed::h5ff0ededcb2e4d28
  11:        0x104ad5e4c - devices::virtio::fs::device::Fs::process_queue::ha2db8ee557e49a80
  12:        0x104acbf60 - devices::virtio::fs::event_handler::<impl polly::event_manager::Subscriber for devices::virtio::fs::device::Fs>::process::hecce4c45fbf8ea11
  13:        0x104af3edc - _krun_start_enter
  14:        0x10453a678 - krunvm::start::start::h7911a88a477697d7
  15:        0x104546b88 - krunvm::main::h37de1de20aab3681
  16:        0x10454e328 - std::sys_common::backtrace::__rust_begin_short_backtrace::h48dc07bb8bc9954a
  17:        0x1045391e4 - std::rt::lang_start::{{closure}}::h9b802ef6a1a5683e
  18:        0x1045a8904 - std::rt::lang_start_internal::h4191a76eb3bd68f6
  19:        0x104548224 - _main

The config I’ve been using is:

buildvm
 CPUs: 8
 RAM (MiB): 8192
 DNS server: 1.1.1.1
 Buildah container: ubuntu-working-container
 Workdir: 
 Mapped volumes: {}
 Mapped ports: {}

Within the environment, I’m running approximately: apt source ffmpeg && apt build-dep ffmpeg && debuild -b -uc -us

/Volumes/Krunvm is backed by a sparsebundle in case that is valuable information as well.

slp commented 2 years ago

@tjfontaine Thanks for reporting this. I was finally able to reproduce the issue on my M1 by following your steps.

Given the nature of the issue, and Apple Silicon's tendency to expose hidden concurrency issues through its aggressive out-of-order (OoO) execution, I suspect we're missing a memory barrier somewhere. I'll try to hunt it down this week.
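
To illustrate what a missing memory barrier can look like in general, here is a minimal sketch of the publish/consume pattern that shared queues rely on. The `Slot` type, its fields, and `publish_and_consume` are hypothetical names chosen for illustration; this is not libkrun's queue code or the actual fix.

```rust
use std::sync::atomic::{AtomicU16, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical single-slot "queue": a payload plus an index that publishes it.
struct Slot {
    payload: AtomicU64,
    avail_idx: AtomicU16,
}

fn publish_and_consume(value: u64) -> u64 {
    let slot = Arc::new(Slot {
        payload: AtomicU64::new(0),
        avail_idx: AtomicU16::new(0),
    });

    let producer = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            slot.payload.store(value, Ordering::Relaxed);
            // Release: the payload write above cannot be reordered after this store.
            slot.avail_idx.store(1, Ordering::Release);
        })
    };

    // Acquire: once the new index is observed, the payload write is visible too.
    while slot.avail_idx.load(Ordering::Acquire) == 0 {
        std::hint::spin_loop();
    }
    let seen = slot.payload.load(Ordering::Relaxed);

    producer.join().unwrap();
    seen
}

fn main() {
    println!("{}", publish_and_consume(42));
}
```

If both index accesses used `Ordering::Relaxed` instead, a weakly ordered CPU would be allowed to make the index visible before the payload, so the consumer could read stale or partial data. That kind of bug can stay hidden on x86 and surface only on Arm-based hardware.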

slp commented 2 years ago

This was, indeed, a concurrency issue (actually two: one in queue.rs and the other in Rust's own std::sync::mpsc). The tests with debuild also revealed another issue in the virtio-fs implementation.

All of those issues should be fixed after updating to libkrun v1.4.4, which is already available in the Homebrew Tap.

Please let me know if, after updating, you're still able to reproduce this problem.