cloud-hypervisor / cloud-hypervisor

A Virtual Machine Monitor for modern Cloud workloads. Features include CPU, memory and device hotplug, support for running Windows and Linux guests, device offload with vhost-user and a minimal compact footprint. Written in Rust with a strong focus on security.
https://www.cloudhypervisor.org

Ubuntu VM doesn’t complete reboot due to DXE ResetSystem2 event #6601

Closed rjsilvams closed 2 months ago

rjsilvams commented 2 months ago

Describe the bug

This issue is related to a previous one: Fedora VM hangs on boot · Issue #6439 · cloud-hypervisor/cloud-hypervisor (github.com).

I work on the same team as maburlik, and we now have more details and debug information.

After a reboot (regardless of the method used, i.e., the reset API, the "shutdown -r now" command, or SYSRQ), the VM does not complete the boot process due to a reset event triggered by the DXE service of the guest operating system (Ubuntu).

I am using cloud-hypervisor v38.0 with the default boot firmware OVMF-CLOUDHV-DEBUG.fd and verbose tracing enabled.

cloud-hypervisor logs: cloud-hypervisor-reboot-dxe-issue-logs.zip

To Reproduce

Steps to reproduce the behaviour: launch one VM and reboot it (e.g., shutdown -r now).

Version

Output of cloud-hypervisor --version: cloud-hypervisor v38.0

Did you build from source, if so build command line (e.g. features): Yes, built from source from tag v38.0

VM configuration

What command line did you run (or JSON config data): cloud-hypervisor: 57.762084ms: <vmm> INFO:vmm/src/vm.rs:502 -- Booting VM from config: Mutex { data: VmConfig { cpus: CpusConfig { boot_vcpus: 2, max_vcpus: 2, topology: None, kvm_hyperv: false, max_phys_bits: 46, affinity: None, features: CpuFeatures { amx: false } }, memory: MemoryConfig { size: 1073741824, mergeable: false, hotplug_method: Acpi, hotplug_size: None, hotplugged_size: None, shared: true, hugepages: false, hugepage_size: None, prefault: false, zones: None, thp: true }, payload: Some(PayloadConfig { firmware: None, kernel: Some("/home/johndoe/repo/product-compute/out/release/product/bin/vmmd.code/OVMF-CLOUDHV-DEBUG.fd"), cmdline: None, initramfs: None }), rate_limit_groups: None, disks: Some([DiskConfig { path: Some("/home/johndoe/repo/product-compute/out/src/test/e2e_test/my_test/local_storage/f0055e01-9060-4ea9-ac8d-69ec4373dccf/vm_ssh_test_1_work/ubuntu_reboot.vm_ssh_test_1.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, rate_limit_group: None, rate_limiter_config: None, id: None, disable_io_uring: false, disable_aio: false, pci_segment: 0, serial: None, queue_affinity: None }, DiskConfig { path: Some("/home/johndoe/repo/product-compute/out/src/test/e2e_test/my_test/local_storage/f0055e01-9060-4ea9-ac8d-69ec4373dccf/vm_ssh_test_1_work/cloud-init/nocloud_ds.iso"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, rate_limit_group: None, rate_limiter_config: None, id: None, disable_io_uring: false, disable_aio: false, pci_segment: 0, serial: None, queue_affinity: None }]), net: Some([NetConfig { tap: Some("product_ptap_5e64"), ip: 192.168.249.1, mask: 255.255.255.0, mac: MacAddr { bytes: [82, 115, 177, 72, 56, 178] }, host_mac: None, mtu: None, iommu: false, num_queues: 2, queue_size: 256, vhost_user: false, vhost_socket: None, vhost_mode: Client, id: None, fds: None, rate_limiter_config: None, pci_segment: 0, offload_tso: true, offload_ufo: true, offload_csum: true }]), rng: RngConfig { src: "/dev/urandom", iommu: false }, balloon: None, fs: Some([FsConfig { tag: "/secrets", socket: "/tmp/vmmd_sockets/virtiofsd-socket-0", num_queues: 1, queue_size: 1024, id: None, pci_segment: 0 }, FsConfig { tag: "product_vm_diagnostics", socket: "/tmp/vmmd_sockets/virtiofsd-socket-1", num_queues: 1, queue_size: 1024, id: None, pci_segment: 0 }]), pmem: None, serial: ConsoleConfig { file: None, mode: Tty, iommu: false, socket: None }, console: ConsoleConfig { file: None, mode: Off, iommu: false, socket: None }, debug_console: DebugConsoleConfig { file: None, mode: Off, iobase: Some(233) }, devices: None, user_devices: None, vdpa: None, vsock: Some(VsockConfig { cid: 3, socket: "/tmp/vmmd_sockets/product-vm-guest-socket-3.vsock", iommu: false, id: None, pci_segment: 0 }), pvpanic: false, iommu: false, sgx_epc: None, numa: None, watchdog: true, platform: None, tpm: None, preserved_fds: None }, poisoned: false, .. }

Guest OS version details: https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img

Host OS version details:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Logs: shown above.

Additional Information

1. Log Error Patterns. If you find the following patterns, then you have the same problem described here.

The following log patterns in chv*.log indicate that the problem has occurred:

[   73.313075] reboot: Restarting system
DXE ResetSystem2: ResetType Warm, Call Depth = 1.
cloud-hypervisor: 468.186179s: <vcpu0> INFO:devices/src/legacy/i8042.rs:45 -- i8042 reset signalled
cloud-hypervisor: 468.186197s: <vmm> INFO:vmm/src/lib.rs:1133 -- VM reset event
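
For context on the pattern above: the "i8042 reset signalled" line corresponds to the classic keyboard-controller reset, where the guest side (here, apparently the firmware's ResetSystem2 path) writes the reset command 0xFE to the i8042 command port 0x64 and the VMM forwards it as a VM reset event. A minimal, purely illustrative sketch of that translation follows; it is not cloud-hypervisor's actual i8042 code, and the ResetEvent type and channel are hypothetical.

use std::sync::mpsc::Sender;

/// Reset request sent to the VMM control loop (hypothetical event type).
#[derive(Debug)]
struct ResetEvent;

/// Minimal i8042 model: a write of 0xFE to the command port (0x64)
/// pulses the CPU reset line, which we forward as a VM reset event.
struct I8042 {
    reset_evt: Sender<ResetEvent>,
}

impl I8042 {
    fn io_write(&self, port: u16, data: u8) {
        if port == 0x64 && data == 0xfe {
            println!("i8042 reset signalled");
            // Ignore the error if the VMM side has already gone away.
            let _ = self.reset_evt.send(ResetEvent);
        }
    }
}

fn main() {
    let (tx, rx) = std::sync::mpsc::channel();
    let dev = I8042 { reset_evt: tx };

    // Guest writes 0xFE to port 0x64 (keyboard-controller reset).
    dev.io_write(0x64, 0xfe);

    // The VMM control loop would pick this up and reset the VM.
    assert!(rx.recv().is_ok());
    println!("VM reset event");
}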

2. Related Issue Note. In my environment, the error reported in Fedora VM hangs on boot · Issue #6439 · cloud-hypervisor/cloud-hypervisor (github.com) appears to be a side effect. The event "ERROR:virtio-devices/src/vhost_user/vu_common_ctrl.rs:412 -- Failed connecting the backend after trying for 1 minute" is only the third problem in a sequence of failures. I have put together a diagram that shows my interpretation of what happens in the system, based on logs collected from both cloud-hypervisor and virtiofsd. Note that we use a wrapper on top of cloud-hypervisor.

3. Sequence Diagram. See the attached diagram (Seq Diagram - v3).

4. Diagram Analysis

a. After the VM is created, we reboot the VM using one of the following methods: a) invoke the cloud-hypervisor reset API; b) log in to the VM and execute shutdown -r now; c) echo 1 | tee /proc/sys/kernel/sysrq && echo b | tee /proc/sysrq-trigger

b. cloud-hypervisor starts the process to boot the VM, but the process does not complete because the guest operating system (Ubuntu) triggers a reboot. This reboot is reported in the cloud-hypervisor log via a message following the pattern: [ 73.313075] reboot: Restarting system

c. This reboot is issued due to a failure detected by the DXE service, which is confirmed by cloud-hypervisor via the log message: DXE ResetSystem2: ResetType Warm, Call Depth = 1. This message is produced by the RuntimeServiceResetSystem function in https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Universal/ResetSystemRuntimeDxe/ResetSystem.c.

d. The DXE service checks and alters memory protection attributes during boot via the EFI stub to ensure that the memory ranges used by the kernel are writable and executable.

e. Upon request from the DXE service (i.e., the guest VM operating system), cloud-hypervisor reboots the VM via the i8042 device, which is confirmed by the log: INFO:devices/src/legacy/i8042.rs:45 -- i8042 reset signalled

f. After this reboot, triggered by the DXE service, the VM is unable to complete its second boot attempt, and a cloud-hypervisor shutdown event is generated. Once this occurs, virtiofsd does not report any disconnect; instead, cloud-hypervisor is stuck trying to connect again. At this moment, if a shutdown event has been reported, we probably shouldn't be attempting a reconnection. The attempts show up as messages such as: cloud-hypervisor: 659.614755s: <_fs3> INFO:virtio-devices/src/vhost_user/vu_common_ctrl.rs:390 -- Connecting to vhost-user backend at socket path: /tmp/vmmd_sockets/virtiofsd-socket-0

g. These reconnection attempts fail with "connection refused" since the virtiofsd socket is not in a listening state, ready to accept connections. After trying for 1 minute, cloud-hypervisor bails out and reports the log message "cloud-hypervisor: 719.797326s: <_fs3> ERROR:virtio-devices/src/vhost_user/vu_common_ctrl.rs:412 -- Failed connecting the backend after trying for 1 minute: VhostUserProtocol(SocketConnect(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })). socket path: /tmp/vmmd_sockets/virtiofsd-socket-0" (see the sketch after this list).
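
Item g describes a connect-and-retry loop that keeps attempting to reach the vhost-user socket and gives up after one minute. A rough sketch of that behaviour follows; it is not the actual vu_common_ctrl.rs logic, the retry interval and helper name are made up, and only the socket path and the one-minute budget are taken from the logs.

use std::io;
use std::os::unix::net::UnixStream;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Try to connect to a vhost-user backend socket, retrying until `timeout`
/// elapses. If the backend never listens, the last error (typically
/// ConnectionRefused) is returned, mirroring the "Failed connecting the
/// backend after trying for 1 minute" message in the logs.
fn connect_with_retry(path: &str, timeout: Duration) -> io::Result<UnixStream> {
    let deadline = Instant::now() + timeout;
    loop {
        match UnixStream::connect(path) {
            Ok(stream) => return Ok(stream),
            Err(e) if Instant::now() < deadline => {
                eprintln!("Connecting to vhost-user backend at socket path: {path} ({e})");
                sleep(Duration::from_millis(100));
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    // Socket path taken from the logs; the one-minute budget matches the error message.
    match connect_with_retry("/tmp/vmmd_sockets/virtiofsd-socket-0", Duration::from_secs(60)) {
        Ok(_) => println!("connected"),
        Err(e) => eprintln!("Failed connecting the backend after trying for 1 minute: {e}"),
    }
}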

5. Suspect Cloud-Hypervisor Logs. I found a few differences while comparing the logs of the good run and the bad run just before the DXE-driven reset event (a sketch of how the counts were produced follows at the end of this section).

a. Frequency of IOAPIC register access (within the same time window)
- Log message: DEBUG:devices/src/ioapic.rs:254 -- IOAPIC_W reg 0x
- Good run: 136 times
- Bad run: 271 times

b. Frequency of “Signaling” events (within the same time window). Looking at the count of the following messages from the moment the VM first comes up until the reboot process either completes or fails, there is a difference.
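
For reference, the good-run/bad-run counts in this section were obtained by counting matching lines in the two log captures over the same time window. A minimal sketch of that counting, assuming hypothetical file names chv_good.log and chv_bad.log for the two captures:

use std::fs;

/// Count the lines in `log_path` that contain `pattern`.
fn count_matches(log_path: &str, pattern: &str) -> std::io::Result<usize> {
    Ok(fs::read_to_string(log_path)?
        .lines()
        .filter(|line| line.contains(pattern))
        .count())
}

fn main() -> std::io::Result<()> {
    let pattern = "IOAPIC_W reg 0x";
    // Hypothetical file names for the good-run and bad-run captures.
    for log in ["chv_good.log", "chv_bad.log"] {
        println!("{log}: {} matches for {pattern:?}", count_matches(log, pattern)?);
    }
    Ok(())
}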

rbradford commented 2 months ago

I apologise - I'm struggling to follow your description here - there is almost too much detail. Are you saying that when you trigger a reboot (OS driven, e.g. "sudo reboot"), then during the boot process there is a second reboot (which you have not requested) before the VM is fully booted?

rjsilvams commented 2 months ago

> I apologise - I'm struggling to follow your description here - there is almost too much detail. Are you saying that when you trigger a reboot (OS driven, e.g. "sudo reboot"), then during the boot process there is a second reboot (which you have not requested) before the VM is fully booted?

Yes, Robert. That's exactly what is happening. This second "unexpected/unintended" reboot is being triggered by the DXE ResetSystem2.

rbradford commented 2 months ago

> I apologise - I'm struggling to follow your description here - there is almost too much detail. Are you saying that when you trigger a reboot (OS driven, e.g. "sudo reboot"), then during the boot process there is a second reboot (which you have not requested) before the VM is fully booted?

Can you attach dumps without the debug logging interleaved with the output - it's hard to see.

> Yes, Robert. That's exactly what is happening. This second "unexpected/unintended" reboot is being triggered by the DXE ResetSystem2.

The thing is "[ 73.313075] reboot: Restarting system" is from the kernel - so the OS has requested the reboot. Since you're running with EFI firmware the kernel will call into EFI to make the request to reset the system.

If there is something wrong with the EFI implementation you might be able to use "reboot=a" on the kernel command line to force the reboot to use ACPI.

Can you reproduce this outside your environment and without virtiofs?

rjsilvams commented 2 months ago

> I apologise - I'm struggling to follow your description here - there is almost too much detail. Are you saying that when you trigger a reboot (OS driven, e.g. "sudo reboot"), then during the boot process there is a second reboot (which you have not requested) before the VM is fully booted? Can you attach dumps without the debug logging interleaved with the output - it's hard to see.
>
> Yes, Robert. That's exactly what is happening. This second "unexpected/unintended" reboot is being triggered by the DXE ResetSystem2.
>
> The thing is "[ 73.313075] reboot: Restarting system" is from the kernel - so the OS has requested the reboot. Since you're running with EFI firmware the kernel will call into EFI to make the request to reset the system.
>
> If there is something wrong with the EFI implementation you might be able to use "reboot=a" on the kernel command line to force the reboot to use ACPI.
>
> Can you reproduce this outside your environment and without virtiofs?

I cannot reproduce this without virtiofs because it's an integral part of our system and cannot be easily removed. However, I will try reboot=a as you suggested and let you know. Thanks :)

rbradford commented 2 months ago

> I apologise - I'm struggling to follow your description here - there is almost too much detail. Are you saying that when you trigger a reboot (OS driven, e.g. "sudo reboot"), then during the boot process there is a second reboot (which you have not requested) before the VM is fully booted? Can you attach dumps without the debug logging interleaved with the output - it's hard to see.
>
> Yes, Robert. That's exactly what is happening. This second "unexpected/unintended" reboot is being triggered by the DXE ResetSystem2.
>
> The thing is "[ 73.313075] reboot: Restarting system" is from the kernel - so the OS has requested the reboot. Since you're running with EFI firmware the kernel will call into EFI to make the request to reset the system. If there is something wrong with the EFI implementation you might be able to use "reboot=a" on the kernel command line to force the reboot to use ACPI. Can you reproduce this outside your environment and without virtiofs?
>
> I cannot reproduce this without virtiofs because it's an integral part of our system and cannot be easily removed. However, I will try reboot=a as you suggested and let you know. Thanks :)

OK, please do send the CH output without the debug interleaved (or maybe just one "-v" if that is still readable). "-vv" is just unreadable, unfortunately.

rjsilvams commented 2 months ago

Hi Robert. After some investigation, I concluded that the mysterious reboot associated with the message "[ 73.313075] reboot: Restarting system" was triggered by one of our test scripts.

In other words, the issue as I reported it does not exist.
