cloud-hypervisor / rust-hypervisor-firmware

Apache License 2.0
608 stars 57 forks source link

Unable to boot Ubuntu 22.04 LTS (cloud image) on arm64 #298

Closed edigaryev closed 4 days ago

edigaryev commented 12 months ago

How to reproduce

#!/bin/bash

set -euo pipefail

QCOW2_NAME="jammy-server-cloudimg-arm64.img"

if [ ! -e "${QCOW2_NAME}" ]
then
    wget -O "${QCOW2_NAME}" https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-arm64.img
fi

RAW_NAME="jammy-server-cloudimg-arm64.raw"

if [ ! -e "${RAW_NAME}" ]
then
    qemu-img convert -p -f qcow2 -O raw "${QCOW2_NAME}" "${RAW_NAME}"
fi

if [ ! -d "rust-hypervisor-firmware" ]
then
    git clone https://github.com/cloud-hypervisor/rust-hypervisor-firmware.git
fi

FIRMWARE_PATH="rust-hypervisor-firmware/target/aarch64-unknown-none/release/hypervisor-fw"

if [ ! -e "${FIRMWARE_PATH}" ]
then
    pushd rust-hypervisor-firmware
    cargo build --release --target aarch64-unknown-none.json -Zbuild-std=core,alloc -Zbuild-std-features=compiler-builtins-mem
    popd
fi

cloud-hypervisor --serial tty --console pty --kernel "${FIRMWARE_PATH}" --disk path="${RAW_NAME}"

Expected output

The VM boots and shows ubuntu login:.

Actual output

$ ./run.sh
[...]
cloud-hypervisor: 8.063405ms: <vmm> WARN:arch/src/aarch64/fdt.rs:114 -- File: /sys/devices/system/cpu/cpu0/cache/index3/size does not exist.
cloud-hypervisor: 8.138801ms: <vmm> WARN:arch/src/aarch64/fdt.rs:148 -- File: /sys/devices/system/cpu/cpu0/cache/index3/coherency_line_size does not exist.
cloud-hypervisor: 8.187810ms: <vmm> WARN:arch/src/aarch64/fdt.rs:171 -- File: /sys/devices/system/cpu/cpu0/cache/index3/number_of_sets does not exist.
cloud-hypervisor: 8.266878ms: <vmm> WARN:arch/src/aarch64/fdt.rs:428 -- L2 cache shared with other cpus

Booting with FDT
Found PCI device vendor=8086 device=d57 in slot=0
Found PCI device vendor=1af4 device=1043 in slot=1
Found PCI device vendor=1af4 device=1042 in slot=2
Found PCI device vendor=1af4 device=1044 in slot=3
PCI Device: 0:2.0 1af4:1042
Bar: type=MemorySpace32 address=0x2ff80000 size=0x80000
Bar: type=MemorySpace32 address=0x0 size=0x0
Bar: type=MemorySpace32 address=0x0 size=0x0
Bar: type=MemorySpace32 address=0x0 size=0x0
Bar: type=MemorySpace32 address=0x0 size=0x0
Bar: type=MemorySpace32 address=0x0 size=0x0
Updated BARs: type=MemorySpace32 address=2ff80000 size=80000
Updated BARs: type=MemorySpace32 address=0 size=0
Updated BARs: type=MemorySpace32 address=0 size=0
Updated BARs: type=MemorySpace32 address=0 size=0
Updated BARs: type=MemorySpace32 address=0 size=0
Updated BARs: type=MemorySpace32 address=0 size=0
Virtio block device configured. Capacity: 4612096 sectors
Found EFI partition
Filesystem ready
Error loading default entry: File(NotFound)
Using EFI boot.
Found bootloader: \EFI\BOOT\BOOTAA64.EFI
Executable loaded
Failed to set MokListRT: Unsupported
TPM logging failed: Unsupported
Could not create MokListRT: Unsupported
Failed to set MokListXRT: Unsupported
TPM logging failed: Unsupported
Could not create MokListXRT: Unsupported
TPM logging failed: Unsupported
Could not create MokListTrustedRT: Unsupported
Something has gone seriously wrong: import_mok_state() failed: Unsupported
TPM logging failed: Unsupported

The VM hangs and 100% CPU usage by Cloud Hypervisor process can be observed.

Versions tested

Rust Hypervisor Firmware built from main, and:

$ cloud-hypervisor --version
cloud-hypervisor v36.0.0

Hardware used

a1.metal AWS EC2 instance running Debian 12 (arm64).

Notes

EDK2 works is just fine.

Related: https://github.com/cloud-hypervisor/rust-hypervisor-firmware/issues/198.

retrage commented 11 months ago

I tried to reproduce it on Neoverse-N1, with current (20231207) Ubuntu 22.04 Jammy cloud image (sha256sum: d74dc6f9bc92da4dff973bab1b6dab411c7b6a5219fcdbec25413832cb4b23ba), but cannot. Here is a part of the log:

Found bootloader: \EFI\BOOT\BOOTAA64.EFI                                        
Executable loaded                                                               
Failed to set MokListRT: Unsupported                                            
TPM logging failed: Unsupported                                                 
Could not create MokListRT: Unsupported                                         
Failed to set MokListXRT: Unsupported                                           
TPM logging failed: Unsupported                                                 
Could not create MokListXRT: Unsupported                                        
TPM logging failed: Unsupported                                                 
Could not create MokListTrustedRT: Unsupported                                  
Something has gone seriously wrong: import_mok_state() failed: Unsupported      
TPM logging failed: Unsupported                                                 
Could not create variable: Unsupported                                          
Failed to set MokListRT: Unsupported                                            
TPM logging failed: Unsupported                                                 
Could not create MokListRT: Unsupported                                         
Failed to set MokListXRT: Unsupported                                           
TPM logging failed: Unsupported                                                 
Could not create MokListXRT: Unsupported                                        
TPM logging failed: Unsupported                                                 
Could not create MokListTrustedRT: Unsupported                                  
Something has gone seriously wrong: import_mok_state() failed: Unsupported      
TPM logging failed: Unsupported                                                 
EFI stub: Booting Linux Kernel...                                               
EFI stub: EFI_RNG_PROTOCOL unavailable                                          
EFI stub: Using DTB from configuration table                                    
EFI stub: Exiting boot services...                                              
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x413fd0c1]          
[    0.000000] Linux version 5.15.0-89-generic (buildd@bos02-arm64-007) (gcc (Ub
untu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #99-U
buntu SMP Mon Oct 30 23:43:36 UTC 2023 (Ubuntu 5.15.0-89.99-generic 5.15.126)

I also tried Ubuntu 22.04 image 20231201, sha256sum: ea069246bbd12557ee13cd17f4f8be55a3885a7186c98a3afc677babde136d1f with exactly same command line arguments, but still cannot reproduce it.

peng6662001 commented 9 months ago

Failed to start cloud-hypervisor when using hypervisor_fw on aarch64

thomasbarrett commented 3 months ago

I seem to be running into this as well using Ubuntu 22.04 cloud-image + cloud-hypervisor main @retrage on a r8g.metal-24xl instance. I only ran into this issue while starting virtual machines with 128GiB+ of RAM. Any idea what could be causing this?

retrage commented 3 months ago

I have no idea why we cannot reproduce the issue. We can close this issue when your PR #346 is merged.

acarp-crusoe commented 4 days ago

I ran into this while attempting to create VMs with a DRAM region > 126G. I took some time to investigate and found that the issue is around a fixed limit of 128G on the page table for aarch64. In the aarch64 memory layout the page table is hardcoded to a max address space of 128G, so going beyond that exhausts the page table limit.

pub mod map {
    // Create page table for 128G is enough
    pub const END: usize = 0x20_0000_0000;

Including the space reserved for the kernel as well as MMIO the 128G allocated leaves about 126G for total System Memory. I've put out a PR to bump this limit to support the systems that we're developing with (2TB).

https://github.com/cloud-hypervisor/rust-hypervisor-firmware/pull/355