firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0
[Bug] Guest kernel panics running go init script when ACPI is enabled #4688

Closed (maggie-lou closed 1 month ago)

maggie-lou commented 1 month ago

Describe the bug

After upgrading to v1.8.0, the guest kernel panics when trying to execute a Go init script. The panic occurs before the script itself begins executing (i.e. it is not caused by anything within the script).

To Reproduce

I'm still working on reproducing the panic. In this test repo, when using the same guest kernel image and boot args, the init process hangs but does not panic. The test repo is able to run successfully if ACPI is disabled.

To run the test repo, clone https://github.com/maggie-lou/firecracker-repro/tree/init_go, check out the init_go branch, and run make test.

I've included the full guest kernel logs at the end in case they spark some recognition, and would appreciate any debugging tips.

Within our setup, the panic happens 100% of the time.

Expected behaviour

We expect the init script to run with ACPI enabled. Instead, before the init script starts executing, we see a panic in the guest kernel logs (full kernel logs at the end). If we disable ACPI with acpi=off in the boot args, the script runs successfully.

[    0.820152] Run /init as init process
[    0.838110] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[   13.847635] CPU: 0 PID: 244 Comm: init Not tainted 5.15.0 #1
[   13.850518] Call Trace:
[   13.851806]  show_stack+0x3d/0x3f
[   13.853533]  dump_stack_lvl+0x38/0x49
[   13.855434]  dump_stack+0x10/0x12
[   13.857140]  panic+0xe8/0x28c
[   13.858686]  do_exit.cold+0x15/0xa0
[   13.860475]  do_group_exit+0x36/0xa0
[   13.862303]  get_signal+0x149/0x7f0
[   13.864101]  arch_do_signal_or_restart+0xe6/0x110
[   13.866468]  ? do_futex+0x138/0x1d0
[   13.868266]  ? __x64_sys_futex+0x73/0x1d0
[   13.870300]  ? do_user_addr_fault+0x1c3/0x5b0
[   13.872550]  ? ksys_mmap_pgoff+0x53/0x250
[   13.874612]  exit_to_user_mode_prepare+0xb1/0x120
[   13.877025]  syscall_exit_to_user_mode+0x21/0x40
[   13.879375]  do_syscall_64+0x48/0x90
[   13.881218]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   13.883763] RIP: 0033:0x475f43
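
When triaging a log like this, it can help to pull out only the suspicious lines first. The following is a minimal sketch in Python, not part of Firecracker or its tooling: the function name `scan_guest_log` and the pattern names are hypothetical, and the patterns are simply taken from messages that appear in the full guest kernel logs at the end of this report.

```python
import re

# Patterns copied from messages seen in this issue's guest log.
# They are illustrative, not an exhaustive list of kernel error formats.
PATTERNS = {
    "acpi_error": re.compile(r"ACPI Error:|ACPI: Unable to load"),
    "panic": re.compile(r"Kernel panic - not syncing"),
    "probe_failed": re.compile(r"probe of \S+ failed with error"),
}


def scan_guest_log(text):
    """Return {category: [matching lines]} for a guest kernel log."""
    hits = {name: [] for name in PATTERNS}
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "[    1.152512] ACPI Error: AE_BAD_PARAMETER, During Region initialization (20210730/tbxfload-52)",
        "[    1.157992] ACPI: Unable to load the System Description Tables",
        "[    1.363896] virtio_blk: probe of virtio0 failed with error -22",
        "[   13.843758] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100",
    ])
    for category, lines in scan_guest_log(sample).items():
        print(category, len(lines))
```

Run against the full log, a scan like this would surface the ACPI table-loading errors and the failed virtio probes that precede the panic, which are easy to miss when reading the log top to bottom.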

Environment

We are using this guest kernel config: https://github.com/firecracker-microvm/firecracker/blob/main/resources/guest_configs/microvm-kernel-ci-x86_64-5.10.config

Additional context

Our understanding is that disabling ACPI will be deprecated on x86_64, so we want to make sure we can still initialize VMs.

We are not very familiar with ACPI, and it's very likely the problem is due to misconfiguration on our end. We'd appreciate any guidance if that's the case.

Checks

We have followed these docs and ensured the guest kernel was configured with CONFIG_ACPI=y, CONFIG_PCI=y, CONFIG_X86_MPPARSE=n, and CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=n.

We are using the guest kernel config provided in the firecracker repo and have tried both with and without the following options, which are mentioned in the docs but not included in that sample config:

CONFIG_X86_MPPARSE=n
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=n
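
A check like the one described above can be scripted. The following is a rough sketch in Python, assuming the standard kernel `.config` text format, where a disabled option appears as `# CONFIG_FOO is not set`. The helper names `parse_kconfig` and `check_config` are hypothetical, and the expected values are the ones listed in this report, not an authoritative requirement list.

```python
import re

# Expected values are the ones listed in this issue; "n" means "not set".
REQUIRED = {
    "CONFIG_ACPI": "y",
    "CONFIG_PCI": "y",
    "CONFIG_X86_MPPARSE": "n",
    "CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES": "n",
}

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option names to values; disabled options map to "n"."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            values[match.group(1)] = "n"
    return values


def check_config(text, required=REQUIRED):
    """Return {option: (expected, actual)} for every mismatch."""
    values = parse_kconfig(text)
    mismatches = {}
    for option, expected in required.items():
        # An option that is absent from the file is treated as disabled.
        actual = values.get(option, "n")
        if actual != expected:
            mismatches[option] = (expected, actual)
    return mismatches
```

Calling `check_config(open(".config").read())` on the build tree's config returns an empty dict when all four options match, which makes it easy to confirm which variant of the config a given kernel image was built from.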

We narrowed down the source of the panic to ACPI by running a git bisect on the commits in the latest release, and verified that disabling ACPI with acpi=off fixes the problem.

Full guest kernel logs

[    0.000000] Linux version 5.15.0 (jdhollen@system76-pc) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1 SMP Mon Aug 14 14:47:04 PDT 2023
[    0.000000] Command line: nomodules=1 ipv6.disable=1 -set_default_route i8042.noaux lapic=notscdeadline ro noapic reboot=k i8042.nomux i8042.nopnp i8042.dumbkbd tsc=reliable -enable_rootfs console=ttyS0 panic=1 pci=off random.trust_cpu=on ip=192.168.241.2:::255.255.255.48::eth0:off virtio_mmio.device=4K@0xd0000000:5 virtio_mmio.device=4K@0xd0001000:6 virtio_mmio.device=4K@0xd0002000:7 virtio_mmio.device=4K@0xd0003000:8
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
[    0.000000] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
[    0.000000] x86/fpu: xstate_offset[5]:  960, xstate_sizes[5]:   64
[    0.000000] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]:  512
[    0.000000] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[    0.000000] x86/fpu: Enabled xstate features 0xff, context size is 2560 bytes, using 'compacted' format.
[    0.000000] signal: max sigframe size: 3632
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x00000000000dffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cfffffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x00000001687fffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI not present or invalid.
[    0.000000] Hypervisor detected: KVM
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 2c01001, primary cpu clock
[    0.000012] kvm-clock: using sched offset of 37592557 cycles
[    0.000048] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000117] tsc: Detected 3100.226 MHz processor
[    0.000733] last_pfn = 0x168800 max_arch_pfn = 0x400000000
[    0.000993] Disabled
[    0.001027] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[    0.001123] CPU MTRRs all blank - virtualized system.
[    0.001153] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC  
[    0.001172] last_pfn = 0xd0000 max_arch_pfn = 0x400000000
[    0.001225] found SMP MP-table at [mem 0x0009fc00-0x0009fc0f]
[    0.003233] Using GB pages for direct mapping
[    0.004383] RAMDISK: [mem 0xce286000-0xcfffffff]
[    0.004576] ACPI: Early table checksum verification disabled
[    0.004815] ACPI: RSDP 0x00000000000E0000 000024 (v02 FIRECK)
[    0.004879] ACPI: XSDT 0x00000000000A022E 000034 (v01 FIRECK FCMVXSDT 00000000 FCAT 20240119)
[    0.004936] ACPI: FACP 0x00000000000A00BA 000114 (v06 FIRECK FCVMFADT 00000000 FCAT 20240119)
[    0.004971] ACPI: DSDT 0x000000000009FD80 00033A (v02 FIRECK FCVMDSDT 00000000 FCAT 20240119)
[    0.004989] ACPI: APIC 0x00000000000A01CE 000060 (v06 FIRECK FCVMMADT 00000000 FCAT 20240119)
[    0.004992] ACPI: Reserving FACP table memory at [mem 0xa00ba-0xa01cd]
[    0.004994] ACPI: Reserving DSDT table memory at [mem 0x9fd80-0xa00b9]
[    0.004995] ACPI: Reserving APIC table memory at [mem 0xa01ce-0xa022d]
[    0.005656] No NUMA configuration found
[    0.005669] Faking a node at [mem 0x0000000000000000-0x00000001687fffff]
[    0.005711] NODE_DATA(0) allocated [mem 0x1687dd000-0x1687fefff]
[    0.006512] Zone ranges:
[    0.006535]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.006548]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.006561]   Normal   [mem 0x0000000100000000-0x00000001687fffff]
[    0.006562] Movable zone start for each node
[    0.006564] Early memory node ranges
[    0.006565]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.006577]   node   0: [mem 0x0000000000100000-0x00000000cfffffff]
[    0.006579]   node   0: [mem 0x0000000100000000-0x00000001687fffff]
[    0.006591] Initmem setup node 0 [mem 0x0000000000001000-0x00000001687fffff]
[    0.006679] On node 0, zone DMA: 1 pages in unavailable ranges
[    0.007545] On node 0, zone DMA: 97 pages in unavailable ranges
[    0.274058] On node 0, zone Normal: 30720 pages in unavailable ranges
[    0.274192] ACPI: Skipping IOAPIC probe due to 'noapic' option.
[    0.274194] ACPI: Using ACPI for processor (LAPIC) configuration information
[    0.274240] Intel MultiProcessor Specification v1.4
[    0.274265] MPTABLE: OEM ID: FC      
[    0.274266] MPTABLE: Product ID: 000000000000
[    0.274267] MPTABLE: APIC at: 0xFEE00000
[    0.274428] IOAPIC[0]: apic_id 6, version 17, address 0xfec00000, GSI 0-23
[    0.274442] Processors: 5
[    0.274488] smpboot: Allowing 5 CPUs, 0 hotplug CPUs
[    0.274816] kvm-guest: KVM setup pv remote TLB flush
[    0.274848] kvm-guest: setup PV sched yield
[    0.275023] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.275036] PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.275037] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000dffff]
[    0.275038] PM: hibernation: Registered nosave memory: [mem 0x000e0000-0x000fffff]
[    0.275039] PM: hibernation: Registered nosave memory: [mem 0xd0000000-0xffffffff]
[    0.275041] [mem 0xd0000000-0xffffffff] available for PCI devices
[    0.275053] Booting paravirtualized kernel on KVM
[    0.275130] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.275189] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:5 nr_node_ids:1
[    0.295184] percpu: Embedded 41 pages/cpu s131072 r8192 d28672 u1048576
[    0.295345] kvm-guest: setup async PF for cpu 0
[    0.295368] kvm-guest: stealtime: cpu 0, msr 16301f080
[    0.295395] kvm-guest: PV spinlocks enabled
[    0.295397] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.295499] Built 1 zonelists, mobility grouping on.  Total pages: 1259744
[    0.295501] Policy zone: Normal
[    0.295514] Kernel command line: nomodules=1 ipv6.disable=1 -set_default_route i8042.noaux lapic=notscdeadline ro noapic reboot=k i8042.nomux i8042.nopnp i8042.dumbkbd tsc=reliable -enable_rootfs console=ttyS0 panic=1 pci=off random.trust_cpu=on ip=192.168.241.2:::255.255.255.48::eth0:off virtio_mmio.device=4K@0xd0000000:5 virtio_mmio.device=4K@0xd0001000:6 virtio_mmio.device=4K@0xd0002000:7 virtio_mmio.device=4K@0xd0003000:8
[    0.296341] Unknown command line parameters: -set_default_route -enable_rootfs nomodules=1
[    0.322633] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[    0.335798] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.336089] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.563039] Memory: 4896268K/5119608K available (10242K kernel code, 7833K rwdata, 2008K rodata, 1524K init, 6916K bss, 223084K reserved, 0K cma-reserved)
[    0.563170] random: get_random_u64 called from kmem_cache_open+0x20/0x300 with crng_init=0
[    0.563508] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=5, Nodes=1
[    0.564406] rcu: Hierarchical RCU implementation.
[    0.564418] rcu:     RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=5.
[    0.564431]  Tracing variant of Tasks RCU enabled.
[    0.564454] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.564456] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=5
[    0.564506] NR_IRQS: 4352, nr_irqs: 64, preallocated irqs: 0
[    0.564943] random: crng done (trusting CPU's manufacturer)
[    0.565309] Console: colour dummy device 80x25
[    0.917691] printk: console [ttyS0] enabled
[    0.919905] ACPI: Core revision 20210730
[    0.922126] ACPI: setting ELCR to 0001 (from 0000)
[    0.924558] APIC: Switch to symmetric I/O mode setup
[    0.927048] Not enabling interrupt remapping due to skipped IO-APIC setup
[    0.930391] kvm-guest: setup PV IPIs
[    0.932825] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2cb01ce5402, max_idle_ns: 440795236755 ns
[    0.937996] Calibrating delay loop (skipped) preset value.. 6200.45 BogoMIPS (lpj=12400904)
[    0.941991] pid_max: default: 32768 minimum: 301
[    0.941991] LSM: Security Framework initializing
[    0.941991] SELinux:  Initializing.
[    0.941991] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.941991] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.941991] x86/cpu: User Mode Instruction Prevention (UMIP) activated
[    0.941991] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[    0.941991] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[    0.941991] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    0.941991] Spectre V2 : Mitigation: Enhanced IBRS
[    0.941991] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.941991] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    0.941991] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[    0.941991] TAA: Mitigation: Clear CPU buffers
[    0.941991] MDS: Mitigation: Clear CPU buffers
[    0.941991] Freeing SMP alternatives memory: 32K
[    0.941991] smpboot: CPU0: Intel(R) Xeon(R) Processor @ 3.10GHz (family: 0x6, model: 0x55, stepping: 0x7)
[    0.943219] Performance Events: unsupported p6 CPU model 85 no PMU driver, software events only.
[    0.946434] rcu: Hierarchical SRCU implementation.
[    0.952546] smp: Bringing up secondary CPUs ...
[    0.954536] x86: Booting SMP configuration:
[    0.957994] .... node  #0, CPUs:      #1
[    0.377185] kvm-clock: cpu 1, msr 2c01041, secondary cpu clock
[    0.963694] kvm-guest: setup async PF for cpu 1
[    0.965991] kvm-guest: stealtime: cpu 1, msr 16311f080
[    0.970575]  #2
[    0.377185] kvm-clock: cpu 2, msr 2c01081, secondary cpu clock
[    0.976545] kvm-guest: setup async PF for cpu 2
[    0.977991] kvm-guest: stealtime: cpu 2, msr 16321f080
[    0.982792]  #3
[    0.377185] kvm-clock: cpu 3, msr 2c010c1, secondary cpu clock
[    0.988455] kvm-guest: setup async PF for cpu 3
[    0.989991] kvm-guest: stealtime: cpu 3, msr 16331f080
[    0.994561]  #4
[    0.377185] kvm-clock: cpu 4, msr 2c01101, secondary cpu clock
[    1.000057] kvm-guest: setup async PF for cpu 4
[    1.001991] kvm-guest: stealtime: cpu 4, msr 16341f080
[    1.006160] smp: Brought up 1 node, 5 CPUs
[    1.008152] smpboot: Max logical packages: 1
[    1.009994] smpboot: Total of 5 processors activated (31002.26 BogoMIPS)
[    1.070494] devtmpfs: initialized
[    1.071799] x86/mm: Memory block size: 128MB
[    1.075278] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    1.078010] futex hash table entries: 2048 (order: 5, 131072 bytes, linear)
[    1.086243] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    1.089268] DMA: preallocated 1024 KiB GFP_KERNEL pool for atomic allocations
[    1.090061] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    1.094054] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    1.098084] audit: initializing netlink subsys (disabled)
[    1.102075] audit: type=2000 audit(1721145130.185:1): state=initialized audit_enabled=0 res=1
[    1.102295] thermal_sys: Registered thermal governor 'fair_share'
[    1.105993] thermal_sys: Registered thermal governor 'step_wise'
[    1.109992] thermal_sys: Registered thermal governor 'user_space'
[    1.112839] cpuidle: using governor ladder
[    1.115953] cpuidle: using governor menu
[    1.119849] Kprobes globally optimized
[    1.122103] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    1.125993] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    1.138340] ACPI: Added _OSI(Module Device)
[    1.138340] ACPI: Added _OSI(Processor Device)
[    1.140123] ACPI: Added _OSI(3.0 _SCP Extensions)
[    1.141992] ACPI: Added _OSI(Processor Aggregator Device)
[    1.145992] ACPI: Added _OSI(Linux-Dell-Video)
[    1.148078] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    1.149992] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[    1.152512] ACPI Error: AE_BAD_PARAMETER, During Region initialization (20210730/tbxfload-52)
[    1.157992] ACPI: Unable to load the System Description Tables
[    1.157992] ACPI Error: Could not remove SCI handler (20210730/evmisc-251)
[    1.162592] SCSI subsystem initialized
[    1.164430] pps_core: LinuxPPS API ver. 1 registered
[    1.165993] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    1.173995] PTP clock support registered
[    1.174488] NetLabel: Initializing
[    1.177992] NetLabel:  domain hash size = 128
[    1.177992] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    1.180704] NetLabel:  unlabeled traffic allowed by default
[    1.186034] clocksource: Switched to clocksource kvm-clock
[    1.189581] VFS: Disk quotas dquot_6.6.0
[    1.191651] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    1.195166] pnp: PnP ACPI: disabled
[    1.204879] NET: Registered PF_INET protocol family
[    1.210749] IP idents hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    1.215447] tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes, linear)
[    1.219733] TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    1.225246] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear)
[    1.232243] TCP: Hash tables configured (established 65536 bind 65536)
[    1.235886] UDP hash table entries: 4096 (order: 5, 131072 bytes, linear)
[    1.239554] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes, linear)
[    1.243573] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    1.246381] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    1.246934] Unpacking initramfs...
[    1.249418] software IO TLB: mapped [mem 0x00000000ca286000-0x00000000ce286000] (64MB)
[    1.255206] virtio-mmio: Registering device virtio-mmio.0 at 0xd0000000-0xd0000fff, IRQ 5.
[    1.259197] virtio-mmio: Registering device virtio-mmio.1 at 0xd0001000-0xd0001fff, IRQ 6.
[    1.263198] virtio-mmio: Registering device virtio-mmio.2 at 0xd0002000-0xd0002fff, IRQ 7.
[    1.267279] virtio-mmio: Registering device virtio-mmio.3 at 0xd0003000-0xd0003fff, IRQ 8.
[    1.271458] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2cb01ce5402, max_idle_ns: 440795236755 ns
[    1.276327] clocksource: Switched to clocksource tsc
[    1.279260] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    1.286228] Initialise system trusted keyrings
[    1.288449] Key type blacklist registered
[    1.291452] workingset: timestamp_bits=36 max_order=21 bucket_order=0
[    1.295964] SLUB: Unable to add boot slab alias Acpi-ParseExt to sysfs
[    1.299779] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    1.302614] fuse: init (API version 7.34)
[    1.312682] Key type asymmetric registered
[    1.314719] Asymmetric key parser 'x509' registered
[    1.317217] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    1.322871] Serial: 8250/16550 driver, 1 ports, IRQ sharing disabled
[    1.326423] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    1.350318] loop: module loaded
[    1.363896] virtio_blk: probe of virtio0 failed with error -22
[    1.367063] virtio_blk: probe of virtio1 failed with error -22
[    1.369875] Loading iSCSI transport class v2.0-870.
[    1.372940] iscsi: registered transport (tcp)
[    1.375260] tun: Universal TUN/TAP device driver, 1.6
[    1.378600] virtio_net: probe of virtio2 failed with error -22
[    1.381466] i8042: PNP detection disabled
[    1.384103] i8042: probe of i8042 failed with error -22
[    1.386925] intel_pstate: CPU model not supported
[    1.389328] hid: raw HID events driver (C) Jiri Kosina
[    1.406061] Initializing XFRM netlink socket
[    1.408195] IPv6: Loaded, but administratively disabled, reboot required to enable
[    1.411941] NET: Registered PF_PACKET protocol family
[    1.414438] Bridge firewalling registered
[    1.416678] NET: Registered PF_VSOCK protocol family
[    1.419451] vmw_vsock_virtio_transport: probe of virtio3 failed with error -22
[    1.422965] IPI shorthand broadcast: enabled
[    1.425048] sched_clock: Marking stable (1049705274, 373185538)->(1634061896, -211171084)
[    1.429253] registered taskstats version 1
[    1.431267] Loading compiled-in X.509 certificates
[    1.465839] Freeing initrd memory: 30184K
[    1.470610] Loaded X.509 cert 'Build time autogenerated kernel key: 068e63e2c8200c1863e7bced9336b12f5127df1b'
[    1.475698] Key type ._fscrypt registered
[    1.477643] Key type .fscrypt registered
[    1.479577] Key type fscrypt-provisioning registered
[    1.482496] Key type encrypted registered
[   13.773006] Freeing unused decrypted memory: 2036K
[   13.778453] Freeing unused kernel image (initmem) memory: 1524K
[   13.781306] Write protecting the kernel read-only data: 14336k
[   13.790478] Freeing unused kernel image (text/rodata gap) memory: 2044K
[   13.793965] Freeing unused kernel image (rodata/data gap) memory: 40K
[   13.797069] Run /init as init process
[   13.843758] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[   13.847635] CPU: 0 PID: 244 Comm: init Not tainted 5.15.0 #1
[   13.850518] Call Trace:
[   13.851806]  show_stack+0x3d/0x3f
[   13.853533]  dump_stack_lvl+0x38/0x49
[   13.855434]  dump_stack+0x10/0x12
[   13.857140]  panic+0xe8/0x28c
[   13.858686]  do_exit.cold+0x15/0xa0
[   13.860475]  do_group_exit+0x36/0xa0
[   13.862303]  get_signal+0x149/0x7f0
[   13.864101]  arch_do_signal_or_restart+0xe6/0x110
[   13.866468]  ? do_futex+0x138/0x1d0
[   13.868266]  ? __x64_sys_futex+0x73/0x1d0
[   13.870300]  ? do_user_addr_fault+0x1c3/0x5b0
[   13.872550]  ? ksys_mmap_pgoff+0x53/0x250
[   13.874612]  exit_to_user_mode_prepare+0xb1/0x120
[   13.877025]  syscall_exit_to_user_mode+0x21/0x40
[   13.879375]  do_syscall_64+0x48/0x90
[   13.881218]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   13.883763] RIP: 0033:0x475f43
[   13.885342] Code: 24 20 c3 cc cc cc cc 48 8b 7c 24 08 8b 74 24 10 8b 54 24 14 4c 8b 54 24 18 4c 8b 44 24 20 44 8b 4c 24 28 b8 ca 00 00 00 0f 05 <89> 44 24 30 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[   13.894645] RSP: 002b:000000c0002fbcf8 EFLAGS: 00000286 ORIG_RAX: 00000000000000ca
[   13.898434] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 0000000000475f43
[   13.901993] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000c000219148
[   13.905577] RBP: 000000c0002fbd40 R08: 0000000000000000 R09: 0000000000000000
[   13.909153] R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000004
[   13.912743] R13: 0000000000000001 R14: 000000c0002b28c0 R15: 0000000000000003
[   13.916708] Kernel Offset: disabled
[   13.918521] Rebooting in 1 seconds..
2024-07-16T15:52:24.580965044 [4f9e9109-6e29-4f0e-b2cf-3f7295efa45e:main] Vmm is stopping.
2024-07-16T15:52:24.581556425 [4f9e9109-6e29-4f0e-b2cf-3f7295efa45e:main] Vmm is stopping.
2024-07-16T15:52:24.666078724 [4f9e9109-6e29-4f0e-b2cf-3f7295efa45e:main] Firecracker exiting successfully. exit_code=0
INFO[0061] firecracker exited: status=0                 
DEBU[0061] closing the exitCh <nil>

bchalios commented 1 month ago

Hi @maggie-lou, I think I know what this is about. I assume your kernel .config does not include CONFIG_PCI=y, correct? If that is the case, please take a look here: https://github.com/firecracker-microvm/firecracker/blob/main/docs/kernel-policy.md#booting-with-acpi-x86_64-only

We are trying to work with the Linux community to patch the kernel so that you don't have to enable CONFIG_PCI in cases where the hypervisor doesn't use PCI: https://www.spinics.net/lists/linux-acpi/msg125662.html. In the meantime, you should enable it to get ACPI boot to work.

Could you please try it out and let me know if it works?

maggie-lou commented 1 month ago

@bchalios unfortunately it does already include CONFIG_PCI=y. We are using the exact kernel config from the firecracker repo (https://github.com/firecracker-microvm/firecracker/blob/main/resources/guest_configs/microvm-kernel-ci-x86_64-5.10.config)

bchalios commented 1 month ago

That is weird. Could you provide a link to the source code you build the kernel from? Btw, according to your logs, you seem to be using guest kernel 5.15, which we do not officially support. We have tested this configuration with 4.14, 5.10 and 6.1 without issues.

bchalios commented 1 month ago

OK, I tried building 5.15 from Linus's tree with:

git clone --depth=1 --branch v5.15 https://github.com/torvalds/linux

and building it with our config from here: https://github.com/firecracker-microvm/firecracker/blob/main/resources/guest_configs/microvm-kernel-ci-x86_64-5.10.config

I am able to boot a microVM without issues using the default command line parameters. However, using the command line parameters from your guest kernel logs reproduces the issue you are seeing. I'm going to look into which parameter causes this.

bchalios commented 1 month ago

It looks as if the offending parameter is noapic. If I remove it from the command line, the microVM boots correctly. Could you try removing it and see if the problem goes away for you?
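
If the boot args are assembled programmatically, dropping the parameter can be sketched as below. This is only an illustration in Python: `drop_boot_arg` is a hypothetical helper, not a Firecracker API, and it assumes the command line has no quoted values containing spaces.

```python
def drop_boot_arg(cmdline, name):
    """Remove every token equal to `name` or of the form `name=value`.

    Assumes whitespace-separated tokens without quoted spaces.
    """
    kept = [
        token
        for token in cmdline.split()
        if token != name and not token.startswith(name + "=")
    ]
    return " ".join(kept)


if __name__ == "__main__":
    # Shortened excerpt of the command line from the guest logs in this issue.
    args = "ro noapic reboot=k console=ttyS0 panic=1 pci=off"
    print(drop_boot_arg(args, "noapic"))
```

Matching whole tokens rather than substrings keeps unrelated parameters that merely share a prefix, such as noapictimer, untouched.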

maggie-lou commented 1 month ago

Nice - that fixed it. Thank you!