firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

[Bug] Unable to boot with new(er) kernel #4816

Open wociscz opened 1 week ago

wociscz commented 1 week ago

Description

I can't boot the VM with any kernel newer than Firecracker's 4.14. I always get:

[   12.489510] /dev/root: Can't open blockdev
[   12.489784] VFS: Cannot open root device "vda" or unknown-block(0,0): error -6
[   12.490205] Please append a correct "root=" boot option; here are the available partitions:
[   12.490717] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

I tried Firecracker's 5.10.223 and 6.1.102 kernels and also built my own with the .config provided in the repo; all of them fail with the error pasted above. With the 4.14 kernel the VM boots without any problem, but it lacks nftables support, which is why I'm trying to build a newer one.

The static JSON config, in particular the rootfs drive path options, is the same for all kernel variants; only kernel_image_path changes.

The rootfs is an alpine.ext4 file made with the help of this doc.

The host OS is Ubuntu with a 6.9.5 kernel.

To Reproduce

Expected behaviour

The VM boots with a newer or custom-built kernel without any problem.

Environment

Additional context

static json config for the VM:

{
  "boot-source": {
    "kernel_image_path": "path_to_vmlinux_kernel",
    "boot_args": "ro console=ttyS0 noapic reboot=k panic=1 pci=off ip=10.0.1.111::10.0.0.1:255.255.252.0::eth0:off",
    "initrd_path": null
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "partuuid": null,
      "is_root_device": true,
      "cache_type": "Unsafe",
      "is_read_only": false,
      "path_on_host": "alpine.ext4",
      "io_engine": "Sync",
      "rate_limiter": null,
      "socket": null
    }
  ],
  "machine-config": {
    "vcpu_count": 2,
    "mem_size_mib": 1024,
    "smt": false,
    "track_dirty_pages": false,
    "huge_pages": "None"
  },
  "cpu-config": null,
  "balloon": null,
  "network-interfaces": [],
  "vsock": null,
  "logger": null,
  "metrics": null,
  "mmds-config": null,
  "entropy": null
}

Checks

✅ Have you searched the Firecracker Issues database for similar problems?
✅ Have you read the existing relevant Firecracker documentation?
✅ Are you certain the bug being reported is a Firecracker issue?

Kevin-A commented 1 week ago

I have been having the same problem for weeks/months and have not been able to solve it. In my case I was running 5.10 fine for several months, until it stopped working on new hosts. I've tried Intel and AMD CPUs, built different kernel versions (5.10, 6.1, 6.9), used included and pre-built kernels, used different boot args (e.g. specifying root), built several root filesystems in different ways (ext4 as I did previously, using the included scripts, using Docker, building manually according to the guide), and played with permissions/uids.

I initially suspected it was due to me switching from building the rootfs on the host system to building it in a Docker container; however, I never got it working again.

Edit: I logged back onto the host that worked. It ran firecracker v1.3.3. Booting a VM with that version works. When I try to boot the same vmlinux with v1.8.0 it fails with the error mentioned in OP.

Linux version and command line args passed by default on firecracker v1.3.3

[    0.000000] Linux version 5.10.184 (root@XXXX) (gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0, GNU ld (GNU Binutils for Ubuntu) 2.39) #1 SMP Wed Jun 14 18:10:02 UTC 2023
[    0.000000] Command line: noapic reboot=k panic=1 pci=off nomodules ro console=ttyS0 root=/dev/vda rw virtio_mmio.device=4K@0xd0000000:5 virtio_mmio.device=4K@0xd0001000:6 virtio_mmio.device=4K@0xd0002000:7

Linux version and command line args passed by default on firecracker v1.8.0

[    0.000000] Linux version 5.10.184 (root@XXXX) (gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0, GNU ld (GNU Binutils for Ubuntu) 2.39) #1 SMP Wed Jun 14 18:10:02 UTC 2023
[    0.000000] Command line: panic=1 pci=off nomodules ro console=ttyS0 noapic reboot=k root=/dev/vda rw virtio_mmio.device=4K@0xd0000000:5 virtio_mmio.device=4K@0xd0001000:6

Edit 2: v1.3.3 works, v1.6.0 works, v1.7.0 works, v1.8.0 fails, v1.9.0 fails.
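
For anyone comparing the two command lines quoted above by eye, a small token diff makes the difference easier to spot. This is only a sketch: `cmdline_diff` is a hypothetical helper, not part of Firecracker, and it only reports which space-separated tokens appear on one side and not the other, ignoring order.

```python
def cmdline_diff(old, new):
    """Return (tokens only in old, tokens only in new), ignoring token order."""
    old_tokens, new_tokens = old.split(), new.split()
    only_old = [t for t in old_tokens if t not in new_tokens]
    only_new = [t for t in new_tokens if t not in old_tokens]
    return only_old, only_new


# Command lines copied from the two boot logs in this comment.
V1_3_3 = (
    "noapic reboot=k panic=1 pci=off nomodules ro console=ttyS0 "
    "root=/dev/vda rw virtio_mmio.device=4K@0xd0000000:5 "
    "virtio_mmio.device=4K@0xd0001000:6 virtio_mmio.device=4K@0xd0002000:7"
)
V1_8_0 = (
    "panic=1 pci=off nomodules ro console=ttyS0 noapic reboot=k "
    "root=/dev/vda rw virtio_mmio.device=4K@0xd0000000:5 "
    "virtio_mmio.device=4K@0xd0001000:6"
)

only_old, only_new = cmdline_diff(V1_3_3, V1_8_0)
print("only in v1.3.3:", only_old)
print("only in v1.8.0:", only_new)
```

For these two logs the only token that differs is the third `virtio_mmio.device` entry, which is present under v1.3.3 and absent under v1.8.0. The diff says nothing about why it is missing.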

wociscz commented 1 week ago

Ok, thanks for the hint about the older versions. It never occurred to me to try them.

I can confirm that with Firecracker v1.7.0 my config works and the microVM boots without any issue. Newer versions fail. The only change in that case is the firecracker binary.

Edit: Finally, after some tweaking (compiling my own 6.1 kernel), I am able to run Docker inside Firecracker, which was my original intent. Only the boot problem with Firecracker v1.8.0 and v1.9.0 persists.

bchalios commented 1 week ago

Hello, and thanks for reporting this.

I suspect this has to do with us introducing ACPI support with Firecracker v1.8.0. For mainline kernels to work, we need to compile the kernel with both CONFIG_ACPI and CONFIG_PCI (https://github.com/firecracker-microvm/firecracker/blob/main/docs/kernel-policy.md#booting-with-acpi-x86_64-only).

If only CONFIG_ACPI is enabled, the kernel fails to parse the ACPI tables and does not load the virtio drivers, so mounting the rootfs naturally fails with the error you pasted in the issue description. For our CI, we use Amazon Linux kernels, which include a fix that allows kernels built with only CONFIG_ACPI to boot.

We are also trying to upstream the same fix: https://www.spinics.net/lists/linux-acpi/msg125662.html

The weird thing, though, is that you observe the behaviour with the kernels from our CI. Could you please:

  1. Provide a full kernel log from a failed boot sequence?
  2. Try to build your kernel with both CONFIG_ACPI and CONFIG_PCI enabled and retry?

Disabling ACPI altogether should also work; however, we are deprecating MPTable for booting, so I'd really like it if we could make building with ACPI smoother :)
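
A quick way to check whether a kernel .config has both options enabled before rebuilding is sketched below. It assumes the usual .config text format (`CONFIG_FOO=y` for enabled, `# CONFIG_FOO is not set` for disabled); `missing_options` and the sample config text are hypothetical and only for illustration.

```python
import re

# The two options this comment says a mainline kernel needs for ACPI boot.
REQUIRED = ("CONFIG_ACPI", "CONFIG_PCI")


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=y$", line.strip())
        if match:
            enabled.add(match.group(1))
    return [opt for opt in required if opt not in enabled]


# Hypothetical .config excerpt: ACPI is enabled, PCI is not.
sample = """\
CONFIG_ACPI=y
# CONFIG_PCI is not set
CONFIG_VIRTIO_MMIO=y
"""

print(missing_options(sample))
```

To check a real build, the same function could be called on the contents of the kernel's .config file, for example `missing_options(open(".config").read())`.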

wociscz commented 1 week ago

Boot logs with 6.1.102 and 6.1.custom (own build with CONFIG_ACPI and CONFIG_PCI enabled). Firecracker's json config is the same as in original post.

firecracker_boot_6.1.102.txt firecracker_boot_6.1.custom.txt

bchalios commented 1 week ago

Could you drop the noapic kernel parameter from here:

"boot_args": "ro console=ttyS0 noapic reboot=k panic=1 pci=off ip=10.0.1.111::10.0.0.1:255.255.252.0::eth0:off",

wociscz commented 1 week ago

Yep. That did the trick. Now I can boot with v1.9 without problems.

wociscz commented 1 week ago

My working boot args are now "boot_args": "ro console=ttyS0 reboot=k panic=1". So it may just be a documentation/how-to problem after all. Thanks for the prompt solution.
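
For configs generated by scripts, the fix from this thread (dropping `noapic` from `boot_args`) can be applied programmatically. The sketch below assumes the same JSON layout as the config in the original post; `drop_boot_arg` is a hypothetical helper, not a Firecracker API. It removes only `noapic`, which is the one change the thread confirms; the final working args above also leave out `pci=off` and the `ip=` setting.

```python
import json


def drop_boot_arg(config, arg):
    """Return a copy of the config with one token removed from boot_args."""
    updated = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    tokens = updated["boot-source"]["boot_args"].split()
    updated["boot-source"]["boot_args"] = " ".join(t for t in tokens if t != arg)
    return updated


# boot-source section from the original post; other sections omitted.
config = {
    "boot-source": {
        "kernel_image_path": "path_to_vmlinux_kernel",
        "boot_args": (
            "ro console=ttyS0 noapic reboot=k panic=1 pci=off "
            "ip=10.0.1.111::10.0.0.1:255.255.252.0::eth0:off"
        ),
        "initrd_path": None,
    }
}

fixed = drop_boot_arg(config, "noapic")
print(json.dumps(fixed["boot-source"], indent=2))
```

The original `config` dict is left unchanged, so the old and new arguments can be compared before writing the file back out.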

bchalios commented 1 week ago

Yes, we should update the documentation to fix that. If you feel like it, PRs are welcome. Otherwise, we'll open a PR once we find some free time :)

Thanks again for reporting.