firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

[Bug] Only up to 64 devices can be used on aarch64 #4207

Open kalyazin opened 8 months ago

kalyazin commented 8 months ago

Describe the bug

It is expected that up to 96 devices can be used on aarch64. However, if more than 64 devices are attached to an aarch64 microVM, only the first 64 are usable.

To Reproduce

  1. Use an aarch64 machine
  2. Checkout the repro branch
  3. Build Firecracker: ./tools/devtool build
  4. Run the test_attach_maximum_devices test:
    ./tools/devtool -y test  -- -vv integration_tests/functional/test_max_devices.py::test_attach_maximum_devices

The observed failure is:

integration_tests/functional/test_max_devices.py:36: in test_attach_maximum_devices
    exit_code, _, _ = test_microvm.ssh_iface(i).run("sync")
        _          = ''
        exit_code  = 0
        i          = 63
        test_microvm = <Microvm id=81d12760-4116-4407-9f49-2b3967fbf98a>
        test_microvm_with_api = <Microvm id=81d12760-4116-4407-9f49-2b3967fbf98a>
...
host_tools/network.py:93: in _init_connection
    raise ConnectionError
E   ConnectionError
        _          = ''
        ecode      = 255
        self       = <host_tools.network.SSHConnection object at 0xffff943f6200>

The test creates a rootfs block device and a number of net devices. When it tries to connect to the last one (the 65th device in total), it fails.

Expected behaviour

The test should have passed, because according to the aarch64 layout, it should be possible to use up to 96 devices on aarch64.

Environment

Additional context

Impact: users cannot use more than 64 devices attached to an aarch64 microVM.

Checks

dush-t commented 6 months ago

Hi, is this issue still up? Can attempt a fix.

kalyazin commented 6 months ago

Hi @dush-t ! Yes, the issue is still valid. Please feel free to take it. Thanks in advance!

dush-t commented 6 months ago

Perfect. Wanted to know if this can be tested on an ARM MacBook, or do I need to get myself a Linux device?

bchalios commented 6 months ago

Hi @dush-t, you need Linux KVM to test this.

Ecazares15 commented 2 months ago

Hello!

We are students from the University of Texas at Austin taking a virtualization course (cs360v) looking for opportunities to contribute to an open source project for class credit.

Could I be assigned to this?

vliaskov commented 2 months ago

This is still reproducible.

It is expected that up to 96 devices can be used on aarch64. However, if more than 64 devices are attached to an aarch64 microVM, only the first 64 are usable.

@kalyazin where is the 96 devices limit defined?

Can this limit be configured by a GIC configuration option in the guest?

Currently, I only see 64 virtio IRQs in the test vm spawned by the test:

cat /proc/interrupts | grep -i virtio | wc

64

cat /proc/interrupts | grep -i virtio

           CPU0       CPU1       
 14:       1259          0     GIC-0  64 Edge      virtio0
 15:        129          0     GIC-0  65 Edge      virtio1
 16:          1          0     GIC-0  66 Edge      virtio2
 17:          1          0     GIC-0  67 Edge      virtio3
 18:          1          0     GIC-0  68 Edge      virtio4
 19:          0          0     GIC-0  69 Edge      virtio5
 20:          0          0     GIC-0  70 Edge      virtio6
 21:          0          0     GIC-0  71 Edge      virtio7
 22:          0          0     GIC-0  72 Edge      virtio8
 23:          0          0     GIC-0  73 Edge      virtio9
 24:          0          0     GIC-0  74 Edge      virtio10
 25:          1          0     GIC-0  75 Edge      virtio11
 26:          1          0     GIC-0  76 Edge      virtio12
 27:          1          0     GIC-0  77 Edge      virtio13
 28:          1          0     GIC-0  78 Edge      virtio14
 29:          1          0     GIC-0  79 Edge      virtio15
 30:          1          0     GIC-0  80 Edge      virtio16
 31:          1          0     GIC-0  81 Edge      virtio17
 32:          1          0     GIC-0  82 Edge      virtio18
 33:          1          0     GIC-0  83 Edge      virtio19
 34:          1          0     GIC-0  84 Edge      virtio20
 35:          1          0     GIC-0  85 Edge      virtio21
 36:          1          0     GIC-0  86 Edge      virtio22
 37:          1          0     GIC-0  87 Edge      virtio23
 38:          1          0     GIC-0  88 Edge      virtio24
 39:          1          0     GIC-0  89 Edge      virtio25
 40:          1          0     GIC-0  90 Edge      virtio26
 41:          1          0     GIC-0  91 Edge      virtio27
 42:          1          0     GIC-0  92 Edge      virtio28
 43:          1          0     GIC-0  93 Edge      virtio29
 44:          1          0     GIC-0  94 Edge      virtio30
 45:          1          0     GIC-0  95 Edge      virtio31
 46:          1          0     GIC-0  96 Edge      virtio32
 47:          1          0     GIC-0  97 Edge      virtio33
 48:          1          0     GIC-0  98 Edge      virtio34
 49:          1          0     GIC-0  99 Edge      virtio35
 50:          1          0     GIC-0 100 Edge      virtio36
 51:          1          0     GIC-0 101 Edge      virtio37
 52:          1          0     GIC-0 102 Edge      virtio38
 53:          0          0     GIC-0 103 Edge      virtio39
 54:          0          0     GIC-0 104 Edge      virtio40
 55:          0          0     GIC-0 105 Edge      virtio41
 56:          0          0     GIC-0 106 Edge      virtio42
 57:          0          0     GIC-0 107 Edge      virtio43
 58:          0          0     GIC-0 108 Edge      virtio44
 59:          0          0     GIC-0 109 Edge      virtio45
 60:          0          0     GIC-0 110 Edge      virtio46
 61:          0          0     GIC-0 111 Edge      virtio47
 62:          0          0     GIC-0 112 Edge      virtio48
 63:          0          0     GIC-0 113 Edge      virtio49
 64:          0          0     GIC-0 114 Edge      virtio50
 65:          0          0     GIC-0 115 Edge      virtio51
 66:          0          0     GIC-0 116 Edge      virtio52
 67:          0          0     GIC-0 117 Edge      virtio53
 68:          0          0     GIC-0 118 Edge      virtio54
 69:          0          0     GIC-0 119 Edge      virtio55
 70:          0          0     GIC-0 120 Edge      virtio56
 71:          0          0     GIC-0 121 Edge      virtio57
 72:          0          0     GIC-0 122 Edge      virtio58
 73:          0          0     GIC-0 123 Edge      virtio59
 74:          0          0     GIC-0 124 Edge      virtio60
 75:          0          0     GIC-0 125 Edge      virtio61
 76:          0          0     GIC-0 126 Edge      virtio62
 77:          0          0     GIC-0 127 Edge      virtio63
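
The count above can also be extracted programmatically. Below is a minimal Rust sketch, assuming the /proc/interrupts column layout shown in the listing (IRQ label, per-CPU counts, chip name, hardware IRQ, trigger type, action name). The function name `virtio_hwirqs` is made up for illustration and is not part of Firecracker.

```rust
/// Collect the hardware IRQ number of every /proc/interrupts line whose
/// action name (last column) starts with "virtio".
///
/// Assumed line shape, taken from the listing above:
///   " 14:       1259          0     GIC-0  64 Edge      virtio0"
fn virtio_hwirqs(proc_interrupts: &str) -> Vec<u32> {
    proc_interrupts
        .lines()
        .filter_map(|line| {
            let fields: Vec<&str> = line.split_whitespace().collect();
            // Skip empty lines and lines that do not belong to a virtio device
            // (this also skips the "CPU0 CPU1" header).
            let name = fields.last()?;
            if !name.starts_with("virtio") {
                return None;
            }
            // The hardware IRQ sits three fields from the end:
            // ..., hwirq, trigger, name
            let idx = fields.len().checked_sub(3)?;
            fields.get(idx)?.parse::<u32>().ok()
        })
        .collect()
}
```

Applied to the full listing above, this should return the 64 values from 64 to 127, so the length of the result matches the `wc` count and the minimum and maximum show which hardware IRQ range the guest actually registered.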

kalyazin commented 2 months ago

Hi @vliaskov . I believe 96 is inferred from https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/device_manager/resources.rs#L31

gsi_allocator: IdAllocator::new(arch::IRQ_BASE, arch::IRQ_MAX)?,

where

// As per virt/kvm/arm/vgic/vgic-kvm-device.c we need
// the number of interrupts our GIC will support to be:
// * bigger than 32
// * less than 1023 and
// * a multiple of 32.
/// The highest usable SPI on aarch64.
pub const IRQ_MAX: u32 = 128;

/// First usable interrupt on aarch64.
pub const IRQ_BASE: u32 = 32;

This may well be misaligned with what the guest configures. Ideally, if only up to 64 devices can be functional, we should fail early when more devices are requested via the API/config, so that users do not run into hard-to-debug failures later.
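
For illustration, the arithmetic behind the 96 figure and an up-front limit check could be sketched as follows. This is only a sketch: `TooManyDevices`, `allocator_capacity` and `check_device_count` are hypothetical names, not Firecracker APIs, and it assumes (as this issue does) that the allocator range yields `IRQ_MAX - IRQ_BASE` IDs.

```rust
// Values copied from the layout constants quoted above.
const IRQ_BASE: u32 = 32;
const IRQ_MAX: u32 = 128;

/// Hypothetical error type for this sketch (not a Firecracker type).
#[derive(Debug, PartialEq)]
struct TooManyDevices {
    requested: u32,
    limit: u32,
}

/// Number of IDs the allocator range is expected to provide,
/// assuming the range covers IRQ_MAX - IRQ_BASE entries.
fn allocator_capacity() -> u32 {
    IRQ_MAX - IRQ_BASE
}

/// Reject a device count that exceeds the number of devices known to work,
/// instead of letting the extra devices fail silently in the guest.
fn check_device_count(requested: u32, usable_limit: u32) -> Result<(), TooManyDevices> {
    if requested > usable_limit {
        Err(TooManyDevices { requested, limit: usable_limit })
    } else {
        Ok(())
    }
}
```

With the constants above, `allocator_capacity()` evaluates to 96, while `check_device_count(65, 64)` returns an error. The gap between the two numbers is exactly the mismatch this issue describes.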

vliaskov commented 1 month ago

Thanks for the clarification. For anyone following, the logic is described in src/vmm/src/arch/aarch64/gic/gicv3/mod.rs:

    /// Finalize the setup of a GIC device
    pub fn finalize_device(gic_device: &Self) -> Result<(), GicError> {
        // On arm there are 3 types of interrupts: SGI (0-15), PPI (16-31), SPI (32-1020).
        // SPIs are used to signal interrupts from various peripherals accessible across
        // the whole system so these are the ones that we increment when adding a new virtio device.
        // KVM_DEV_ARM_VGIC_GRP_NR_IRQS sets the highest SPI number. Consequently, we will have a
        // total of `super::layout::IRQ_MAX - 32` usable SPIs in our microVM.
        let nr_irqs: u32 = super::layout::IRQ_MAX;
        let nr_irqs_ptr = &nr_irqs as *const u32;
        Self::set_device_attribute(
            gic_device.device_fd(),
            kvm_bindings::KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
            0,
            nr_irqs_ptr as u64,
            0,
        )?;

However, the guest dmesg contains:

[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0

so there must be something changing the maximum number of SPIs during guest initialization. I haven't figured out what yet.

The logic in the Linux kernel's KVM code, arch/arm64/kvm/vgic/vgic-kvm-device.c, seems to match the one in Firecracker:

[...]           
    case KVM_DEV_ARM_VGIC_GRP_NR_IRQS: {
        u32 __user *uaddr = (u32 __user *)(long)attr->addr;
        u32 val;
        int ret = 0;

        if (get_user(val, uaddr))
            return -EFAULT;

        /*
         * We require:
         * - at least 32 SPIs on top of the 16 SGIs and 16 PPIs
         * - at most 1024 interrupts
         * - a multiple of 32 interrupts
         */
        if (val < (VGIC_NR_PRIVATE_IRQS + 32) ||
            val > VGIC_MAX_RESERVED ||
            (val & 31))
            return -EINVAL;

        mutex_lock(&dev->kvm->arch.config_lock);

        if (vgic_ready(dev->kvm) || dev->kvm->arch.vgic.nr_spis)
            ret = -EBUSY;
        else
            dev->kvm->arch.vgic.nr_spis =
                val - VGIC_NR_PRIVATE_IRQS;

        mutex_unlock(&dev->kvm->arch.config_lock);

        return ret;

VGIC_NR_PRIVATE_IRQS evaluates to 32, so the number of SPIs should be what is expected in firecracker.
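
To double-check that arithmetic, the validation in the kernel excerpt can be mirrored in a few lines of Rust. This is a sketch of the check only, not kernel or Firecracker code; the function name `nr_spis_for` is made up, and the value 1023 for `VGIC_MAX_RESERVED` is an assumption taken from the kernel headers rather than from the excerpt.

```rust
/// 16 SGIs + 16 PPIs, as stated in the kernel comment above.
const VGIC_NR_PRIVATE_IRQS: u32 = 32;
/// Assumption: value of VGIC_MAX_RESERVED in the kernel headers
/// (not shown in the excerpt above).
const VGIC_MAX_RESERVED: u32 = 1023;

/// Mirror of the KVM_DEV_ARM_VGIC_GRP_NR_IRQS validation: returns the number
/// of SPIs for a requested interrupt count, or None where the kernel would
/// return -EINVAL.
fn nr_spis_for(val: u32) -> Option<u32> {
    if val < VGIC_NR_PRIVATE_IRQS + 32 || val > VGIC_MAX_RESERVED || (val & 31) != 0 {
        return None;
    }
    Some(val - VGIC_NR_PRIVATE_IRQS)
}
```

For Firecracker's `IRQ_MAX` of 128, `nr_spis_for(128)` gives `Some(96)`, which is where the expectation of 96 usable SPIs comes from.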

A part of VGIC initialization that I don't understand (and I am not sure whether it is relevant here) is:

12.9.38 GICD_TYPER, Interrupt Controller Type Register
The GICD_TYPER characteristics are:
[...]
• The maximum number of INTIDs that the GIC implementation supports.
ITLinesNumber, bits [4:0]
For the INTID range 32 to 1019, indicates the maximum SPI supported.
If the value of this field is N, the maximum SPI INTID is 32(N+1) minus 1. For example, 00011 specifies that the maximum SPI INTID is 127.

Do src/vmm/src/arch/aarch64/gic/gicv3/regs/icc_regs.rs and src/vmm/src/arch/aarch64/gic/gicv3/regs/redist_regs.rs initialize these bits [4:0] to 0x1B (gicr_typer = 123), which would give a maximum SPI INTID of 32*(27+1)-1 = 895? (Sorry, I am a Rust beginner, learning as I go.)

        let gicr_typer = 123; 
        let res = get_icc_regs(gic_fd.device_fd(), gicr_typer);
        let mut state = res.unwrap();
        assert_eq!(state.main_icc_regs.len(), 7);
        assert_eq!(state.ap_icc_regs.len(), 8);

        set_icc_regs(gic_fd.device_fd(), gicr_typer, &state).unwrap();
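
The ITLinesNumber arithmetic quoted from the GIC specification can be checked with a small Rust sketch. It only verifies the formula and the bit extraction. It does not show that the value 123 from the test snippet ever ends up in GICD_TYPER. The function names are made up for illustration.

```rust
/// Extract bits [4:0] of a register value.
fn low_five_bits(value: u32) -> u32 {
    value & 0x1f
}

/// Maximum SPI INTID implied by an ITLinesNumber value N,
/// using the formula from the quoted specification text: 32 * (N + 1) - 1.
fn max_spi_intid(it_lines_number: u32) -> u32 {
    32 * (low_five_bits(it_lines_number) + 1) - 1
}
```

Here `max_spi_intid(0b00011)` is 127, matching the example in the specification, while `low_five_bits(123)` is 27 (0x1B) and `max_spi_intid(27)` is 895.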

Anyway, this analysis may be out of scope for fixing this test. Let me know whether digging deeper is appropriate for this bug. Currently, it seems that 96 SPIs is an overestimate of what is available to a Linux guest kernel.