au-ts / libvmm

An experimental virtual machine monitor for the seL4 microkernel
Other
29 stars 15 forks source link

ARM timer IRQs dropping in certain cases #138

Open Ivan-Velickovic opened 5 days ago

Ivan-Velickovic commented 5 days ago

In the past when the system has been under high load and the guest/VMM hasn't had a chance to run in a while errors like the following can show up:

VMM|ERROR: vIRQ 0x1b is not enabled 
VMM|ERROR: VPPI IRQ 27 dropped on vCPU 0 

I can't tell if this is a bug with the VMM or with the kernel since a similar case where the guest hasn't had a chance to run in a while is setting a budget less than the period.

However, this leads to a kernel assertion right now: https://github.com/seL4/seL4/issues/1104.

More investigation required, will post updates to this issue.

Furao commented 4 days ago

I have seen this issue consistently with certain schedules when trying to use the domain scheduler with microkit and libvmm. I built a microkit with these branches: https://github.com/JE-Archer/seL4/tree/microkit and https://github.com/JE-Archer/microkit/tree/domains.

Then I created a domain schedule for the libvmm simple example for qemu_virt_aarch64: https://github.com/Furao/libvmm/tree/domain

This consistently causes the issue posted here after the VM gets stuck waiting on timer IRQs for a couple of minutes with no output and then it starts printing the errors you posted. I think the main problem is that the VM's VCPU TCB was not getting set to any domains. I created this patch to set the VMs VCPU TCB to the same domain as the PD that contains it: https://github.com/JE-Archer/microkit/pull/1

This seems to fix the issue. I tested on QEMU and the ZCU102 with the same schedule.