firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

[Bug] Ongoing `read` Syscalls on VSocks Don't Get Interrupted After a Snapshot Resume #4736

Closed pojntfx closed 1 week ago

pojntfx commented 1 month ago

Describe the bug

Firecracker’s VSock connection reset is not working as expected after resuming from a snapshot. Specifically, the ongoing read syscall in the guest VM does not get interrupted, and the socat instance continues running instead of exiting due to the VSock connection reset.

To Reproduce

  1. Start Firecracker
  2. Create a VM with a VSock
  3. Start a VSock-over-UDS listener on the host with socat
  4. In the guest VM, connect to the listener on the host through VSock with socat
  5. Pause the VM and create a snapshot
  6. Stop the listener on the host
  7. Resume the VM

The socat instance continues running and the ongoing read syscall does not get interrupted; the connection reset has no effect. New read and write syscalls, however, fail as expected (they can be triggered by e.g. pressing Enter in socat), which results in an EOF and makes socat exit as expected.

For the full reproduction steps, please see loopholelabs/firecracker-vsock-snapshot-reset-bug-reproducer. This includes the helper scripts and assets (kernel, rootfs) to reproduce the bug.
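For orientation, steps 2, 5 and 7 above map roughly onto the Firecracker API requests sketched below. This is only a sketch: the socket path, guest CID, port and snapshot file names are made-up placeholders and are not taken from the reproducer repository, and the helper only builds the requests without sending them.

```python
# Hypothetical values for illustration only (not from the reproducer).
VSOCK_UDS = "/tmp/vsock.sock"
GUEST_CID = 3
PORT = 25


def reproduction_requests():
    """Return (method, path, body) tuples for the API calls behind steps 2, 5 and 7."""
    return [
        # Step 2: attach a vsock device backed by a Unix domain socket.
        ("PUT", "/vsock", {"guest_cid": GUEST_CID, "uds_path": VSOCK_UDS}),
        # Step 5: pause the VM, then create a full snapshot.
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": "/tmp/snapshot_file",
            "mem_file_path": "/tmp/mem_file",
        }),
        # Step 7: resume the VM.
        ("PATCH", "/vm", {"state": "Resumed"}),
    ]


def host_listener_path(uds_path, port):
    """Path the host-side listener (step 3) binds to for guest-initiated connections.

    Firecracker forwards a guest connection to port N to "<uds_path>_N".
    """
    return f"{uds_path}_{port}"


if __name__ == "__main__":
    print("listen on:", host_listener_path(VSOCK_UDS, PORT))
    for method, path, body in reproduction_requests():
        print(method, path, body)
```

The host listener from step 3 would then bind to the path returned by `host_listener_path`, and the guest would connect to the host CID on the same port.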

Expected behaviour

The Firecracker VSock docs state:

Firecracker handles sending the reset event to the vsock driver, thus the customers are no longer responsible for closing active connections.

From our reading, this should mean that the socat instance running inside the guest VM, which has an ongoing read syscall, exits due to the VSock connection being reset & the read syscall being interrupted.

Environment

Additional context

How has this bug affected you/what are you trying to achieve: This bug affects the Drafter Agent System, which expects Firecracker to kill the active connection before it re-dials the host after a resume. Without the connection being killed by Firecracker, this does not work.

Do you have any idea of what the solution might be: Not a solution, but a workaround we've been trying is to manually close the connection ourselves before snapshotting. This causes a race condition on resume, however, because the dial loop in the guest will sometimes be killed by Firecracker resetting the (new) connections after a resume. We've also been investigating a kernel return probe kprobe/virtio_vsock_reset_sock to hold off on re-dialing after a resume until Firecracker has reset the connections (for the case where we close them ourselves before suspending), but preferably we would simply re-dial after Firecracker resets the active connection/interrupts the read syscall.
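To make the preferred behaviour concrete, here is a minimal sketch of a guest-side loop that re-dials once its blocked read fails or returns EOF. It is not the Drafter agent's actual code; `dial_host`, `read_with_redial` and the `max_dials` limit are hypothetical names chosen for illustration, and `dial_host` assumes a Linux guest with `AF_VSOCK` support.

```python
import socket


def dial_host(port):
    """Connect from the guest to the host over vsock (Linux only)."""
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.connect((socket.VMADDR_CID_HOST, port))
    return sock


def read_with_redial(dial, on_data, max_dials=3):
    """Read until the connection drops, then dial again.

    Opens at most max_dials connections and returns how many were opened.
    Errors raised by dial() itself are not handled here.
    """
    dials = 0
    while dials < max_dials:
        sock = dial()
        dials += 1
        try:
            while True:
                chunk = sock.recv(8192)
                if not chunk:  # orderly EOF from the peer
                    break
                on_data(chunk)
        except OSError:
            # For example ENOTCONN or ECONNRESET once the reset reaches the socket.
            pass
        finally:
            sock.close()
    return dials
```

In the guest this would be called as `read_with_redial(lambda: dial_host(25), handle_chunk)`, with the port and `handle_chunk` again being placeholders. The loop only works if the blocked `recv` actually returns after a resume, which is exactly what this issue is about.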

roypat commented 2 weeks ago

Hi @pojntfx, Thank you for the report and the detailed reproduction steps (and sorry for taking a while to start looking into this). I can indeed reproduce the behavior you observe locally, although I don't really have an idea yet as to why this is happening. Funnily enough, it does not happen with connections established the "other way around": if I have a LISTEN socket inside of the guest and connect from the host, these kinds of connections do get terminated as expected. I could verify that Firecracker definitely sends a VIRTIO_VSOCK_EVENT_TRANSPORT_RESET event to the guest, and that the guest kernel indeed receives it (I put some printks into the in-kernel handler). However, for some reason with guest->host connections, when it tries to iterate over all connections to terminate them, it doesn't find any connections. I still have to figure out why this is the case. Best, Patrick

ShadowCurse commented 1 week ago

Hi @pojntfx, we have fixed the issue with the vsock you experienced. The problem was that Firecracker sent the interrupt to the guest (with VIRTIO_VSOCK_EVENT_TRANSPORT_RESET placed in the queue) only during snapshot creation, while the VM is paused. This meant that when the VM is restored, the queue in the guest does contain the VIRTIO_VSOCK_EVENT_TRANSPORT_RESET event, but the VM has never received an interrupt notifying it about the event. The solution was simply to send an interrupt on the VM resume call, so that the guest can process the termination event. This fix will be included in the 1.10 release of Firecracker.
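The missed-notification pattern described above can be illustrated with a small toy model. This is not Firecracker or kernel code: `ToyGuestDriver`, `ToyVmm` and the `kick_on_resume` flag are hypothetical, and the model simply assumes that an interrupt raised while the VM is paused never reaches the guest.

```python
from collections import deque


class ToyGuestDriver:
    """Guest side: only inspects the event queue when an interrupt arrives."""

    def __init__(self):
        self.event_queue = deque()
        self.connections = {"guest->host"}

    def interrupt(self):
        # Drain the queue; a transport reset drops every open connection.
        while self.event_queue:
            if self.event_queue.popleft() == "TRANSPORT_RESET":
                self.connections.clear()


class ToyVmm:
    """VMM side: queues the reset event at snapshot time."""

    def __init__(self, guest, kick_on_resume):
        self.guest = guest
        self.kick_on_resume = kick_on_resume
        self.paused = False

    def _send_interrupt(self):
        # Assumption of this model: an interrupt sent to a paused VM is lost.
        if not self.paused:
            self.guest.interrupt()

    def snapshot(self):
        self.paused = True
        self.guest.event_queue.append("TRANSPORT_RESET")
        self._send_interrupt()

    def resume(self):
        self.paused = False
        if self.kick_on_resume:
            # The fix in this model: notify the guest again on resume.
            self._send_interrupt()


if __name__ == "__main__":
    for kick in (False, True):
        guest = ToyGuestDriver()
        vmm = ToyVmm(guest, kick_on_resume=kick)
        vmm.snapshot()
        vmm.resume()
        print(f"kick_on_resume={kick}: connections={guest.connections}")
```

Without the extra notification the event stays in the queue and the connection survives; with it, the guest drains the queue and drops the connection.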

pojntfx commented 1 week ago

Hi @ShadowCurse! Thank you - I'll verify this ASAP!

pojntfx commented 1 week ago

I was just able to confirm that this now works perfectly. For the reproducer above, after a resume, we now get a reset as expected:

2024/09/13 00:58:19 socat[360] E read(5, 0x7fa8219db000, 8192): Socket not connected
2024/09/13 00:58:19 socat[360] N exit(1)
2024/09/13 00:58:19 socat[360] I shutdown(5, 2)

Thanks for taking a look at this so quickly!