firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

[Bug] Ongoing `read` Syscalls on Vsocks Don't Get Interrupted after the Second Snapshot Resume/Restore #4811

Closed - pojntfx closed this issue 2 days ago

pojntfx commented 3 days ago

Describe the bug

Recently, we reported a bug that led to ongoing `read` syscalls not being interrupted after a snapshot resume. @ShadowCurse then published a fix for this, and we verified that the fix worked by using our reproducer repo:

https://github.com/firecracker-microvm/firecracker/issues/4736#issuecomment-2347815017

In the reproducer repo, we're essentially doing this:

  1. Start Firecracker
  2. Create a VM with a VSock
  3. Start a VSock-over-UDS listener on the host with socat
  4. In the guest VM, connect to the listener on the host through VSock with socat
  5. Pause the VM and create a snapshot
  6. Stop the listener on the host
  7. Resume the VM

And this all works now! At step 7, any in-progress reads from the VSock in the guest VM return RST errors (as they should).
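
For concreteness, steps 3 and 4 boil down to socat invocations roughly like the ones below. The UDS path (/tmp/vsock.sock) and port (1234) are assumptions for illustration, not values taken from the reproducer repo:

# Host (step 3): listen on the UDS that Firecracker maps to guest-initiated
# port 1234, assuming the vsock device was configured with uds_path=/tmp/vsock.sock
socat - UNIX-LISTEN:/tmp/vsock.sock_1234

# Guest (step 4): connect over vsock to the host (CID 2) on port 1234
socat - VSOCK-CONNECT:2:1234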

But if we add the following steps:

  1. Start a VSock-over-UDS listener on the host with socat again
  2. In the guest VM, connect to the listener on the host through VSock with socat
  3. Pause the VM and create a snapshot
  4. Stop the listener on the host
  5. Resume the VM

The issue appears again! Essentially, if we create & resume two or more consecutive snapshots (we have already verified this with upstream Firecracker using our existing reproducer repo), reads from the VSocks no longer get RST errors and instead hang forever.

To Reproduce

See the original reproduction steps, but add the additional steps described above (snapshot & restore more than once).

Expected behaviour

See the original expected behavior - from our reading, this behavior should continue to work even on the 2nd, 3rd, etc. snapshot/restore cycle.

Potential Fix

We noticed that in src/vmm/src/device_manager/persist.rs, the VSock state is serialized before the reset event is sent. If the order is swapped, the issue goes away, and the 2nd, 3rd, etc. resume still correctly resets any connected VSocks. We're not sure whether this is a proper solution, though, whether there are better workarounds, or whether there is a better way to fix the issue. To test it, apply the following patch:

diff --git a/src/vmm/src/device_manager/persist.rs b/src/vmm/src/device_manager/persist.rs
index 7a51bf790..154427a84 100644
--- a/src/vmm/src/device_manager/persist.rs
+++ b/src/vmm/src/device_manager/persist.rs
@@ -365,11 +365,6 @@ impl<'a> Persist<'a> for MMIODeviceManager {
                         .downcast_mut::<Vsock<VsockUnixBackend>>()
                         .unwrap();

-                    let vsock_state = VsockState {
-                        backend: vsock.backend().save(),
-                        frontend: vsock.save(),
-                    };
-
                     // Send Transport event to reset connections if device
                     // is activated.
                     if vsock.is_activated() {
@@ -378,6 +373,11 @@ impl<'a> Persist<'a> for MMIODeviceManager {
                         });
                     }

+                    let vsock_state = VsockState {
+                        backend: vsock.backend().save(),
+                        frontend: vsock.save(),
+                    };
+
                     states.vsock_device = Some(ConnectedVsockState {
                         device_id: devid.clone(),
                         device_state: vsock_state,

Then run the reproduction steps using the reproducer repo. The issue should no longer appear afterwards.
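
For reference, applying and testing the patch locally might look roughly like this, assuming a checkout of the Firecracker sources with the diff above saved as vsock-snapshot-order.patch (the file name is arbitrary, and devtool is just one way to build):

# Apply the patch to the Firecracker sources
git apply vsock-snapshot-order.patch

# Rebuild Firecracker (devtool builds inside the dev container)
tools/devtool build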

Environment

Additional context

See the original additional context.

Checks

ShadowCurse commented 2 days ago

Hi again @pojntfx, thanks for letting us know about this issue; I totally missed the serialization order in the previous PR. The fix you propose in the description is indeed valid, and the reasons why are explained in the new PR, which is merged now.