firecracker-microvm / firecracker-go-sdk

An SDK in Go for the Firecracker microVM API
Apache License 2.0
466 stars 123 forks source link

Firecracker socket resets connection on shutdown #228

Open sriramsv opened 4 years ago

sriramsv commented 4 years ago

while trying out the machine.Shutdown() call from the go-sdk, the firecracker socket resets the connection and also shuts down the VM disgracefully due to the error.

I also used the new ForwardSignals set to []os.Signal{} to make sure firecracker does not listen to any interrupt signals coming from the host operating system.

Unable to send CtrlAltDel: Put "http://localhost/actions": read unix @->/run/firecracker.socket: read: connection reset by peer  component=microvm-0
DEBU[2020-04-21T20:20:32] instance Destroy()                            component="microvm-0 (867e7d58)" error="Put \"http://localhost/actions\": read unix @->/run/firecracker.socket: read: connection reset by peer"
INFO[2020-04-21T20:20:32] Finished shutdown of VMM                      component=vmm
WARN[2020-04-21T20:20:33] firecracker exited: signal: interrupt         component=microvm-0
DEBU[2020-04-21T20:20:33] closing the exitCh 1 error occurred:
    * signal: interrupt
  component=microvm-0

However when I try using the same kernel image with firectl, I am able to cleanly shutdown using the curl command to the firecracker socket

ps aux | grep firecracker
sriramv+ 12276  7.4  0.5 530096 40892 pts/0    Sl+  00:57   0:05 /usr/local/bin/firecracker --api-sock /home/sriramvenkat/.firecracker.sock-12268-81
sriramv+ 12366  0.0  0.0 112712   976 pts/1    S+   00:58   0:00 grep --color=auto firecracker
[sriramvenkat@containedwaned ~]$ curl --unix-socket  /home/sriramvenkat/.firecracker.sock-12268-81 -i     -X PUT "http://localhost/actions"     -H  "accept: application/json"     -H  "Content-Type: application/json"     -d "{
>              \"action_type\": \"SendCtrlAltDel\"
>     }"
HTTP/1.1 204
Server: Firecracker API
Connection: keep-alive
[sriramvenkat@containedwaned ~]$ ps aux | grep firecracker
sriramv+ 12276  5.3  0.5 530100 40892 pts/0    Sl+  00:57   0:05 /usr/local/bin/firecracker --api-sock /home/sriramvenkat/.firecracker.sock-12268-81
sriramv+ 12391  0.0  0.0 112712   976 pts/1    S+   00:59   0:00 grep --color=auto firecracker
kzys commented 4 years ago

What is the configuration of your Linux kernel? Shutdown internally uses SendCtrlDel, which is not supported by some Linux kernels. See https://github.com/firecracker-microvm/firecracker/issues/1095 for the detail.

alakesh commented 3 years ago

@sriramsv Do you still see this issue?

alexellis commented 2 months ago

I am experiencing the same behaviour as described here.

Running the above curl command against the socket causes a reboot and a clean shutdown:

[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create System Users.
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Reached target System Shutdown.
[  OK  ] Reached target Late Shutdown Services.
[  OK  ] Finished System Reboot.
[  OK  ] Reached target System Reboot.
[   39.516404] reboot: Restarting system
2024-04-22T15:08:31.644600842 [anonymous-instance:main] Vmm is stopping.
2024-04-22T15:08:31.644866100 [anonymous-instance:main] Vmm is stopping.
2024-04-22T15:08:31.775123014 [anonymous-instance:main] Firecracker exiting successfully. exit_code=0

When using the SDK and calling Shutdown(), I see:

ERRO[0013] Unable to send CtrlAltDel: Put "http://localhost/actions": read unix @->/var/run/sockets/firecracker-621912026/firecracker.sock: read: connection reset by peer 
2024/04/22 15:09:17 Error shutting down of-k3s-1: Put "http://localhost/actions": read unix @->/var/run/sockets/firecracker-621912026/firecracker.sock: read: connection reset by peer

In the guest, I see no new output on the serial console.

firecracker.Machine

// Shutdown requests a clean shutdown of the VM by sending CtrlAltDelete on the virtual keyboard
func (m *Machine) Shutdown(ctx context.Context) error {
    m.logger.Debug("Called machine.Shutdown()")
    return m.sendCtrlAltDel(ctx)
}

For some reason, there have been no SDK releases since Sept 2022, but I've tried using that and the latest commit in the main branch. I got the same results in either case.

Firecracker v1.7.0

alexellis commented 2 months ago

@swagatbora90 would you have any suggestions here? I saw you'd made commits into the SDK repository recently, feel free to ping someone else if needed.

alexellis commented 2 months ago

It seems like the bug is potentially a regression with signal passing, since when I invoke the same codepath through an internal HTTP endpoint, the shutdown method works as expected.

When I run control + c on the controller program, the signal appears to be propagated directly to Firecracker causing it to exit immediately meaning it's unable to handle the HTTP request.

I overrode the list of signals to make sure that nothing that would cause a termination would get passed through and can see that in the Go SDK:

DEBU[0001] Setting up signal handler: [user defined signal 1] 

The debug logs show that this signal propagation is faulty, despite only being configured to send through USR1, the control + c went to the Firecracker process and terminated it before the controller could clean it up.

DEBU[0061] Called machine.Shutdown()                    
WARN[0061] firecracker exited: signal: interrupt        
bduffany commented 1 month ago

When I run control + c on the controller program, the signal appears to be propagated directly to Firecracker causing it to exit immediately meaning it's unable to handle the HTTP request.

When you press Ctrl+C in the terminal, it sends the signal to all processes in the current process group rather than just a single process. I think the Go SDK would need to allow spawning firecracker/jailer processes with syscall.SysProcAttr{Setpgid: true} so that firecracker/jailer can be placed in their own process group, which would prevent them from getting signaled. Currently the SDK doesn't provide any options for customizing the exec.Cmd for firecracker/jailer, so there would need to be a patch to allow this

alexellis commented 1 month ago

That's a fair point, and what we eventually found out too. Thanks for commenting.