firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0
25.03k stars, 1.75k forks

External Shutdown ("InstanceHalt") #673

Closed. estone-aws closed this issue 5 years ago.

estone-aws commented 5 years ago

To gracefully shut down a guest, I understand the currently supported mechanism is to issue 'reboot' inside the guest (possibly remotely through ssh). Are there plans to allow the Firecracker VMM to send a graceful shutdown signal into the guest, possibly in reaction to the InstanceHalt action (declared in the Swagger spec but not implemented)? Also, if the guest hangs and I'm OK with a non-graceful shutdown, what is the recommended way to terminate it? I don't see explicit SIGHUP/SIGQUIT/SIGTERM handlers in the Firecracker VMM; are any of these signals preferred for releasing host resources? If so, I suggest adding a 'ForceInstanceHalt' action to the API that triggers it, for congruence. Apologies if I missed some docs or handler code.

andreeaflorescu commented 5 years ago

This is an interesting topic. The problem we currently have is that the API calls are synchronous, and we would like to keep them that way so that a single, clean API call can configure and start the microVM (https://github.com/firecracker-microvm/firecracker/issues/343). With that in mind, we haven't found a good way to properly shut down the microVM while still giving the API request a response for the action. We could probably pull some tricks to make this work, but going down the InstanceHalt path would need some design changes.

As you said, we currently only support reboot for clean shutdown.

Do you have a proposal on this one?

raduweiss commented 5 years ago

Another thing to mention here is that we've basically worked under the assumption that guest workloads are, to some extent, stateless. For example, simply pkill-ing the microVM process is one of the expected ways to "stop" a microVM.

estone-aws commented 5 years ago

I'm unfamiliar with KVM, or I'd just submit a PR for this, but can SIGTERM-ing the Firecracker process leave any state in KVM data structures, or does it all get cleaned up? If that's an expected and safe mechanism for forced guest termination, let's document it as such. Adding an API action that triggers the same SIGTERM handler would also allow the entire lifecycle to be driven through the HTTP tooling.

Toward graceful shutdown (frankly a lesser desire of mine than just knowing how to best force-terminate, and also much more complicated!):

While I understand and appreciate the drive for simplicity, VMs are stateful and I feel the control API should model that. VMs have memory and block/net devices, so they will buffer data and need an opportunity (however brief) to flush before exiting. Modeling lifecycle state in the API, both for introspection and to support inherently async actions like starting and shutting down, seems like appropriate complexity to me. Single-call 'macro' APIs can still be provided. But I don't know all the trade-offs, as I haven't studied the API code yet.

For passing a graceful shutdown signal from the VMM to the guest, it's something a guest would have to opt in to and explicitly handle. Standard ACPI is too much. Random ideas: signal through a serial device or another IO port? A PS/2 keyboard? If we don't want to attach a new device for this, Firecracker could host a server on a link-local address that guests could long-poll for a shutdown message?

dhrgit commented 5 years ago

We could use an externally triggered off switch, but I don't see any way around ACPI, and that's pretty heavy. We need to let the guest OS know it's getting shut down, in a way that the guest will definitely be listening to. The only mechanism I know of that can do that is ACPI. I'd welcome suggestions, though.

> I'm unfamiliar with KVM, or I'd just submit a PR for this, but can SIGTERM-ing the Firecracker process leave any state in KVM data structures, or does it all get cleaned up? If that's an expected and safe mechanism for forced guest termination, let's document it as such. Adding an API action that triggers the same SIGTERM handler would also allow the entire lifecycle to be driven through the HTTP tooling.

How would we return from a successful SIGTERM? I.e., how could we report the API call status to the caller?

> For passing a graceful shutdown signal from the VMM to the guest, it's something a guest would have to opt in to and explicitly handle. Standard ACPI is too much. Random ideas: signal through a serial device or another IO port? A PS/2 keyboard? If we don't want to attach a new device for this, Firecracker could host a server on a link-local address that guests could long-poll for a shutdown message?

By opting in, are you suggesting having an agent running inside the guest, listening for the shutdown signal? How would that be conceptually different from sending the reboot command via SSH?

estone-aws commented 5 years ago

> How would we return from a successful SIGTERM? I.e., how could we report the API call status to the caller?

The API could queue a shutdown of the VMM, which the VMM can enact as soon as it has written the API response out. We could name the action "ScheduleTermination" if that's clearer. The idea is that clients shouldn't have to use multiple tools (curl/HTTP, Unix signals) to transition the VM between states.
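A minimal sketch of the ordering this proposal relies on, written in Python rather than Firecracker's Rust: the API handler writes its response first and only then queues the termination, so the VMM loop exits after the caller already has its answer. `ScheduleTermination` is only the name proposed in this comment, not an existing Firecracker action; the queue-based loop and every function name here are hypothetical.

```python
import queue
import threading

# Hypothetical action name from this thread; Firecracker has no such action.
SCHEDULE_TERMINATION = "ScheduleTermination"
TERMINATE = object()  # sentinel control message


def api_handler(request, respond, control):
    """API thread: answer the caller first, then queue the termination."""
    if request.get("action_type") == SCHEDULE_TERMINATION:
        respond(204)            # the caller gets its status...
        control.put(TERMINATE)  # ...before the VMM is asked to exit
    else:
        respond(400)


def vmm_loop(control, log):
    """Stand-in for the VMM event loop: handle messages until told to exit."""
    while True:
        msg = control.get()
        if msg is TERMINATE:
            log.append("vmm: exiting")
            return


if __name__ == "__main__":
    control = queue.Queue()
    log = []
    vmm = threading.Thread(target=vmm_loop, args=(control, log))
    vmm.start()
    api_handler({"action_type": SCHEDULE_TERMINATION},
                lambda status: log.append(f"api: responded {status}"),
                control)
    vmm.join()
    print(log)  # ['api: responded 204', 'vmm: exiting']
```

Because the response is recorded before the message is queued, the log order is fixed regardless of thread scheduling; in a real VMM the "respond" step would also need the bytes flushed to the socket before the process exits.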

estone-aws commented 5 years ago

> By opting in, are you suggesting having an agent running inside the guest, listening for the shutdown signal? How would that be conceptually different from sending the reboot command via SSH?

"Opting-in" referred more to the fact that guests have to actively choose to participate in graceful shutdown, and incorporate supporting code for it. It would entirely be possible to create a guest that could ignore all graceful shutdown mechanisms we could think of.

I wasn't thinking so much 'agent' as 'code'. The Firecracker distribution could provide a kmod that reacts to an ultra-simple "firecracker-shutdown" device, for example (raise an IRQ, read from an IO port, etc.). Guests that want to support graceful shutdown could incorporate that into their system. For guests that don't want to run a network stack or have open ports, this could be a pretty lightweight solution. If we invent a simple device, though, we have to keep in mind supporting Arm guests too.

The other idea, having the guest call out to a link-local server run by the VMM, also doesn't require guests to open a listening port.
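To make the link-local idea concrete, here is a hedged Python sketch of the guest side: a loop that long-polls a VMM-hosted endpoint and shuts the guest down when told to. The address `169.254.0.1`, the `/shutdown` path and the convention of replying with the body `shutdown` are all hypothetical; Firecracker does not host such a server. The guest runs `reboot` rather than `poweroff`, since this thread describes an in-guest reboot as the supported clean way to end a microVM.

```python
import subprocess
import time
import urllib.request

# Hypothetical endpoint: Firecracker does not serve anything like this.
SHUTDOWN_URL = "http://169.254.0.1/shutdown"


def fetch_once(url=SHUTDOWN_URL, timeout=60.0):
    """One long-poll request. Returns the reply body, or None if the
    request timed out or the server was unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:  # URLError, connection errors and timeouts are all OSErrors
        return None


def wait_for_shutdown(fetch=fetch_once, max_polls=None, retry_delay=1.0):
    """Long-poll until the server replies 'shutdown'.

    Returns True on a shutdown request, False if max_polls is exhausted.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if fetch() == "shutdown":
            return True
        time.sleep(retry_delay)  # don't spin if the server is down
    return False


def reboot_guest():
    # Per this thread, 'reboot' is the supported clean way to end a
    # Firecracker guest, so use it rather than 'poweroff'.
    subprocess.run(["reboot"], check=False)


if __name__ == "__main__":
    if wait_for_shutdown():
        reboot_guest()
```

The guest only makes outbound requests, which is the point of this variant: no listening port is opened inside the guest.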

dhrgit commented 5 years ago

> The API could queue a shutdown of the VMM, which the VMM can enact as soon as it has written the API response out. We could name the action "ScheduleTermination" if that's clearer. The idea is that clients shouldn't have to use multiple tools (curl/HTTP, Unix signals) to transition the VM between states.

What I meant was that, in order to accurately report the success status of a shutdown call, we'd have to keep the API thread alive, while forcibly terminating the vCPU and VMM threads. So we can't rely on SIGTERM. We'd need a different mechanism. Perhaps this is something we could be looking into.
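The constraint described here, that the API thread has to outlive the vCPU and VMM threads so it can report the result, can be illustrated with ordinary Python threads: the API side sets a stop flag, joins the workers against a deadline, and only then builds its response. This is a hypothetical sketch, not Firecracker's design. Real vCPU threads sit blocked inside `KVM_RUN` and would have to be kicked out of the guest (for example with a signal), which the sketch does not model.

```python
import threading
import time


def vcpu_worker(stop):
    """Stand-in for a vCPU thread: 'runs the guest' until asked to stop."""
    while not stop.is_set():
        time.sleep(0.01)  # placeholder for one slice of guest execution


def halt(workers, stop, timeout=1.0):
    """Run on the API thread: stop the workers, then build the response.

    Returns an (HTTP status, message) pair for the caller.
    """
    stop.set()
    deadline = time.monotonic() + timeout
    for w in workers:
        w.join(max(0.0, deadline - time.monotonic()))
    alive = [w.name for w in workers if w.is_alive()]
    if alive:
        return 500, f"threads still running: {alive}"
    return 204, "halted"


if __name__ == "__main__":
    stop = threading.Event()
    vcpus = [threading.Thread(target=vcpu_worker, args=(stop,), name=f"vcpu{i}")
             for i in range(2)]
    for t in vcpus:
        t.start()
    print(halt(vcpus, stop))  # (204, 'halted')
```

Only after `halt` returns would the process exit, which is why SIGTERM alone (which ends every thread at once) cannot deliver this status to the caller.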

> I wasn't thinking so much 'agent' as 'code'. [...] For guests that don't want to run a network stack or have open ports, this could be a pretty lightweight solution.

I see your point. I'm a bit disinclined to provide non-standard mechanisms for standard actions, but I think it's worth studying some use cases here.

estone-aws commented 5 years ago

> disinclined to provide non-standard mechanisms for standard actions

Unfortunately, to my knowledge the only relevant standard here is ACPI, which none of us seem to like for microVMs :(

aldem commented 5 years ago

What about using the sysrq functionality to trigger shutdown (and possibly other functions)? It could be used over the serial port too and, if a PC keyboard is emulated, directly over the keyboard. This is a pretty standard way to shut down a system when there is no ACPI.

As to synchronous operation: shutdown is only a trigger, and then it is up to the controlling application to decide. If the VM is still active after some time, there should be a way to forcibly terminate it (discarding all data in transit), since the system inside could be hung and not respond to anything.
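The controller-side pattern described here (send a graceful request, wait a bounded time, then force termination) might look like the following Python sketch. It assumes the controller launched Firecracker itself and holds a `subprocess.Popen` handle; `request_graceful` is a hypothetical callback standing in for whatever trigger is used, and the timeouts are arbitrary.

```python
import subprocess


def stop_vm(proc, request_graceful, grace=10.0, term_grace=3.0):
    """Stop a microVM process: graceful request, then SIGTERM, then SIGKILL.

    Returns which step ended the process: 'graceful', 'sigterm' or 'sigkill'.
    """
    try:
        request_graceful()
    except OSError:
        pass  # a dead API socket shouldn't block the forced path
    try:
        proc.wait(timeout=grace)
        return "graceful"
    except subprocess.TimeoutExpired:
        pass

    proc.terminate()  # SIGTERM
    try:
        proc.wait(timeout=term_grace)
        return "sigterm"
    except subprocess.TimeoutExpired:
        pass

    proc.kill()  # SIGKILL: cannot be caught; in-flight guest data is lost
    proc.wait()
    return "sigkill"
```

Returning which step succeeded lets the caller tell a clean guest exit from a forced one, which is the distinction the InstanceHalt discussion below turns on.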

dhrgit commented 5 years ago

> What about using the sysrq functionality to trigger shutdown (and possibly other functions)? It could be used over the serial port too and, if a PC keyboard is emulated, directly over the keyboard. This is a pretty standard way to shut down a system when there is no ACPI.

Thanks! I was just reading about this yesterday, and it looks promising. My understanding so far is that we'd have to implement the shutdown procedure on the VMM side, i.e. trigger SIGTERM, wait, trigger SIGKILL, trigger sync, trigger shutdown. I'll be looking into this some more and, in the meantime, I'd appreciate any contribution, of course.
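The procedure above maps onto standard Magic SysRq command keys: `e` sends SIGTERM to all processes except init, `i` sends SIGKILL, `s` does an emergency sync, `u` remounts filesystems read-only, and `b` reboots immediately (`o` would power off instead). It has to be driven from outside the guest, because a process inside the guest that issued `e` and `i` would kill itself partway through. A minimal Python sketch of the sequencing, assuming a guest kernel built with `CONFIG_MAGIC_SYSRQ`; `send_sysrq` is a hypothetical callback for however the VMM would inject the key (for instance a serial BREAK followed by the letter). Firecracker has no such API, and the pauses are arbitrary.

```python
import time

# (SysRq command key, seconds to wait afterwards)
SYSRQ_STEPS = [
    ("e", 2.0),  # SIGTERM to every process except init; give them time to exit
    ("i", 1.0),  # SIGKILL whatever is left
    ("s", 1.0),  # emergency sync of all mounted filesystems
    ("u", 1.0),  # emergency remount of all filesystems read-only
    ("b", 0.0),  # reboot; per this thread, a guest reboot is how the microVM exits
]


def sysrq_shutdown(send_sysrq, sleep=time.sleep, steps=SYSRQ_STEPS):
    """Drive the SysRq shutdown sequence from outside the guest.

    send_sysrq: hypothetical callback that injects one SysRq key.
    """
    for key, pause in steps:
        send_sysrq(key)
        if pause:
            sleep(pause)
```

Passing `sleep` in makes the sequence easy to test; the VMM still cannot tell whether each step actually finished, which is the limitation raised in the next comments.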

> As to synchronous operation: shutdown is only a trigger, and then it is up to the controlling application to decide. If the VM is still active after some time, there should be a way to forcibly terminate it (discarding all data in transit), since the system inside could be hung and not respond to anything.

That was basically my point: if we couldn't ensure a clean guest shutdown, and the user had to watch the Firecracker PID anyway, I couldn't see the point in adding an extra step to the shutdown procedure, i.e. calling InstanceHalt and then waiting around for Firecracker to die, when they could just send SIGTERM/SIGKILL and then wait. Of course, that changes if we implement a clean shutdown mechanism, since InstanceHalt could then report back on the success of the guest exit.

aldem commented 5 years ago

I have just realized that sysrq alone would need a few operations for a shutdown, which makes it not really useful for remote control without console access: you have to kill processes first, then sync and unmount file systems, and finally send a reboot request, and you have to make sure that all of these steps have completed before rebooting.

But to request a reboot, it is enough to send Ctrl-Alt-Del (which is normally enabled in Linux). The prerequisite is PC keyboard emulation, of course, though that is much easier and safer than ACPI or any other power management.

The way I see it, a request like InstanceShutdown would pass a reboot request through to the guest, giving it time to do whatever is needed, while InstanceHalt (or better, InstanceStop, to keep it consistent with InstanceStart) would forcibly terminate the VM, send a "success" response and exit the Firecracker process.

Sending a signal to terminate doesn't seem like a good option, as it (basically) introduces another API instead of a single channel used to control everything.

dhrgit commented 5 years ago

Apparently, the trouble with emulating a keyboard is that it increases boot time.

I've put together a PoC that sends ctrl+alt+del to shut down the guest, and it works. However, that needs i8042 and AT keyboard support in the guest kernel, and the i8042 driver spends about 6ms to 8ms (in my tests) probing the 8042 IO ports, at boot time.
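For reference, a Ctrl+Alt+Del keystroke on an AT/PS/2 keyboard is just a short byte sequence that the emulated i8042 has to deliver to the guest. The PoC's exact encoding isn't shown here; this Python sketch assumes PS/2 scan code set 2, where Left Ctrl is `0x14`, Left Alt is `0x11` and Delete is the extended code `0xE0 0x71`, and a key release inserts `0xF0` before the final byte.

```python
# PS/2 scan code set 2 make codes (extended keys are prefixed with 0xE0).
LEFT_CTRL = bytes([0x14])
LEFT_ALT = bytes([0x11])
DELETE = bytes([0xE0, 0x71])


def break_code(make: bytes) -> bytes:
    """Set 2 break code: 0xF0 goes after any 0xE0 prefix, before the last byte."""
    return make[:-1] + b"\xf0" + make[-1:]


def ctrl_alt_del() -> bytes:
    """Press Ctrl, Alt, Del, then release them in reverse order."""
    keys = [LEFT_CTRL, LEFT_ALT, DELETE]
    return b"".join(keys) + b"".join(break_code(k) for k in reversed(keys))


if __name__ == "__main__":
    print(ctrl_alt_del().hex(" "))  # 14 11 e0 71 e0 f0 71 f0 11 f0 14
```

Whether the guest then reboots depends on its kernel having i8042/AT keyboard support and on how its init handles Ctrl-Alt-Del.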

So far I see two options to offer a clean shutdown:

aldem commented 5 years ago

Maybe there are cases where +6-8 ms of boot time is critical, but I guess this is nothing compared to a typical container's lifetime and init-phase duration (which can sometimes take seconds), so I vote for the AT keyboard option, as it will work almost everywhere.

raduweiss commented 5 years ago

> Maybe there are cases where +6-8 ms of boot time is critical

So, our mission is to enable secure multi-tenant container and serverless function workloads. While start-up time is important in both of those categories, for functions it's super-important. There's also the slippery-slope point of view: there may be many small things that each add 10 ms of boot time and a little bit of quality of life, and they can add up.

So basically my vote is to not include this in its current form, since it's not a strict requirement to have graceful shutdown.

However, I am curious whether that kernel driver can be made to skip the scanning and simply be initialized in the correct state.

aldem commented 5 years ago

Why not let the user decide? Those who want to squeeze the last millisecond out of boot time could simply leave controller support out of the kernel, and those who don't could use it. In my use cases, startup time is mostly irrelevant (as long as it doesn't take more than 30 seconds), while the ability to request a clean shutdown (without needing access to the container) is important, and I am quite sure that I am not alone.

Especially since there is a PoC, which probably means that 99% of the work is done, while a kernel driver would certainly complicate things (as it would have to be compiled separately, or the kernel would have to be patched).

raduweiss commented 5 years ago

@aldem if that's our only choice, I'm definitely inclined to have it as an optional feature.

What I was asking is if there's a best case scenario where we get both no performance trade-off, and orderly shutdowns 😄.

mcastelino commented 5 years ago

@raduweiss it would be good for the Swagger spec to indicate that InstanceHalt is no longer supported. Today the error you see is: [PUT /actions][400] createSyncActionBadRequest &{FaultMessage:unknown variant `InstanceHalt`, expected `BlockDeviceRescan` or `InstanceStart` at line 1 column 29}

dhrgit commented 5 years ago

> if that's our only choice, I'm definitely inclined to have it as an optional feature.

No need for it to be an optional feature. The boot time is increased by the i8042 driver alone, regardless of our VMM code. The only way to avoid that is to not compile i8042 support into the guest kernel (which, for our tests, we don't).

> What I was asking is if there's a best case scenario where we get both no performance trade-off, and orderly shutdowns 😄.

The good news is that we basically have that now. Turns out the i8042 driver takes some parameters (on the kernel command line) that, for our case, can speed up detection considerably. We're currently down to ~1ms introduced boot delay. I'd say we're good.
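The comment doesn't name the parameters. A likely candidate is the set of `i8042.*` options that appear in Firecracker's default guest kernel command line in at least some releases: `i8042.noaux` (don't probe the AUX/mouse port), `i8042.nomux` (don't check for a multiplexing controller), `i8042.nopnp` (don't use PnP to discover the controller) and `i8042.dumbkbd` (treat the controller as read-only and don't try to set keyboard state such as LEDs). The Python sketch below treats that list as an assumption and appends any missing options to user-supplied boot args before building a `PUT /boot-source` body (`kernel_image_path` and `boot_args` are that endpoint's fields).

```python
import json

# Assumed list: the i8042 options seen in Firecracker's default boot args.
I8042_FAST_PROBE = ["i8042.noaux", "i8042.nomux", "i8042.nopnp", "i8042.dumbkbd"]


def with_i8042_params(boot_args: str) -> str:
    """Append the i8042 options that aren't already on the command line."""
    tokens = boot_args.split()
    # Compare on the name before '=', so e.g. 'i8042.nomux=1' also counts.
    present = {t.split("=", 1)[0] for t in tokens}
    tokens += [p for p in I8042_FAST_PROBE if p not in present]
    return " ".join(tokens)


def boot_source_body(kernel_image_path: str, boot_args: str) -> str:
    """JSON body for a PUT /boot-source request."""
    return json.dumps({
        "kernel_image_path": kernel_image_path,
        "boot_args": with_i8042_params(boot_args),
    })
```

Check the boot-args documentation for the Firecracker release you use; the defaults have changed over time.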

luxas commented 5 years ago

Hi, thanks for this thread.

> I've put together a PoC that sends ctrl+alt+del to shut down the guest, and it works.

Could you write some user docs on how to do this?

dhrgit commented 5 years ago

@luxas #907 is up now, addressing this issue.

andreeaflorescu commented 5 years ago

We now support clean shutdown of Firecracker via the SendCtrlAltDel action. For reference, check the documentation: https://github.com/firecracker-microvm/firecracker/blob/master/docs/api_requests/actions.md. Closing this issue for now. Feel free to re-open if this is not the expected behaviour.
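For callers driving the API from code rather than curl, the same `PUT /actions` request can be sent over the API's Unix socket with Python's standard library. A minimal sketch: `http.client` has no built-in Unix-socket support, so the connection class below overrides `connect()`. The socket path `/tmp/firecracker.socket` is an assumption taken from the curl example later in this thread.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


def action_body(action_type):
    return json.dumps({"action_type": action_type})


def send_action(socket_path, action_type="SendCtrlAltDel"):
    """PUT /actions and return the HTTP status (204 on success)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("PUT", "/actions", body=action_body(action_type),
                     headers={"Content-Type": "application/json",
                              "Accept": "application/json"})
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


if __name__ == "__main__":
    print(send_action("/tmp/firecracker.socket"))
```

A 204 only means the keystroke was queued; whether the guest actually shuts down depends on its kernel and init, as the following comments show.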

rchatsiri commented 5 years ago

> We now support clean shutdown of Firecracker via the SendCtrlAltDel action. For reference, check the documentation: https://github.com/firecracker-microvm/firecracker/blob/master/docs/api_requests/actions.md. Closing this issue for now. Feel free to re-open if this is not the expected behaviour.

I found an issue after calling the SendCtrlAltDel API action on the server. This is the log from the console:

$ sudo curl --unix-socket /tmp/firecracker.socket -i \
>     -X PUT "http://localhost/actions" \
>     -H  "accept: application/json" \
>     -H  "Content-Type: application/json" \
>     -d "{
>              \"action_type\": \"SendCtrlAltDel\"
>     }"
HTTP/1.1 204 No Content
Date: Tue, 14 May 2019 12:16:55 GMT

The console printed the following:

2019-05-14T05:16:55.790607760 [anonymous-instance:WARN:devices/src/legacy/i8042.rs:146] Failed to trigger i8042 kbd interrupt (disabled by guest OS)

dhrgit commented 5 years ago

@rchatsiri This is a closed feature-request issue. Please open a new issue, describing the problem you are encountering (the comment you posted here will do). Thanks!

radekg commented 3 years ago

Hey @rchatsiri, the clean shutdown does not seem to do anything useful on:

# uname -a
Linux r720sas 5.4.0-65-generic #73~18.04.1-Ubuntu SMP Tue Jan 19 09:02:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Firecracker release 0.22.4, using the Alpine 3.8 image from the quickstart. My VM is stuck at:

...
 * Mounting persistent storage (pstore) filesystem ...
 [ ok ]
Starting default runlevel
[    1.088120] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x240937b9988, max_idle_ns: 440795218083 ns

Welcome to Alpine Linux 3.8
Kernel 4.14.55-84.37.amzn2.x86_64 on an x86_64 (ttyS0)

localhost login:
Welcome to Alpine Linux 3.8
Kernel 4.14.55-84.37.amzn2.x86_64 on an x86_64 (ttyS0)

localhost login:
Welcome to Alpine Linux 3.8
Kernel 4.14.55-84.37.amzn2.x86_64 on an x86_64 (ttyS0)

localhost login:
Welcome to Alpine Linux 3.8
Kernel 4.14.55-84.37.amzn2.x86_64 on an x86_64 (ttyS0)

localhost login:

Awettt commented 2 years ago

For alpine-minirootfs, SendCtrlAltDel works:

curl --unix-socket ./tmp/firecracker.socket -i \
    -X PUT 'http://localhost/actions' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{ "action_type": "SendCtrlAltDel" }'