firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

FAQ: Clarify what workloads work best with Firecracker #908

Closed AnandJayaraman closed 4 years ago

AnandJayaraman commented 5 years ago

Can we describe what workloads are ideal for Firecracker? Do we have any performance statistics comparing a micro service running in Docker vs Firecracker?

raduweiss commented 5 years ago

@AnandJayaraman, at a high level this is covered in our charter and the current specifications.

In essence, you want to use Firecracker if you want to properly isolate container / serverless function workloads running on the same host from each other. This makes a lot of economic sense, since it lets you have very beefy hosts and then run lots of workloads on them - depending on your requirements, Firecracker will let you oversubscribe both CPU and memory.

There's no direct Docker vs. Firecracker comparison to be made, since it will depend on your architecture. For example, there's a valid micro-service architecture where you run a few Docker containers inside a Firecracker microVM, and then run many microVMs on your host. In that case you would compare Firecracker + Docker vs. just Docker. What workload do you have in mind?

AnandJayaraman commented 5 years ago

The workload I have in mind is just a typical microservice. I wanted to experiment with running it in Firecracker instead of a Docker container - essentially, to see how quickly it boots up, etc., in a microVM versus a container. If there are any performance metrics on this, it would greatly help. From an external perspective I also wanted to compare a microVM with a container. Given the lightweight nature of both these technologies, the lines get blurred a bit, and I wanted to understand how they compare and contrast. Are there orchestration technologies for microVMs, just as Kubernetes and similar exist for containers?

AnandJayaraman commented 5 years ago

Just to qualify the microservice I have in mind: it is a typical REST service which gets invoked by different clients (web apps etc.). These are Spring Boot apps which always run. Is Firecracker not suited for such apps?

raduweiss commented 5 years ago

Well, the microVM start-up time will have three components adding up:

  1. the microVM + guest kernel start-up time, with the current KPIs written down in the specifications - that's currently 125 ms for the Linux kernel configured for Firecracker, on an i3.metal (the current reference CPU);
  2. the guest user-space start-up, which depends entirely on what user space & services your operating system runs. It could be a few hundred milliseconds, or many seconds;
  3. the setup that your application (in your example, the JVM & Spring, any keys that need to be transferred, etc.) does before it's ready to run.

Of these, the last component also happens when you run in a container, while the first two don't (since your container can share the Linux kernel and some OS services with the host). So, your raw overhead vs. just using containers would be the sum of the first two components, probably between 1 and 10 seconds for a use case like yours - this is the trade-off made for better isolation.
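
If you want to measure that overhead for your own service, one simple (if rough) approach is to time "start to first successful response" in both setups. A sketch, where the guest IP and health endpoint are placeholders:

```python
"""Sketch: time "start to first successful response" for the service.
Run it right after starting the microVM (components 1+2+3), then again after
starting the same app directly on the host or in a container (component 3
only); the difference approximates the microVM overhead."""
import time
import urllib.request

URL = "http://172.16.0.2:8080/actuator/health"   # placeholder guest IP + endpoint

start = time.monotonic()
while True:
    try:
        with urllib.request.urlopen(URL, timeout=0.25):
            break
    except OSError:        # connection refused / timeout while still booting
        time.sleep(0.01)
print(f"ready after {time.monotonic() - start:.3f} s")
```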

For some use cases, keeping warm pools of microVMs around to mitigate most of the first two components can also make sense, but in that case you trade off host-side software complexity.

This is definitely one area that we're interested in improving in 2019.

wkozaczuk commented 5 years ago

If I may add to this thread:

Regarding step 1, does the 125 ms account for the time it takes to start a Firecracker process and issue the REST calls to create the boot source and set up the block device? Or is it measured from the point Firecracker receives the start-instance REST call? I know that the set-up-block-device and create-instance calls are pretty fast (< 1 ms), but if you use curl to issue each call there is some overhead just from the fact that a new curl process has to be spawned for each call and a new UNIX domain connection created each time. Obviously one can imagine writing a simple controller, for example as a Python script, that would chain all of that in one shot. Or maybe Firecracker could have some sort of API template mechanism where you can issue a list of commands, or be able to start Firecracker so that it creates the instance and all devices given some input configuration file.
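
To make that concrete, a minimal sketch of such a controller could look like the following, reusing a single UNIX domain connection for all the calls (the API socket path and the kernel/rootfs file names are just placeholders):

```python
#!/usr/bin/env python3
"""Rough sketch of such a controller: chain the Firecracker API calls over a
single UNIX domain connection instead of spawning curl per call."""
import json
import socket

API_SOCK = "/tmp/firecracker.socket"   # placeholder; whatever --api-sock points at

def api_put(sock, path, body):
    # One hand-rolled HTTP/1.1 PUT; good enough for a quick experiment.
    payload = json.dumps(body).encode()
    header = (
        f"PUT {path} HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode()
    sock.sendall(header + payload)
    print(path, "->", sock.recv(4096).decode().splitlines()[0])

with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
    s.connect(API_SOCK)
    api_put(s, "/boot-source", {
        "kernel_image_path": "vmlinux.bin",           # placeholder
        "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
    })
    api_put(s, "/drives/rootfs", {
        "drive_id": "rootfs",
        "path_on_host": "rootfs.ext4",                # placeholder
        "is_root_device": True,
        "is_read_only": False,
    })
    api_put(s, "/actions", {"action_type": "InstanceStart"})
```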

Also, I would imagine that in most container deployments, containers do not run directly on bare-metal hosts but rather on Linux VMs, be it traditional (non-i3.metal) EC2 instances or Firecracker microVMs. In the case of Fargate, as I understand it, containers run inside of Firecracker VMs, so in my opinion it is comparing apples and oranges.

However, given how light and fast the Firecracker VMM is to boot and start a guest, one can imagine running individual workloads like microservices in Firecracker VMs. But instead of using Linux as a guest, I would advocate using unikernels like OSv for that purpose, which can boot 10-15 times faster. Please see #857 for some details.

I do not have access to an i3.metal instance, nor am I willing to pay $5 per hour to gain access to one, so I do not know how fast OSv would boot on Firecracker on i3.metal. But I can tell you some timings from the experiments I conducted on my 5-year-old i7 MacBook Pro with 4 cores running Ubuntu 18.10. Just to give an idea of how slow my laptop is: when I run Linux (the same reference image that the Firecracker integration tests use), Firecracker reports on average slightly under 200 ms per the "Guest-boot-time" line, which is 60% slower than the 125 ms when running on i3.metal (from the Firecracker page). For comparison, on the same laptop an OSv unikernel with a simple hello-world C app in it can boot on average in 11-15 ms per the same "Guest-boot-time" line. It takes another 3 ms to terminate the instance, so on average it takes under 20 ms to run an instance from start to shutdown.

I have also run a simple hello-world Java example on OSv, and the corresponding timing is on average 160 ms. For comparison, when I run the same Java app directly on the same Linux bare-metal host, I get on average 80 ms (per the time utility). I would imagine running the same app in Docker on the same host might be slightly slower than 80 ms, but I am sure it would still be faster than 160 ms.
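
If anyone wants to reproduce these averages, a rough sketch for pulling the boot times out of the Firecracker log could look like this (the log path and the exact line format are assumptions on my side):

```python
"""Sketch: average the boot time Firecracker reports in its log. Assumes the
log ends up in ./firecracker.log and that each run emits a line containing
"Guest-boot-time"; the regex just grabs the first number followed by a unit."""
import re
import statistics

times_ms = []
with open("firecracker.log") as log:              # assumed log path
    for line in log:
        if "Guest-boot-time" not in line:
            continue
        match = re.search(r"([\d.]+)\s*(us|ms)", line)
        if not match:
            continue
        value, unit = float(match.group(1)), match.group(2)
        times_ms.append(value / 1000 if unit == "us" else value)

if times_ms:
    print(f"{len(times_ms)} runs, mean {statistics.mean(times_ms):.1f} ms")
```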

raduweiss commented 5 years ago

If I may add to this thread:

Regarding step 1, does the 125 ms account for the time it takes to start a Firecracker process and issue the REST calls to create the boot source and set up the block device? Or is it measured from the point Firecracker receives the start-instance REST call?

It's the latter.

I know that the set-up-block-device and create-instance calls are pretty fast (< 1 ms), but if you use curl to issue each call there is some overhead just from the fact that a new curl process has to be spawned for each call and a new UNIX domain connection created each time. Obviously one can imagine writing a simple controller, for example as a Python script, that would chain all of that in one shot.

You're absolutely right. There is a controller.

Or maybe Firecracker could have some sort of API template mechanism where you can issue a list of commands, or be able to start Firecracker so that it creates the instance and all devices given some input configuration file.

You are again absolutely right. It looks like lots of people think this is a good idea, and we're converging on adding a "single API call" configuration feature for Firecracker this year. #923 is a good place to talk about this.
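
To sketch the idea only (the actual schema is being discussed in #923, so none of this is a committed format), such an input file could simply bundle the resources the current API already exposes; the paths below are placeholders:

```python
"""Sketch only: one configuration blob covering the resources the API exposes
today (machine config, boot source, drives). The key names mirror the current
REST resources; the final single-call format is still being discussed in #923."""
import json

vm_config = {
    "machine-config": {"vcpu_count": 2, "mem_size_mib": 256},
    "boot-source": {
        "kernel_image_path": "vmlinux.bin",   # placeholder
        "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
    },
    "drives": [
        {
            "drive_id": "rootfs",
            "path_on_host": "rootfs.ext4",    # placeholder
            "is_root_device": True,
            "is_read_only": False,
        }
    ],
}

with open("vm_config.json", "w") as f:
    json.dump(vm_config, f, indent=2)
```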

I do not have access to an i3.metal instance, nor am I willing to pay $5 per hour to gain access to one, so I do not know how fast OSv would boot on Firecracker on i3.metal.

Indeed, that's not really a development machine 😄. But there's nothing magical about the CPU there; we've observed that the InstanceStart boot time (to guest user space) scales mostly with the CPU frequency (as long as there are at least 2 vCPUs).

alxiord commented 4 years ago

There appear to be no open items on this issue; please reopen if we missed anything.