firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

Slower networking of OSv on firecracker vs QEMU/KVM #1034

Closed wkozaczuk closed 2 years ago

wkozaczuk commented 5 years ago

Before I dive into the details of my performance test results, I would like to take this occasion to announce on this forum that firecracker is officially and fully supported by the OSv unikernel as of the latest 0.53.0 release (nickname "Firebird", for details please read here). It can boot in as little as 5ms per bootchart and 10ms per firecracker guest boot time measurement. Maybe it is worth mentioning on https://firecracker-microvm.github.io/ (section "What operating systems are supported by Firecracker?") that besides Linux, OSv can boot on firecracker as well ;-) ? As far as I am aware, OSv is the only unikernel and possibly the only other OS besides Linux that can claim this at this point in time.

As far as the performance comparison between OSv running on firecracker vs QEMU/KVM goes, first I must say that in at least one aspect firecracker beats QEMU: file I/O. I have not done any other elaborate file I/O tests, but for example mounting a ZFS filesystem is at least 4 times faster on firecracker: on average 60ms on firecracker vs 260ms on QEMU. Now as far as networking goes, OSv performs worse on firecracker than on QEMU. In terms of requests per second it reaches between 50% and 90% of the performance on QEMU, depending mostly on the number of vCPUs and the type of application I used to test.

My tests focused on the number of REST API requests handled per second by a typical microservice app implemented in Rust (using hyper), Golang, and Java (using vertx.io). Each app in essence implements a simple todo REST API returning a JSON payload 100-200 characters long.

The test setup looked like this:

Host:

Client machine:

The host and client machine were connected directly to a 1 GBit ethernet switch, and the host exposed the guest IP using a bridged TAP nic.

Here is a list of pure req/sec results:

Go 1 CPU - FC & QEMU
-------------------
Requests/sec:  16422.33
Requests/sec:  16540.24
Requests/sec:  16721.56
-------------------
Requests/sec:  23300.26
Requests/sec:  23874.74
Requests/sec:  24313.06

Go 2 CPU - FC & QEMU
-------------------
Requests/sec:  26676.68
Requests/sec:  28100.00
Requests/sec:  28538.35
-------------------
Requests/sec:  33581.87
Requests/sec:  35475.22
Requests/sec:  37089.26

Rust 1 CPU - FC & QEMU
-------------------
Requests/sec:  23379.86
Requests/sec:  23477.19
Requests/sec:  23604.27
-------------------
Requests/sec:  41100.07
Requests/sec:  43455.34
Requests/sec:  43927.73

Rust 2 CPU - FC & QEMU
-------------------
Requests/sec:  46128.15
Requests/sec:  46590.41
Requests/sec:  46973.84
-------------------
Requests/sec:  48076.98
Requests/sec:  49120.31
Requests/sec:  49298.28

Java 1 CPU - FC & QEMU
-------------------
Requests/sec:  20191.95
Requests/sec:  21384.60
Requests/sec:  21705.82
-------------------
Requests/sec:  41049.41
Requests/sec:  43622.81
Requests/sec:  44777.60

Java 2 CPU - FC & QEMU
-------------------
Requests/sec:  40625.69
Requests/sec:  40876.17
Requests/sec:  43766.45
-------------------
Requests/sec:  45746.48
Requests/sec:  46224.42
Requests/sec:  46245.95
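To put the raw numbers above side by side, here is a minimal sketch that averages the three runs per configuration and prints the firecracker-to-QEMU ratio. It assumes that in each group the first block of three results is firecracker and the second is QEMU (following the "FC & QEMU" order in the headings); the names `RESULTS` and `fc_to_qemu_ratios` are made up for this illustration.

```python
from statistics import mean

# Requests/sec copied from the listing above.
# Assumption: first tuple = firecracker (FC), second tuple = QEMU.
RESULTS = {
    "Go 1 CPU":   ((16422.33, 16540.24, 16721.56), (23300.26, 23874.74, 24313.06)),
    "Go 2 CPU":   ((26676.68, 28100.00, 28538.35), (33581.87, 35475.22, 37089.26)),
    "Rust 1 CPU": ((23379.86, 23477.19, 23604.27), (41100.07, 43455.34, 43927.73)),
    "Rust 2 CPU": ((46128.15, 46590.41, 46973.84), (48076.98, 49120.31, 49298.28)),
    "Java 1 CPU": ((20191.95, 21384.60, 21705.82), (41049.41, 43622.81, 44777.60)),
    "Java 2 CPU": ((40625.69, 40876.17, 43766.45), (45746.48, 46224.42, 46245.95)),
}


def fc_to_qemu_ratios(results):
    """Return {config: mean FC req/sec divided by mean QEMU req/sec}."""
    return {name: mean(fc) / mean(qemu) for name, (fc, qemu) in results.items()}


if __name__ == "__main__":
    for name, ratio in fc_to_qemu_ratios(RESULTS).items():
        print(f"{name:<11} FC reaches {ratio:.0%} of QEMU throughput")
```

Under that assumption, the gap is widest in the 1 CPU runs and narrows considerably with 2 CPUs.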

For more detailed results please see the files where I captured full output from wrk - https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote/OSv_firecracker and https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote/OSv_qemu.

Would you have any insight into what might be the reason for the relatively slower performance of firecracker? I think I have disabled the rate limiting, which is what this script does - https://github.com/cloudius-systems/osv/blob/master/scripts/firecracker.py#L23-L97. It could also be that the virtio-mmio implementation on the OSv side is not very well optimized - with QEMU, OSv uses virtio-pci.

Any help will be greatly appreciated.

andreeaflorescu commented 5 years ago

If you do not want to have a rate limiter, you can just write:

    def add_network_interface(self, interface_name, host_interface_name):
        self.make_put_call('/network-interfaces/%s' % interface_name, {
            'iface_id': interface_name,
            'host_dev_name': host_interface_name,
            'guest_mac': "52:54:00:12:34:56"
        })

because the rate_limiter is an optional field. I am not sure what the effects of setting every field of the rate_limiter to 0 are.
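As a rough sketch of the same idea, the request body can be built so that the rate limiter keys only appear when a limiter is actually supplied. The helper name `network_interface_payload` and its parameters are hypothetical and not part of OSv's firecracker.py; `rx_rate_limiter` and `tx_rate_limiter` are assumed here to be the optional keys of the network interface request body.

```python
def network_interface_payload(iface_id, host_dev_name,
                              guest_mac="52:54:00:12:34:56",
                              rx_rate_limiter=None, tx_rate_limiter=None):
    """Build the body for PUT /network-interfaces/<iface_id>.

    Hypothetical helper: the limiter keys are left out entirely when no
    limiter is given, instead of being sent with every field set to 0.
    """
    payload = {
        'iface_id': iface_id,
        'host_dev_name': host_dev_name,
        'guest_mac': guest_mac,
    }
    # Only include the optional fields when the caller provided them.
    if rx_rate_limiter is not None:
        payload['rx_rate_limiter'] = rx_rate_limiter
    if tx_rate_limiter is not None:
        payload['tx_rate_limiter'] = tx_rate_limiter
    return payload


if __name__ == "__main__":
    # No limiter requested, so only the three mandatory-looking keys are sent.
    print(network_interface_payload('eth0', 'fc_tap0'))
```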

We will get back to you after we get a chance to investigate this.

raduweiss commented 5 years ago

@wkozaczuk , first of all I'll say that the folks in the Firecracker maintainer team have seen (and several of us are really excited by) OSv running with Firecracker. Your start-up times are awesome! Frankly, I think we're a bit behind on recognizing Firecracker integrations with other projects, and we will be working on making this better.

We also appreciate the in-depth issue descriptions that have helped us make Firecracker better.

The website (and maybe a docs page in the repo) is one place to showcase this, but I'd also like to write about our current integrations in something like a blog post. If that's all right with you, we'll get in touch once we have a clearer idea, probably in a couple of weeks.

Regarding IO, I'm not surprised by the results. Rate limiting aside, we simply didn't spend much time on IO optimization (especially disk), since it wasn't a priority for our current users/customers. While IO is definitely something we want to improve on, prioritizing it will depend on user/customer demand (unless someone contributes it 🙂). Here I mean users/customers in a sense that includes everyone using Firecracker. So if your group has a specific use case where you're IO-bottlenecked, let us know.

wkozaczuk commented 5 years ago

@raduweiss I am very much open to collaborating on a blog post.

I do not think that we have any specific use case in mind. I was myself curious to compare how OSv fares on firecracker vs QEMU. I wonder if the slowness on the networking side is caused by more frequent exits to the host compared with QEMU. It would be nice to do a similar comparison with Linux.

wkozaczuk commented 5 years ago

@raduweiss I have just recently published an article on the OSv blog about what it took to enhance OSv to make it boot on firecracker.

andreeaflorescu commented 5 years ago

@wkozaczuk Nice article! Congrats!

Pusnow commented 4 years ago

Is there any progress or any findings on this?

I'm testing memcached with OSv/firecracker. It seems the firecracker one is 33% slower than the QEMU one.

andreeaflorescu commented 4 years ago

@Pusnow we are going to look at this as part of some other improvements we want in our Virtio implementation. Progress is expected in the next few weeks. We'll update this issue with our findings.

serban300 commented 2 years ago

We are tracking a larger network performance theme. No work is planned in the short term. If anyone has specific questions or comments regarding this issue, please reopen it!