Add benchmarking tools for firecracker.

Anjali05 commented 5 years ago

I have been benchmarking Firecracker. Are there any plans to add a benchmarking tool for Firecracker, or are there any performance test results that I can use as a reference for my study?

iggy commented 5 years ago

There are a number of standard benchmarks in use in computing. They all test different parts of a server/VM or different common scenarios that servers run. I don't think any one benchmark is going to satisfy everyone's curiosity.

There are also some tools that aggregate many benchmarks together. One that I can think of off the top of my head is the Phoronix Test Suite.

alexandruag commented 5 years ago

We're working on adding some benchmarks for block devices using fio, and other devices will follow. These will hopefully provide a basic frame of reference we can use to identify regressions, and get a better overall idea regarding Firecracker IO performance.

Anjali05 commented 5 years ago

How do block devices and scheduling (for both disk and networking) work in Firecracker? Can someone elaborate on this? Is it similar to QEMU?

ustiugov commented 5 years ago

In my understanding (please correct me if I am wrong), the virtio-{net, block} frontends live in the guest OS, hence the "VMM-gOS API" must be the same (i.e., ring buffers). For asynchronous IO, Firecracker uses an IO thread (one per VM), which is also similar to the QEMU case. Regarding the backend implementation, I would be curious to know too, e.g., what about interrupt batching or segmentation offload?

Anjali05 commented 5 years ago

I have been doing some benchmarking of Firecracker lately. While running fio, I observed that the reads/writes are not all going to the disk and the bandwidth is too high. I suspect that it's reading from and writing to memory. Is there any configuration I need to set to make it write to disk?

gregbdunn commented 5 years ago

You'll want to make sure fio is operating in direct mode if you want to bypass the Linux page and buffer cache.

https://linux.die.net/man/1/fio has details under the 'direct=' section.

Anjali05 commented 5 years ago

@gregbdunn I did use direct mode, but the bandwidth is too high (around 4022 MB/sec for writes with a block size of 128k). Also, buffered IO bandwidth is lower than direct IO bandwidth. These are the commands I am using:

For direct: fio --name=randwrite --ioengine=libaio --iodepth=16 --rw=randwrite --bs=128k --direct=1 --size=512M --numjobs=2 --runtime=240 --group_reporting

For buffered: fio --name=randwrite --ioengine=libaio --iodepth=16 --rw=randwrite --bs=128k --direct=0 --invalidate=0 --size=512M --numjobs=2 --runtime=240 --group_reporting
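The two runs above differ only in --direct and --invalidate. As a rough sketch for comparing them programmatically, the Python below rebuilds the same argument lists and pulls the write bandwidth out of fio's JSON report (--output-format=json). The helper names are illustrative, and the "bw" field is assumed to be reported in KiB/s, as in fio 3.x output.

```python
import json
import subprocess

# Options shared by both runs quoted in this thread.
COMMON = [
    "--name=randwrite", "--ioengine=libaio", "--iodepth=16",
    "--rw=randwrite", "--bs=128k", "--size=512M", "--numjobs=2",
    "--runtime=240", "--group_reporting",
]

def fio_args(direct: bool) -> list[str]:
    """Build the fio argv for the direct or the buffered run."""
    mode = ["--direct=1"] if direct else ["--direct=0", "--invalidate=0"]
    return ["fio", *COMMON, *mode, "--output-format=json"]

def write_bw_kib(fio_json: str) -> float:
    """Sum write bandwidth (assumed KiB/s) over the job entries in the report."""
    report = json.loads(fio_json)
    return sum(job["write"]["bw"] for job in report["jobs"])

def run(direct: bool) -> float:
    """Run fio (it must be installed in the guest) and return write bandwidth."""
    out = subprocess.run(fio_args(direct), check=True,
                         capture_output=True, text=True)
    return write_bw_kib(out.stdout)
```

Calling run(True) and run(False) back to back inside the guest gives two numbers that can be compared directly, which is less error-prone than reading the human-readable summary.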

Anjali05 commented 5 years ago

I ran dstat while running fio, and it looks like not all the writes go to the disk immediately. I can see Firecracker still writing even after the fio command has finished. I am guessing that instead of writing to the disk immediately, it writes data to some buffer that is flushed to the disk later. Can someone confirm this or correct me if I am wrong?

acatangiu commented 5 years ago

@Anjali05 Data coming from the guest to the emulated block device is written directly (but not flushed) to the underlying backing host file. Firecracker itself does not buffer the data, but the host Linux page cache (the host file system cache) will buffer the data in memory unless configured otherwise. Neither Firecracker nor the Jailer tweaks the host page cache; this is left to the user to configure depending on their use case (most of the time, the default is best).
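To illustrate the "written but not flushed" distinction on the host side, here is a minimal Python sketch. It is not Firecracker code (Firecracker is written in Rust); write_block is a hypothetical helper and the temporary file merely stands in for a backing image.

```python
import os
import tempfile

def write_block(fd: int, data: bytes, offset: int, flush: bool) -> int:
    """Write data at offset; optionally force it out to stable storage."""
    written = os.pwrite(fd, data, offset)  # data lands in the host page cache
    if flush:
        os.fsync(fd)  # blocks until the kernel has pushed it to the device
    return written

with tempfile.NamedTemporaryFile() as backing:
    fd = backing.fileno()
    # Returns as soon as the data is cached; the disk write happens later.
    n = write_block(fd, b"\0" * 4096, 0, flush=False)
    # Returns only after the data has been persisted.
    m = write_block(fd, b"\1" * 4096, 4096, flush=True)
    size = os.fstat(fd).st_size
```

The unflushed case is consistent with the dstat observation above: the write call has already returned while the kernel is still writing dirty pages back to the disk.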

Anjali05 commented 5 years ago

@acatangiu Since data is not buffered by Firecracker and is written to the backing host file, will there be any difference between direct and buffered IO? I did observe Firecracker performing better in direct mode than in buffered mode.

alexandruag commented 5 years ago

Hi again! AFAIK, using buffered mode (or rather, not using direct mode) effectively forces an iodepth of 1 regardless of the value you specify. What happens if you retry the comparison with iodepth=1 for both? Also, with direct=1 you do go through the Firecracker device model for every operation, which probably ends up being more painful for smaller block sizes (like 4k).

Anjali05 commented 5 years ago

@alexandruag I did try with iodepth=1, and it was not as good as iodepth=16. I do not remember the exact result, but it was worse. I tried several iodepth values and 16 performed best, so I did the measurements with 16. Do you recommend any particular iodepth for measurement?

Anjali05 commented 5 years ago

Can someone shed some light on how storage and the filesystem work in Firecracker? I know microVMs boot with a rootfs image, but how do things like 'copy on write' work? Does it depend on the type of filesystem we use for our rootfs image? If we create a file in the rootfs image, is it written anywhere on the host? It would really help if someone could clarify this.

Anjali05 commented 5 years ago

Is there any internal data structure that manages the filesystem image?

dhrgit commented 5 years ago

@Anjali05 Firecracker uses a regular file (from the host file system) as a block device in the guest. For instance, if you have a 10MiB file on the host (say, at /path/to/rootfs.img), Firecracker can present this file to the guest kernel as the /dev/vda block device, having a capacity of 10MiB. Note that another unit for block device capacity is the sector (traditionally, one sector is 512 bytes) - block devices don't really operate with bytes, but with sectors. In our example, /dev/vda would be 20480 sectors long (10 * 1024 * 1024 / 512).
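For reference, the byte-to-sector conversion can be checked with a few lines of Python. The function name is illustrative, and the 512-byte sector size is the traditional value mentioned above.

```python
SECTOR_SIZE = 512  # bytes per sector (traditional value)

def capacity_in_sectors(size_bytes: int) -> int:
    """Number of whole 512-byte sectors in a backing file of size_bytes."""
    return size_bytes // SECTOR_SIZE

ten_mib = 10 * 1024 * 1024
print(capacity_in_sectors(ten_mib))  # 20480
```

Inside the guest, `blockdev --getsz /dev/vda` should report the same figure, since it prints the device size in 512-byte sectors.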

So, a block device (e.g. a physical SSD, or an emulated virtio-blk) is just something that can store raw data, in batches of 512-byte sectors. File names, or file contents, are not concepts that a block device understands - it only knows sectors. The file system is the component that introduces these concepts. It will keep track of where - in the linear space of block device sectors - a file's contents are stored.

To answer your question, when a file is created (by the guest) in its rootfs, this operation will get translated into some write sector commands sent to the /dev/vda, commands which Firecracker will then translate into writes into the /path/to/rootfs.img file.
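The translation described above can be sketched as follows. This is an illustration of the idea only, not Firecracker's actual implementation: write_sectors is a hypothetical helper, and a temporary file stands in for /path/to/rootfs.img.

```python
import os
import tempfile

SECTOR_SIZE = 512  # bytes per sector

def write_sectors(backing_fd: int, first_sector: int, data: bytes) -> int:
    """Apply a guest 'write sectors' request to the backing file.

    A sector number maps linearly to a byte offset in the file.
    """
    if len(data) % SECTOR_SIZE != 0:
        raise ValueError("data must be a whole number of sectors")
    return os.pwrite(backing_fd, data, first_sector * SECTOR_SIZE)

with tempfile.NamedTemporaryFile() as img:
    fd = img.fileno()
    os.ftruncate(fd, 10 * 1024 * 1024)        # a 10MiB "image"
    write_sectors(fd, 3, b"A" * SECTOR_SIZE)  # guest writes sector 3
    stored = os.pread(fd, SECTOR_SIZE, 3 * SECTOR_SIZE)
```

The block device layer never sees file names; the guest file system decides which sectors hold a file's contents, and only the resulting sector writes reach the host file.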

Regarding copy-on-write, if you're referring to guest file copying, then yes, that's a file-system-dependent feature, unrelated to Firecracker. If your question was more related to QEMU's qcow, though, note that Firecracker doesn't support concurrently attaching the same image file to multiple VMs (unless read-only mode is used within all VMs).

Anjali05 commented 5 years ago

I am planning to run a profiler such as perf on Firecracker to get more insight into its performance. Can someone recommend any other tools they have used? Also, when using perf, should I run it on the host or inside the microVM? What is the correct way here?

andreeaflorescu commented 5 years ago

@Anjali05 as far as I know we haven't used any tool other than fio (which was already mentioned in a previous comment) for benchmarking the Firecracker block device implementation.

Regarding running perf inside the microVM or on the host, I think it depends on the data that you want to collect and what exactly you want to benchmark. Are you interested only in the performance of the block device?

alindima commented 2 years ago

We currently have the following long-running performance tests (for which we have baselines for m6g.metal and m5d.metal with host kernels 5.10 and 4.14):

block performance (fio): https://github.com/firecracker-microvm/firecracker/blob/main/tests/integration_tests/performance/test_block_performance.py
network latency (ping): https://github.com/firecracker-microvm/firecracker/blob/main/tests/integration_tests/performance/test_network_latency.py
net tcp tput (iperf3): https://github.com/firecracker-microvm/firecracker/blob/main/tests/integration_tests/performance/test_network_tcp_throughput.py
vsock tput (iperf3-vsock): https://github.com/firecracker-microvm/firecracker/blob/main/tests/integration_tests/performance/test_vsock_throughput.py
snap restore performance: https://github.com/firecracker-microvm/firecracker/blob/main/tests/integration_tests/performance/test_snapshot_restore_performance.py

The baselines are formatted as JSON in the configs folder: https://github.com/firecracker-microvm/firecracker/tree/main/tests/integration_tests/performance/configs

firecracker-microvm / firecracker

Add benchmarking tools for firecracker. #1097