TUM-DSE / CVM_eval

Evaluation code for confidential virtual machines (AMD SEV-SNP / Intel TDX)
MIT License

Storage IO Benchmark Variants #9

Open enterJazz opened 1 year ago

enterJazz commented 1 year ago

In this issue, I catalog the different storage-IO fio benchmark variants. Additionally, I list the completion status of the benchmarks in question and where to find them.

Parameters

In this section, we list the different benchmark parameters, such as the storage IO software stack (virtio, SPDK, ...) or the guest type (VM, CVM). The Cartesian product of all these parameters produces the full set of benchmarks.

Environment Specific Params

Parameter 1 (P1): Guest Type

SEV-ES is required for huge pages, which may be needed e.g. for SPDK

P2: Guest Configuration / Resources

NOTE: resources should be allocated on the same node for NUMA (`ryan` only has one node; `rose` has two nodes, so there is more to watch out for). For the final version: NUMA (2-node machine).
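To pin the guest to one node, the QEMU process can be started under `numactl`; a minimal sketch (the node number and the QEMU flags are assumptions, not our actual invocation):

```sh
# bind both CPU scheduling and memory allocation of the guest to NUMA node 0
numactl --cpunodebind=0 --membind=0 \
    qemu-system-x86_64 -enable-kvm -smp 8 -m 16G \
    -drive file=disk.img,if=virtio
```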

Non-Env Specific Params

P3: Storage IO Software Stack

SPDK, libaio; io_uring is also possible but probably not necessary.

Focus: SPDK; io_uring is nice to have.

The following parameters only concern VMs:

P4: Encryption

P4.5 Integrity

P5: Storage Level

Application specific; fio probably has two modes for this (direct / non-direct).

fio `--direct` may not work with file-level storage.

P6: Measurement Type

Non-Param Config

Further Optimization Areas (non-params)

mmisono commented 1 year ago

For fio, we can try to use the same configuration as Spool (ATC'20):

[image: fio configuration table from Spool (ATC'20)]

Maybe we also want to measure with a 2MB block size to see the huge page effect (if any).
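A minimal fio invocation along these lines (device path, queue depth, and runtime are assumptions, not the exact Spool parameters):

```sh
# 4K random read against the raw device, bypassing the page cache
fio --name=randread-4k --filename=/dev/nvme0n1 --direct=1 --rw=randread \
    --bs=4k --ioengine=libaio --iodepth=32 --numjobs=1 \
    --runtime=30 --time_based --group_reporting

# same job with a 2MB block size to look for a huge-page effect
fio --name=randread-2m --filename=/dev/nvme0n1 --direct=1 --rw=randread \
    --bs=2m --ioengine=libaio --iodepth=32 --numjobs=1 \
    --runtime=30 --time_based --group_reporting
```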

Notes from Robert

TODO:

mmisono commented 1 year ago

Here is a diagram of the Linux storage stack: https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram

Notes from Robert

mmisono commented 1 year ago

About dm-crypt: we probably want to set the no_read_workqueue/no_write_workqueue options to avoid asynchronous queueing and get better performance (https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-crypt.html)
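With cryptsetup these map to the `--perf-no_read_workqueue` / `--perf-no_write_workqueue` flags (available since cryptsetup 2.3.4); a sketch, with device path and mapping name as assumptions:

```sh
# open a LUKS2 device with the dm-crypt read/write workqueues bypassed
cryptsetup open --perf-no_read_workqueue --perf-no_write_workqueue \
    /dev/nvme0n1p2 cryptdisk
```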

enterJazz commented 1 year ago

> Here is a diagram of the Linux storage stack: https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram
>
> * device mapper (dm-crypt, dm-verity, etc.) works on the block layer
>
> * O_DIRECT is a file system option. Usually it means the fs does not cache (https://man7.org/linux/man-pages/man2/open.2.html)
>
>   * we should use fio with `--direct` (meaning using O_DIRECT) so that we measure the performance of the disk I/O, not the memory cache
>   * also note that aio requires O_DIRECT, as otherwise it is likely to become blocking I/O (cf. https://lse.sourceforge.net/io/aio.html)

this is directly related to the comments in P5, right?

mmisono commented 1 year ago

> Here is a diagram of the Linux storage stack: https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram
>
> * device mapper (dm-crypt, dm-verity, etc.) works on the block layer
>
> * O_DIRECT is a file system option. Usually it means the fs does not cache (https://man7.org/linux/man-pages/man2/open.2.html)
>
>   * we should use fio with `--direct` (meaning using O_DIRECT) so that we measure the performance of the disk I/O, not the memory cache
>   * also note that aio requires O_DIRECT, as otherwise it is likely to become blocking I/O (cf. https://lse.sourceforge.net/io/aio.html)
>
> this is directly related to the comments in P5, right?

Yes, it is; I also believe other research papers (e.g., Spool) use direct I/O for measurements.
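A quick way to see the cache effect is to compare buffered and O_DIRECT transfers on the same file; a minimal sketch (paths and sizes are examples):

```sh
# buffered write goes through the page cache; oflag=direct bypasses it
dd if=/dev/zero of=/mnt/test/file bs=1M count=1024
dd if=/dev/zero of=/mnt/test/file bs=1M count=1024 oflag=direct
```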

mmisono commented 1 year ago

For Integrity:

TODO: investigate what is suitable for our case

Reference

> dm-integrity provides a lighter weight read-write block level integrity protection for file systems not requiring full disk encryption, but which do require writability.


- [Data integrity protection with cryptsetup tools](https://archive.fosdem.org/2018/schedule/event/cryptsetup/attachments/slides/2506/export/events/attachments/cryptsetup/slides/2506/fosdem18_cryptsetup_aead.pdf)
- [use dm-crypt + dm-integrity + dm-raid](https://gist.github.com/MawKKe/caa2bbf7edcc072129d73b61ae7815fb)
- [AN INTRODUCTION TO DM-VERITY IN EMBEDDED DEVICE SECURITY](https://www.starlab.io/blog/dm-verity-in-embedded-device-security)
- https://ieeexplore.ieee.org/abstract/document/10070924/

> dm-crypt also offers integrity checking of read-only filesystems where the entire block device is verified at once. This approach is particularly time-consuming and thus is typically used only during device startup [6], [44]. dm-verity [6] uses a software maintained Merkle tree structure to compute and validate hashes of read-only data blocks against pre-computed hashes. In contrast, dm-integrity keeps individual hashes for each data block during runtime, which allows verification for read/write systems. However, it cannot detect physical attacks such as reordering the blocks within the same device due to the lack of a secure root of trust in the system.
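For reference, standalone dm-integrity and combined dm-crypt + dm-integrity can be set up as follows (a sketch based on the linked gist and FOSDEM slides; device path, hash, and cipher choices are assumptions):

```sh
# standalone dm-integrity: per-block hashes, read-write capable
integritysetup format /dev/nvme0n1p3 --integrity sha256
integritysetup open /dev/nvme0n1p3 integdisk --integrity sha256

# authenticated encryption: dm-crypt + dm-integrity in one step (LUKS2 only)
cryptsetup luksFormat --type luks2 --cipher aes-gcm-random --integrity aead /dev/nvme0n1p3
```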

mmisono commented 1 year ago

spdk:

Reference

enterJazz commented 1 year ago

I just finished creating the base benchmark runner - see #10

enterJazz commented 1 year ago

To execute it:

```sh
cd ./tools/storage-io-bm-runner
nix-shell
source .venv/bin/activate
cd ./bm-runner-tool
mkdir -p resources
python3 main.py --name my-bm --stack=native-io --storage-level=file-level --measurement-type=io-average-latency --resource-dir=./resources
```

More options can be viewed using:

```sh
python3 main.py --help
```

All parameters of P6 are currently implemented; the others are still missing.

mmisono commented 1 year ago

From TDX Linux Guest Kernel Security Specification

> The virtIO subsystem is also highly configurable with different options possible for the virtual queue’s types, transportation, etc. For the virtual queues, currently the only mode that was hardened (by performing code audit and fuzzing activities outlined in Intel® Trust Domain Extension Guest Linux Kernel Hardening Strategy) is a split virtqueue without indirect descriptor support, so this mode is the only one recommended for the secure virtio communication.
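With QEMU, split virtqueues without indirect descriptors can be requested per device; a sketch (drive id and device model are assumptions):

```sh
# disable packed virtqueues and indirect descriptors on a virtio-blk device
qemu-system-x86_64 ... \
    -drive file=disk.img,if=none,id=d0 \
    -device virtio-blk-pci,drive=d0,packed=off,indirect_desc=off
```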

TODO:

mmisono commented 1 year ago

We also want to have CPU time breakdown figures like Bifrost:

[image: CPU time breakdown figure from Bifrost]

Breakdown measurement

Here is a script to measure breakdown (for SEV)
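As a rough illustration of one way to collect such a breakdown on the host with `perf` (the system-wide sampling and the 30s measurement window are hypothetical, not the script above):

```sh
# sample KVM exit events during a benchmark run, then summarize by exit reason
perf kvm stat record -a sleep 30
perf kvm stat report --event=vmexit
```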

swiotlb options

We can control swiotlb via kernel command-line parameters:

```
swiotlb=        [ARM,IA-64,PPC,MIPS,X86]
                Format: { <int> [,<int>] | force | noforce }
                <int> -- Number of I/O TLB slabs
                <int> -- Second integer after comma. Number of swiotlb
                         areas with their own lock. Will be rounded up
                         to a power of 2.
                force -- force using of bounce buffers even if they
                         wouldn't be automatically used by the kernel
```
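For example (values are assumptions): 65536 slabs at 2 KiB each give a 128 MiB bounce buffer, split into 4 independently locked areas:

```sh
# appended to the guest kernel command line
swiotlb=65536,4
```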
mmisono commented 1 year ago

Other than block-level encryption, there are also fs-level encryption methods.

This shows some benchmarking results among dm-crypt/eCryptfs/fscrypt:

TODO

NOTE

Notes from Robert

Pros and Cons of FS-level encryption

Pros

Cons

Summary:

Scenarios where FS-Level may work better in CVMs

NOTE: splitting data from file metadata may be key to optimization, so fs-level encryption is not out of the race.
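For completeness, fs-level encryption on ext4 can be tried with the fscrypt tool; a sketch (all paths are examples):

```sh
# enable the ext4 encryption feature, then encrypt a single directory
tune2fs -O encrypt /dev/nvme0n1p4
fscrypt setup                 # one-time global setup
fscrypt setup /mnt/data       # per-filesystem setup
fscrypt encrypt /mnt/data/secret
```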

enterJazz commented 1 year ago

Integrity is now implemented as well :checkered_flag:

enterJazz commented 1 year ago

Adding VM execution support next; as the VM setup is pretty complex and requires changing things in the BIOS / management interface etc., I will NOT include the VM setup in the tool.

Instead, to get VM results, one will execute the tool inside of the VM. If this approach is not enough to ensure reproducibility, we can maybe come up with a hybrid solution, e.g. passing the tool parameters over ssh into the VM, which then executes the benchmarks itself.
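Such a hybrid invocation could look like this (the guest host name and paths are hypothetical; the flags mirror the runner invocation above):

```sh
# run the benchmark tool inside the guest over ssh, then copy results back
ssh cvm-guest "cd /opt/bm-runner-tool && \
    python3 main.py --name my-bm --stack=native-io --storage-level=file-level \
    --measurement-type=io-average-latency --resource-dir=./resources"
scp -r cvm-guest:/opt/bm-runner-tool/resources ./results
```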

mmisono commented 1 year ago

> Instead, to get VM results, one will execute the tool inside of the VM. If this approach is not enough to ensure reproducibility, we can maybe come up with a hybrid solution, e.g. passing the tool parameters over ssh into the VM, which then executes the benchmarks itself.

vmsh and ushell did a similar thing for the automated test (#3) https://github.com/TUM-DSE/ushell/blob/main/misc/tests/qemu.py

For now some manual work is fine, but we want to have these things in the future.

Notes from Robert

enterJazz commented 1 year ago

New TODO:

enterJazz commented 1 year ago

Benchmark Issues

With the following config, we receive unexpected results (investigated most closely on bw tests, however also present on iops and alat):

TODO:

mmisono commented 1 year ago

vhost configuration

mmisono commented 1 year ago

When using a new disk, before running any benchmarks we should fill the NVMe disk with random data.
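A common way to precondition is a sequential fill with random buffers; a sketch (device path and pass count are assumptions):

```sh
# fill the whole device twice with pseudo-random data (destroys all data!)
fio --name=precondition --filename=/dev/nvme0n1 --rw=write --bs=128k \
    --ioengine=libaio --iodepth=32 --direct=1 --refill_buffers --loops=2
```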

Reference

mmisono commented 1 year ago

“Purging” the SSD before preconditioning is desirable. [image from the Micron SSD performance measurement brief] https://www.micron.com/-/media/client/global/documents/products/technical-marketing-brief/brief_ssd_performance_measure.pdf
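On NVMe, a purge can be done with a secure format or a full discard; a sketch (the device path is an example, and both commands destroy all data):

```sh
# secure-erase the namespace with nvme-cli ...
nvme format /dev/nvme0n1 --ses=1
# ... or discard every block
blkdiscard /dev/nvme0n1
```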

TODO

enterJazz commented 1 year ago

For real-world benchmarks we can use: