enterJazz opened this issue 1 year ago
For fio, we can try to use the same configuration as Spool (ATC'20)
Maybe we also want to measure it with 2MB to see the huge page effect (if any)
TODO:
Here is a diagram of the Linux storage stack: https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram
We should use `--direct` (meaning using O_DIRECT) so that we measure the performance of the disk I/O, not the memory cache, and `O_SYNC` as well to ensure synchronous completion and avoid tail latency (as in https://blog.cloudflare.com/speeding-up-linux-disk-encryption/):
O_DIRECT (since Linux 2.4.10) Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.
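A minimal fio sketch along these lines (the file path, size, and runtime are placeholders, not the Spool configuration):

```sh
# 4k random writes against a test file, opened with O_DIRECT (--direct=1) and
# O_SYNC (--sync=1) so the page cache is bypassed and completions are synchronous.
# Path, size, and runtime are illustrative only.
fio --name=direct-sync-randwrite \
    --filename=/mnt/test/fio.dat --size=4G \
    --rw=randwrite --bs=4k \
    --ioengine=psync \
    --direct=1 --sync=1 \
    --runtime=60 --time_based \
    --group_reporting
```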
About dm-crypt: we probably want to use the no_read_workqueue/no_write_workqueue options to avoid asynchronous queueing and get better performance (https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-crypt.html)
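A sketch of how those flags can be set from userspace, assuming a LUKS2 device at /dev/nvme0n1p2 (placeholder) and cryptsetup >= 2.3.4, which exposes them as --perf-* options:

```sh
# Open the dm-crypt mapping with the kernel workqueues disabled for reads and
# writes; --persistent stores the flags in the LUKS2 header so later opens keep them.
cryptsetup open /dev/nvme0n1p2 cryptbench \
    --perf-no_read_workqueue \
    --perf-no_write_workqueue \
    --persistent
```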
Here is a diagram of the Linux storage stack: https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram
* device mapper (dm-crypt, dm-verity, etc.) works on the block layer
* O_DIRECT is a file system option; usually it means the fs does not cache (https://man7.org/linux/man-pages/man2/open.2.html)
* we should use fio with `--direct` (meaning using O_DIRECT) so that we measure performance of the disk I/O, not the memory cache
* also note that aio requires O_DIRECT, as otherwise it is likely to become blocking I/O (cf. https://lse.sourceforge.net/io/aio.html)
this is directly related to the comments in P5, right?
Yes it is; I also believe other research papers (e.g., Spool) use direct I/O for measurements.
For Integrity:
TODO: investigate what is suitable for our case
Both dm-verity and dm-crypt provide block level integrity protection.
dm-verity provides block level integrity protection for read-only file systems, while dm-crypt provides block level integrity protection, with minimum penalty, for filesystems requiring full disk encryption.
dm-integrity provides a lighter weight read-write block level integrity protection for file systems not requiring full disk encryption, but which do require writability.
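A rough command sketch of the read-only vs. read-write options (device names are placeholders):

```sh
# dm-verity: read-only integrity via a Merkle tree over the data device.
veritysetup format /dev/vdb /dev/vdc        # builds the hash tree, prints the root hash
veritysetup open /dev/vdb verified /dev/vdc <root-hash-printed-by-format>

# dm-integrity: standalone read-write integrity (no encryption).
integritysetup format /dev/vdd
integritysetup open /dev/vdd protected
mkfs.ext4 /dev/mapper/protected
```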
- [Data integrity protection with cryptsetup tools](https://archive.fosdem.org/2018/schedule/event/cryptsetup/attachments/slides/2506/export/events/attachments/cryptsetup/slides/2506/fosdem18_cryptsetup_aead.pdf)
- [use dm-crypt + dm-integrity + dm-raid](https://gist.github.com/MawKKe/caa2bbf7edcc072129d73b61ae7815fb)
- [AN INTRODUCTION TO DM-VERITY IN EMBEDDED DEVICE SECURITY](https://www.starlab.io/blog/dm-verity-in-embedded-device-security)
- https://ieeexplore.ieee.org/abstract/document/10070924/
dm-crypt also offers integrity checking of read-only filesystems where the entire block device is verified at once. This approach is particularly time-consuming and thus is typically used only during device startup [6], [44]. dm-verity [6] uses a software-maintained Merkle tree structure to compute and validate hashes of read-only data blocks against pre-computed hashes. In contrast, dm-integrity keeps individual hashes for each data block during runtime, which allows verification for read/write systems. However, it cannot detect physical attacks such as reordering the blocks within the same device due to the lack of a secure root of trust in the system.
spdk:
I just finished creating the base benchmark runner - see #10
To execute it:
cd ./tools/storage-io-bm-runner
nix-shell
source .venv/bin/activate
cd ./bm-runner-tool
mkdir -p resources
python3 main.py --name my-bm --stack=native-io --storage-level=file-level --measurement-type=io-average-latency --resource-dir=./resources
More options can be viewed using `python3 main.py --help`.
All parameters of P6 are currently implemented; the others are still lacking.
From the TDX Linux Guest Kernel Security Specification:
The virtIO subsystem is also highly configurable with different options possible for the virtual queue’s types, transportation, etc. For the virtual queues, currently the only mode that was hardened (by performing code audit and fuzzing activities outlined in Intel® Trust Domain Extension Guest Linux Kernel Hardening Strategy) is a split virtqueue without indirect descriptor support, so this mode is the only one recommended for the secure virtio communication.
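A hedged QEMU sketch of that recommendation for a virtio-blk disk (the image path and machine options are placeholders; the packed / indirect_desc properties assume a reasonably recent QEMU):

```sh
# Split virtqueue (packed ring off) without indirect descriptor support.
qemu-system-x86_64 -machine q35,accel=kvm -m 4G -smp 4 \
    -drive file=disk.img,if=none,id=disk0,format=raw \
    -device virtio-blk-pci,drive=disk0,packed=off,indirect_desc=off
```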
TODO:
We also want to have CPU time breakdown figures like bifrost
Here is a script to measure breakdown (for SEV)
We can control swiotlb via kernel command-line parameters:
swiotlb= [ARM,IA-64,PPC,MIPS,X86]
Format: { <int> [,<int>] | force | noforce }
<int> -- Number of I/O TLB slabs
<int> -- Second integer after comma. Number of swiotlb
areas with their own lock. Will be rounded up
to a power of 2.
force -- force using of bounce buffers even if they
wouldn't be automatically used by the kernel
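For example, setting it via GRUB (values are illustrative only; 262144 slabs * 2 KiB per slab = 512 MiB of bounce buffers):

```sh
# /etc/default/grub: force swiotlb bounce buffers and enlarge the pool.
GRUB_CMDLINE_LINUX="swiotlb=262144,force"

# Regenerate the GRUB config afterwards (distro-dependent), e.g.:
sudo update-grub
```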
Other than block-level encryption, there are also fs-level encryption methods.
This shows some benchmarking results comparing dm-crypt, eCryptfs, and fscrypt:
TODO
NOTE
Summary:
NOTE: splitting data from file metadata may be key to optimization - so fs-level is not out of the race
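On the fs-level side, a minimal fscrypt sketch for comparison runs (ext4; the device, mount point, and directory are placeholders, and this assumes the Google fscrypt userspace tool):

```sh
sudo tune2fs -O encrypt /dev/nvme0n1p3   # enable the ext4 "encrypt" feature
sudo fscrypt setup                       # one-time global setup (/etc/fscrypt.conf)
sudo fscrypt setup /mnt/data             # per-filesystem metadata
fscrypt encrypt /mnt/data/bench          # protect the benchmark directory with a new policy
```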
Integrity is now implemented as well :checkered_flag:
Adding VM execution support next; as the VM setup is pretty complex and requires changing things in the BIOS / management interface etc, I will NOT include the VM setup in the tool.
Instead, to get VM results, one will execute the tool inside of the VM. If this approach is not enough to ensure reproducibility, we can maybe come up with a hybrid solution, e.g. one passes the tool parameters over ssh into the VM, where the tool itself executes the BMs.
vmsh and ushell did a similar thing for the automated test (#3) https://github.com/TUM-DSE/ushell/blob/main/misc/tests/qemu.py
For now some manual work is fine, but we want to have this automated in the future.
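A sketch of the hybrid approach (host name, user, and the tool's checkout path inside the guest are placeholders):

```sh
# Pass the benchmark invocation over ssh so the tool itself runs inside the guest.
ssh benchuser@cvm-guest 'cd ~/storage-io-bm-runner/bm-runner-tool && \
    source ../.venv/bin/activate && \
    mkdir -p resources && \
    python3 main.py --name my-bm --stack=native-io --storage-level=file-level \
        --measurement-type=io-average-latency --resource-dir=./resources'
```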
fio: write to the device directly instead of to a file (related to P5). With the following config, we receive unexpected results (investigated more closely on bw tests, however also present on iops and alat):

loop=5, size=4G:

loop=1, size=1G:
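For reference, a hedged sketch of such a raw-device run (the device path and job parameters are placeholders; this destroys the device's contents, so use a dedicated benchmark disk):

```sh
# Sequential-write bandwidth against the raw block device instead of a file.
sudo fio --name=raw-bw --filename=/dev/nvme0n1 \
    --rw=write --bs=128k \
    --ioengine=libaio --iodepth=32 --direct=1 \
    --loops=5 --size=4G \
    --group_reporting
```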
TODO:
vhost configuration
(When using a new disk.) Before running any benchmarks, we should fill the NVMe disk with random data.
Intel
The usual practice is to keep writing to the disk after formatting, filling it up and making it stable. Take the Intel SSD DC P3700 800GB as an example. Usually, it is sequentially written with 4KB block size for two hours, and then randomly written for one hour. In addition, during the test, the ramp_time in the fio parameter can be set larger to avoid an initial unreasonably high value being calculated in the final result. https://www.intel.com/content/www/us/en/developer/articles/technical/evaluate-performance-for-storage-performance-development-kit-spdk-based-nvme-ssd.html
AMD
It is recommended to run the following workloads with twice the advertised capacity of the SSD to guarantee that all available memory is filled with data including the factory provisioned area.
- Secure erase the SSD
- Fill SSD with 128k sequential data twice
- Fill the drive with 4k random data

https://www.amd.com/en/server-docs/nvme-ssd-performance-evaluation-guide-for-windows-server-2016-and-red-hat-enterprise
Micron
Precondition: Following SNIA’s Performance Test Specification for workload-independent precondition, we write the drive with 128KB sequential transfers aligned to 4K boundaries over 2X the drive’s advertised capacity https://www.micron.com/-/media/client/global/documents/products/technical-marketing-brief/brief_ssd_performance_measure.pdf
Reference
A “purge” of the SSD before preconditioning is desirable: https://www.micron.com/-/media/client/global/documents/products/technical-marketing-brief/brief_ssd_performance_measure.pdf
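A hedged preconditioning sketch following the guidance above (the device path is a placeholder; this irreversibly overwrites the drive):

```sh
# "Purge" step (or use a vendor secure-erase tool instead).
sudo blkdiscard /dev/nvme0n1

# Workload-independent precondition: 128k sequential writes over ~2x the capacity.
sudo fio --name=precond-seq --filename=/dev/nvme0n1 \
    --rw=write --bs=128k --direct=1 \
    --ioengine=libaio --iodepth=32 --loops=2

# Then steady-state random writes (roughly the Intel guidance of ~1 hour).
sudo fio --name=precond-rand --filename=/dev/nvme0n1 \
    --rw=randwrite --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=32 \
    --runtime=3600 --time_based
```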
TODO
For real world benchmarks we can use:
In this issue, I catalog the different storage-IO `fio` benchmark variants. Additionally, I list the completion status of the benchmarks in question and where to find them.

Parameters

In this section, we list the different benchmark parameters, such as the storage IO software stack (virtio, SPDK, ...) or the guest type (VM, CVM). The Cartesian product of all these parameters produces the total set of benchmarks.
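Purely as an illustration of how that parameter space expands, a hypothetical wrapper loop (the --stack values other than native-io and the loop dimensions are assumptions, not flags the tool necessarily supports):

```sh
# Every (guest type x storage stack x encryption) combination becomes one run;
# the guest/encryption dimensions are only baked into the run name here.
for guest in native vm cvm; do
  for stack in native-io virtio-blk vhost-spdk; do
    for crypt in plain dm-crypt; do
      python3 main.py --name "bm-${guest}-${stack}-${crypt}" \
          --stack="${stack}" --storage-level=file-level \
          --measurement-type=io-average-latency --resource-dir=./resources
    done
  done
done
```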
Environment Specific Params
Parameter 1 (P1): Guest Type
P2: Guest Configuration / Resources
Non-Env Specific Params
P3: Storage IO Software Stack
The following params only concern VMs:
- virtio-(blk|nvme|scsi)
- vhost-spdk w/ polling

P4: Encryption
- dm-crypt (on / off) (R: Kernel IO)
- dm-verity in combination
- spdk encryption (if available; otherwise, we need to write our own encryption)

P4.5: Integrity
- dm-verity (on / off) - also block level (inside kernel)
- spdk - own integrity (maybe we also need to implement this)

P5: Storage Level
P6: Measurement Type
Non-Param Config
Further Optimization Areas (non-params)