breuner / elbencho

A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
GNU General Public License v3.0
171 stars 25 forks source link

Report metadata operation time? #43

Open glennklockwood opened 1 year ago

glennklockwood commented 1 year ago

Is there a way to have elbencho report the time it takes for all processes to open and/or close all files? IOR does this when there's a barrier between open/write/read/close phases of a test, but I couldn't find anything in the elbencho code that suggests these timings can be reported.

There two reasons this might be useful:

  1. Some cloud caching file systems can read the entire file contents and cache it on a local disk as a part of open(2). So all the I/O happens during the open call, and the read performance reported by elbencho winds up being the cache performance, not the end-to-end performance.
  2. When doing a ton of buffered writes, a lot of write I/O winds up happening in the close(2) call, so splitting out this time can give an indication of writeback performance.
breuner commented 1 year ago

Hi @glennklockwood , currently there is indeed no separate measurement for open and close latency. You have this indirectly when you use the -n/-N options (for number of files per thread) or the --treefile option together with --lat: In these cases, elbencho will print lines for min/avg/max "IO latency" (referring to the latency of individual block-sized read and write ops) plus another line for "File latency" (called "Entry lat" in the csv file, so that the same column title also works for directory creation results). File latency refers to the time to create/write or read an entire file, including the time for open() and close() of the file. Thus, the latency of open+close could be calculated as "average file latency minus average latency per IO times number of IOs per file". As an example: elbencho -d -w -t 1 -n 1 -N 2 -b 1m -s 10m /mnt/mystorage/mydir ...would create two 10MiB-sized files in 1MiB blocks, so 10 IOs per file. If avg IO latency is reported as 10ms and the avg file latency is reported as 200ms, then the avg latency of open+close is: 200ms - 10x10ms = 100ms

However, if this is not exact enough (e.g. because you can't see the latency of open independent of the latency of close) or if you think this is just not convenient enough, then it will of course not be a problem to add separate latency measurements for open and close. Just let me know.