breuner / elbencho

A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
GNU General Public License v3.0

csvfile - In progress results over time, similar to FIO? #27

Closed AeonJJohnson closed 2 years ago

AeonJJohnson commented 2 years ago

Greetings.

In testing elbencho with the csvfile option, I get a start and end summary that can be graphed. I am looking to graph performance over time, to identify flushes, page cache effects, storage device issues, etc.

Is there a way to run elbencho so that results are recorded to the csvfile at an interval during the benchmark, say every 500 ms to 1 s?

Thanks

glennklockwood commented 2 years ago

+1 for this feature if it doesn't exist. I've been fudging this by running multiple iterations and seeing how performance varies at a much coarser granularity, but being able to sample instantaneous performance at high frequency (acknowledging that the sampling itself might impact performance) would be really valuable.
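For reference, a rough sketch of that coarse-grained workaround: run several short iterations back to back and let the existing --csvfile option collect one per-phase summary line per run. The target path and flag values here are only illustrative, and this assumes --csvfile appends to the file rather than overwriting it:

$ for i in $(seq 1 10); do elbencho -w -b 1m -s 4g -t 16 --csvfile /tmp/results.csv /mnt/test/testfile; done

Graphing the collected per-run summaries then gives the coarse view of how performance varies between iterations.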

breuner commented 2 years ago

Ok, so I know what I'll be doing this weekend :-)

breuner commented 2 years ago

Taking a few days longer than I originally expected, so not done yet, but I'm on it...

breuner commented 2 years ago

@AeonJJohnson , @glennklockwood : Done and uploaded.

It would of course be great if you could try it out and let me know whether it fits the intended purpose.

The corresponding new parameters are listed in the help output; the example below uses --livecsv, which takes the path of the CSV file for the live statistics.

Here's a simple example that creates a bunch of small files on my little home NAS box:

$ elbencho -d -w -n 10 -N 100 -s 1m -t 16 --livecsv /tmp/live.csv /mnt/smb/smallfiles/ 

The first lines of /tmp/live.csv:

$ column -x -s, -t -n /tmp/live.csv
Label  Phase  RuntimeMS  Rank   MixType  Done%  DoneBytes   MiB/s  IOPS  Entries  Entries/s  Active  CPU  Service  
       WRITE  2001       Total           11     1998585856  953    953   1890     945        16      47            
       WRITE  4000       Total           12     2163212288  78     78    2048     79         16      6             
       WRITE  6000       Total           13     2325741568  77     77    2202     77         16      9             
       WRITE  8000       Total           14     2474639360  71     71    2344     71         16      5             
[...]

=> So this shows the significant difference between the early cached writes and the throughput drop after the first 2 seconds.
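To turn such a live CSV into a graph over time, one option is to pull out the RuntimeMS and MiB/s columns (columns 3 and 8 in the header shown above) and feed them to gnuplot. This is just a sketch based on that column layout, with the file paths as placeholders:

$ awk -F, 'NR>1 {print $3/1000, $8}' /tmp/live.csv > live.dat
$ gnuplot -e 'set terminal png size 800,400; set output "live.png"; set xlabel "Runtime [s]"; set ylabel "MiB/s"; plot "live.dat" with lines notitle'

The awk step skips the header and converts RuntimeMS to seconds; the resulting two-column file can just as well be imported into a spreadsheet instead of gnuplot.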

glennklockwood commented 2 years ago

Thanks Sven! This looks great for my purposes. I verified it by watching a 6 TB write run.

[attached graph: write throughput over time during the 6 TB write]

breuner commented 2 years ago

Thanks for checking, Glenn! That's a very interesting graph, with the first 22min having quite extreme ups and downs and the remainder apparently not having such high peaks anymore.

@AeonJJohnson: I'm closing this issue under the assumption that the added feature works as intended. But of course please feel free to reopen if something doesn't seem right.