Tarsnap / kivaloo

Kivaloo is a collection of utilities which together form a data store associating keys of up to 255 bytes with values of up to 255 bytes.
http://www.tarsnap.com/kivaloo.html

benchmark time period #181

Open gperciva opened 4 years ago

gperciva commented 4 years ago

Do you have a particular intuition behind choosing a specific time range (such as 50 to 60 seconds for bulk_update)?

In the attached PNGs, it looks like the number of operations per second in bulk_update is randomly distributed. Here are 3 tests (I cancelled the last one a little early).

I'd be tempted to use the median, or the 25th & 75th percentiles, rather than the mean of a specific time range. (As it happens, I spent the past 2 days working on perftests for spiped, so I have this on my mind.)
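Something along these lines (a rough sketch, assuming one ops/sec sample per line in a hypothetical samples.txt) would pull out the 25th/50th/75th percentiles:

# sort numerically, then index into the sorted list at the quartile positions
$ sort -n samples.txt | awk '{a[NR]=$1} END {print a[int(NR*0.25)], a[int(NR*0.5)], a[int(NR*0.75)]}'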

[attached graphs: bulk1, bulk2, bulk3]

gperciva commented 4 years ago

BTW, those benchmarks were done on my FreeBSD desktop. Would it be helpful if I used some standard EC2 hardware, like c6g.medium or c5.large?

cperciva commented 4 years ago

My concern with benchmarks is "warming up" -- you can see in those graphs that the performance in the first second is higher than later, presumably because data structures are clean and kvlds isn't being slowed down by needing to evict pages from memory. How long this warmup period takes will depend on the benchmark, so I went with a conservative value.
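One way to eyeball where the warmup ends (again a sketch, assuming one ops/sec sample per line in a hypothetical samples.txt) is to print a running mean and watch where it levels off:

# print the cumulative mean after each one-second sample
$ awk '{sum += $1; print NR, sum/NR}' samples.txt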

I'm not expecting to run these benchmarks very often -- they exist mainly for comparing between versions -- so I'm not too concerned about them taking a while to run.

cperciva commented 4 years ago

And yes, for comparison purposes these need to run on standard hardware. But no rush right now.

gperciva commented 4 years ago

Sure, but the means of 50 to 60 are quite far apart in the first two examples:

$ awk '{if ((NR >= 50) && (NR < 60)) sum+=$0} END {print sum/10}' foo.txt 
266913
$ awk '{if ((NR >= 50) && (NR < 60)) sum+=$0} END {print sum/10}' bar.txt 
322541

whereas the medians of 10 to 60 are closer (although admittedly not as close as I was expecting).

$ awk '{if ((NR >= 10) && (NR < 60)) a[i++] = $1} END {print a[int(i/2)]}' foo.txt 
294197
$ awk '{if ((NR >= 10) && (NR < 60)) a[i++] = $1} END {print a[int(i/2)]}' bar.txt 
335462
cperciva commented 4 years ago

Did you forget a sort when calculating the medians?

gperciva commented 4 years ago

Oops. Yeah, sorting first (numerically, with sort -n) gives much more similar values. Invoking a useless cat since it adds clarity:

$ cat foo.txt | awk '{if ((NR >= 10) && (NR < 60)) print $1}' | sort -n | awk '{a[i++]=$1} END {print a[int(i/2)]}'
324656
$ cat bar.txt | awk '{if ((NR >= 10) && (NR < 60)) print $1}' | sort -n | awk '{a[i++]=$1} END {print a[int(i/2)]}'
329101
gperciva commented 4 years ago

BTW, c5.large produces the same type of timing data:

[attached graph: c5]

cperciva commented 4 years ago

Sounds good to me.

BTW, the low performance on c5.large is because reading the clock is ridiculously slow. Or rather, it is with the default settings -- adjusting the timecounter FreeBSD uses speeds things up dramatically. I need to dig into that at some point.
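(For reference, the relevant knobs are the kern.timecounter sysctls; exactly which counters are available varies by machine, so TSC-low below is only illustrative:)

$ sysctl kern.timecounter.choice             # list the available timecounters and their quality ratings
$ sysctl kern.timecounter.hardware=TSC-low   # switch to the TSC-based counter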

gperciva commented 4 years ago

Do you mean "the default clock method used by monoclock.c is slow on c5.large", or do you mean "there's something in the kernel that's sub-optimal"?

FWIW, c6g.large gets three times as many operations with bulk_update.

[attached graph: c6g]

cperciva commented 4 years ago

The FreeBSD kernel has a setting which tells it where to get the time from, and the default is suboptimal, at least for x86.