Level / bench

Benchmark abstract-level databases.
MIT License

Merge write, -sorted and -random benchmarks into one #6

Closed vweevers closed 5 years ago

vweevers commented 5 years ago

And add options for the keys and values, e.g. --keys/--values <type> where type is one of random, seq, seqReverse. Optionally combined with --keySize/--valueSize.

And rename the merged benchmark to put.

The next steps might be to:

  1. Compare and evaluate these options, possibly remove some
  2. Replace and/or enhance the options with "property-based" options (e.g. where you could specify --length-distribution zipfian) (https://github.com/Level/levelup/issues/227)
  3. Create "workloads" with mixed operations (along the lines of level-bench run workload --read-proportion 0.4 --update-proportion ..).
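
A rough sketch of how the proposed `--keys <type>` / `--keySize` options could map onto a generator (the option values `random`, `seq` and `seqReverse` are from the proposal above; everything else here is a hypothetical helper, not level-bench code):

```javascript
// Hypothetical sketch of a key generator driven by the proposed CLI options.
function makeKeyGenerator ({ type = 'random', keySize = 16, seed = 1 } = {}) {
  let counter = 0
  let state = seed
  // Small seeded PRNG (mulberry32) so runs are repeatable.
  const random = () => {
    state |= 0; state = (state + 0x6D2B79F5) | 0
    let t = Math.imul(state ^ (state >>> 15), 1 | state)
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
  return function next () {
    let n
    if (type === 'random') n = Math.floor(random() * Number.MAX_SAFE_INTEGER)
    else if (type === 'seq') n = counter++
    else if (type === 'seqReverse') n = Number.MAX_SAFE_INTEGER - counter++
    else throw new Error('unknown key type: ' + type)
    // Zero-pad to a fixed width so the keys sort lexicographically.
    return String(n).padStart(keySize, '0').slice(-keySize)
  }
}
```

A `--values` option could work the same way, just without the ordering requirement.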
vweevers commented 5 years ago

Interesting bit from the YCSB paper:

The workload client must make many random choices when generating load: which operation to perform (Insert, Update, Read or Scan), which record to read or write, how many records to scan, and so on. These decisions are governed by random distributions. YCSB has several built-in distributions:

  • Uniform: Choose an item uniformly at random. For example, when choosing a record, all records in the database are equally likely to be chosen.
  • Zipfian: Choose an item according to the Zipfian distribution. For example, when choosing a record, some records will be extremely popular (the head of the distribution) while most records will be unpopular (the tail).
  • Latest: Like the Zipfian distribution, except that the most recently inserted records are in the head of the distribution.
  • Multinomial: Probabilities for each item can be specified. For example, we might assign a probability of 0.95 to the Read operation, a probability of 0.05 to the Update operation, and a probability of 0 to Scan and Insert. The result would be a read-heavy workload.
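
The multinomial case boils down to a weighted random choice. A minimal sketch (illustrative only, not YCSB or level-bench code):

```javascript
// Pick an operation according to specified probabilities (multinomial).
// `random` is injectable so a seeded PRNG can be passed in.
function multinomial (weights, random = Math.random) {
  const entries = Object.entries(weights)
  const total = entries.reduce((sum, [, w]) => sum + w, 0)
  return function pick () {
    let r = random() * total
    for (const [op, w] of entries) {
      if ((r -= w) < 0) return op
    }
    return entries[entries.length - 1][0] // guard against float rounding
  }
}

// The read-heavy workload from the paper: 95% reads, 5% updates.
const pick = multinomial({ read: 0.95, update: 0.05, scan: 0, insert: 0 })
```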

Also see the HyperDex benchmarks (which used YCSB).

vweevers commented 5 years ago

Progress: wrote a pseudo-random key generator, based on a seed (meaning each benchmark can use the same set of keys) and a distribution (uniform or zipfian). The y-axis here shows the generated keys as integers for visual purposes; they are actually encoded with lexicographic-integer (hex or binary).

(plot: self-distribution 1559083519532)

(The keys in the zipfian distribution lean towards 0. Maybe I'm misusing it; if not, zero-padding or offsetting is going to be important so that the most frequently used keys don't also have the fewest bytes.)
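
The seed-plus-padding idea can be sketched like this (assumptions on my part: xorshift32 as the seeded PRNG, and plain fixed-width hex instead of the lexicographic-integer encoding actually used):

```javascript
// Hypothetical sketch: a seeded generator so every benchmark sees the same
// key sequence, with keys hex-encoded at a fixed width so hot keys near 0
// don't also end up with the fewest bytes.
function seededKeys (seed, N) {
  let state = seed >>> 0
  const random = () => { // xorshift32; any seeded PRNG works
    state ^= state << 13; state ^= state >>> 17; state ^= state << 5
    return (state >>> 0) / 4294967296
  }
  const width = Math.ceil(Math.log2(N) / 4) // hex digits needed for N - 1
  return function next () {
    const n = Math.floor(random() * N) // uniform; a zipfian sampler slots in here
    return n.toString(16).padStart(width, '0')
  }
}
```

Two generators built with the same seed yield the same key sequence, which is what lets different benchmarks share a key set.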

vweevers commented 5 years ago

> the keys in the zipfian distribution lean towards 0

This happened because I used a skew parameter (explained here) of 1. When skew is 0 it correctly results in a uniform distribution. I compared the results to a simpler uniform algorithm (random() * N), both using the same seeded PRNG, and the results are very close:

Edit: I went with a different algorithm (that is uniform but not random() * N) in the end, disregard this.

(plot: self-distribution 1559126821743)
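
For reference, a Zipfian sampler with a skew parameter can be sketched with the usual inverse-CDF approach (an assumption on my part, not necessarily the algorithm used here). With skew 0 every rank gets equal weight, which is why it degenerates to uniform:

```javascript
// Sample ranks 0..N-1 with probability proportional to 1 / (rank+1)^skew.
function zipfian (N, skew, random = Math.random) {
  const cdf = new Float64Array(N)
  let sum = 0
  for (let i = 0; i < N; i++) {
    sum += 1 / Math.pow(i + 1, skew) // skew = 0 gives equal weights (uniform)
    cdf[i] = sum
  }
  return function next () {
    const r = random() * sum
    // Binary search the CDF for the first bucket >= r.
    let lo = 0, hi = N - 1
    while (lo < hi) {
      const mid = (lo + hi) >> 1
      if (cdf[mid] < r) lo = mid + 1
      else hi = mid
    }
    return lo
  }
}
```

With skew 1 the mass piles up at the low ranks (the head), matching the lean-towards-0 behaviour observed above.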

vweevers commented 5 years ago

There are differences between write-random/-sorted and write that we need to consolidate:

  1. write-random/-sorted collects duration per op
  2. write collects average duration per 1000 ops, and throughput in MB/s, averaged over time

While (1) is exact and paints a more complete picture, it's also more data to plot - which gnuplot can barely handle. I definitely want to keep the throughput metric ~but I prefer to collect the average throughput of the last 1000 ops. And add a command-line argument to control the size of that window (default is 1000, if 1, aggregation is effectively disabled, if N, you get a cumulative average).~1
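
The aggregation in (2) can be sketched as collecting one mean data point per fixed-size window of operations (a hypothetical helper, not the actual level-bench code):

```javascript
// Aggregate per-op durations into one mean data point per `window` ops.
// With window = 1, aggregation is effectively disabled.
function windowed (window = 1000) {
  let count = 0
  let totalNs = 0
  const points = []
  return {
    record (durationNs) {
      totalNs += durationNs
      if (++count === window) {
        points.push(totalNs / count) // mean duration per op in this window
        count = 0
        totalNs = 0
      }
    },
    points
  }
}
```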

We could record values into native-hdr-histogram or student-histogram and then also display statistics like mean, stddev and relative margin of error as text in the plot.
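
The statistics themselves are cheap to compute; a sketch (normal approximation with z ≈ 1.96 for ~95% confidence — the histogram libraries would handle this for us, the formulas are just spelled out here):

```javascript
// Mean, sample standard deviation and relative margin of error (%).
function stats (samples) {
  const n = samples.length
  const mean = samples.reduce((a, b) => a + b, 0) / n
  const variance = samples.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1)
  const stddev = Math.sqrt(variance)
  const sem = stddev / Math.sqrt(n) // standard error of the mean
  const moe = 1.96 * sem            // margin of error at ~95% confidence
  const rme = (moe / mean) * 100    // relative margin of error, in percent
  return { mean, stddev, rme }
}
```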

We could also make that "window" dynamic, à la benchmark.js (JSPerf) which has a warmup step to find the number of ops needed per observation to make timings statistically significant.2


1 Down the line we should write raw, non-aggregated values to the CSV file and derive new data from that when needed, but at the moment that would complicate the integration with gnuplot, as we'd have to write derived data to intermediary CSV files. gnuplot itself can also compute stats, but I want to move away from gnuplot, not become more dependent on it.

2 For future reference, the minimum duration of an observation is Math.max(resolutionInSeconds() / 2 / 0.01, 0.05) for a certainty of 99% where resolutionInSeconds() is the average smallest measurable time with process.hrtime().
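
That formula could be sketched like so (the resolution-probing loop is my assumption of how the "average smallest measurable time" would be obtained):

```javascript
// Estimate the smallest measurable interval with process.hrtime() and
// derive the minimum observation duration for a max uncertainty of 1%
// (i.e. 99% certainty), with a floor of 50ms.
function minObservationSeconds (samples = 50) {
  let total = 0n
  for (let i = 0; i < samples; i++) {
    const start = process.hrtime.bigint()
    let now
    do { now = process.hrtime.bigint() } while (now === start)
    total += now - start
  }
  const resolutionInSeconds = Number(total / BigInt(samples)) / 1e9
  return Math.max(resolutionInSeconds / 2 / 0.01, 0.05)
}
```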