Benchmark the effect of slice width on performance

I was talking to @linhvo about this and wanted to capture some thoughts here. @jaffee please feel free to chime in.

Recent discussions have shown that we need to understand performance as a function of three variables: slice width, slice count, and data density. I'd like to compare all three at once, so I want to collect performance data over a "parameter grid". For example, we could benchmark a big read query on every combination of:

- sliceWidth in {2^12, 2^16, 2^20, 2^22}
- averageBitDensity in {.1, .01, .001, .0001}
- numSlices in {1, 5, 10, 50, 100}
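
As a rough sketch of what the grid looks like in code, the combinations above can be enumerated with a Cartesian product. The constant and function names here are made up for illustration and are not part of the existing benchmark tools.

```python
from itertools import product

# Values copied from the lists above.
SLICE_WIDTHS = [2**12, 2**16, 2**20, 2**22]
AVERAGE_BIT_DENSITIES = [0.1, 0.01, 0.001, 0.0001]
NUM_SLICES = [1, 5, 10, 50, 100]


def parameter_grid():
    """Yield one dict per (sliceWidth, averageBitDensity, numSlices) combination."""
    for width, density, slices in product(
        SLICE_WIDTHS, AVERAGE_BIT_DENSITIES, NUM_SLICES
    ):
        yield {
            "sliceWidth": width,
            "averageBitDensity": density,
            "numSlices": slices,
        }


grid = list(parameter_grid())
print(len(grid))  # 4 * 4 * 5 = 80 combinations
```

That is 80 benchmark runs per query type, which is worth keeping in mind when deciding how many repeats each grid point gets.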

Varying sliceWidth directly with the benchmark tools is blocked right now, but we should be able to build a benchmark that handles the other two parameters easily. To get this data sooner, we can then re-run that single benchmark on a series of servers.
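
One possible shape for that single benchmark is sketched below. It assumes each server in the series is configured with a different slice width, so the harness only records the width and sweeps the other two parameters. `run_grid` and the `run_query` callable are hypothetical names; `run_query` stands in for whatever issues the read query against data that is already loaded.

```python
import time
from itertools import product

DENSITIES = [0.1, 0.01, 0.001, 0.0001]
NUM_SLICES = [1, 5, 10, 50, 100]


def run_grid(slice_width, run_query, repeats=3):
    """Time run_query for every (density, numSlices) pair on one server.

    slice_width is recorded, not changed: it is whatever the server under
    test was configured with (an assumption of this sketch).
    run_query(density, num_slices) is a hypothetical callable that issues
    the read query; data loading is assumed to happen outside the timer.
    """
    results = []
    for density, num_slices in product(DENSITIES, NUM_SLICES):
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(density, num_slices)
            timings.append(time.perf_counter() - start)
        results.append(
            {
                "sliceWidth": slice_width,
                "averageBitDensity": density,
                "numSlices": num_slices,
                # Keep the fastest repeat to reduce noise from other activity.
                "seconds": min(timings),
            }
        )
    return results
```

Taking the minimum over repeats is just one choice; a median or the full list of timings would work equally well if variance is of interest.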

For simplicity, we can start with, for example:

- a single frame with 100 rows of the same bit density
- uniformly distributed data with the exact same bit count per slice per bitmap
- a simple density definition: bitDensity=.1 means .1*sliceWidth bits are set in a given slice of one bitmap. (~there are other options, I want to read something like this and see how relevant it is~) For binary data there aren't that many options; the Gini coefficient recommended by this is equivalent to this simple definition.
- for the read query, a topN with a big nested bitmap
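
The data assumptions above could be sketched roughly as follows. Every function name is hypothetical, and the bit count is taken to be `round(bitDensity * sliceWidth)`, which is an assumption about how fractional counts should be handled. The `gini` helper uses the standard sorted-rank formula; whether that matches the definition in the linked reference is also an assumption. For 0/1 data it works out to `1 - density`, which is one way to read the claim that the two measures are equivalent.

```python
import random


def generate_slice_bits(slice_width, bit_density, rng):
    """Return the sorted column offsets set in one slice of one bitmap.

    Exactly round(bit_density * slice_width) offsets are drawn uniformly
    without replacement, so every slice of every bitmap has the same count.
    """
    bit_count = round(bit_density * slice_width)
    return sorted(rng.sample(range(slice_width), bit_count))


def generate_frame(slice_width, num_slices, bit_density, num_rows=100, seed=0):
    """Yield (row_id, column_id) pairs for a frame of identical-density rows."""
    rng = random.Random(seed)
    for row_id in range(num_rows):
        for slice_index in range(num_slices):
            base = slice_index * slice_width
            for offset in generate_slice_bits(slice_width, bit_density, rng):
                yield row_id, base + offset


def gini(values):
    """Gini coefficient of non-negative values (sorted-rank formula)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    weighted = sum(rank * v for rank, v in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Check on one slice: Gini of the 0/1 vector equals 1 - (bits set / sliceWidth).
width = 2**12
offsets = generate_slice_bits(width, 0.1, random.Random(0))
bits = [0] * width
for offset in offsets:
    bits[offset] = 1
density = len(offsets) / width
print(abs(gini(bits) - (1 - density)) < 1e-9)
```

One thing this makes visible: with sliceWidth=2^12 and bitDensity=.0001 the product is below 0.5, so under the rounding assumption that grid point has zero bits set per slice.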

With the result data, we could produce benchmark tables and graphs with different views, for example, query time vs. number of slices, with a separate line for each slice width.
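
As an illustration of that particular view, here is a small sketch that groups result records into one series per slice width at a fixed density and prints them as a table. The record layout (`sliceWidth`, `averageBitDensity`, `numSlices`, `seconds`) and both function names are assumptions made for this example, not an existing format.

```python
from collections import defaultdict


def series_by_slice_width(results, density):
    """Group records at one density into {sliceWidth: [(numSlices, seconds)]}.

    Each series is sorted by numSlices so it can be drawn as one line.
    """
    series = defaultdict(list)
    for record in results:
        if record["averageBitDensity"] == density:
            series[record["sliceWidth"]].append(
                (record["numSlices"], record["seconds"])
            )
    return {width: sorted(points) for width, points in sorted(series.items())}


def format_table(series):
    """Render the series as a tab-separated table, one column per slice width."""
    widths = list(series)
    slice_counts = sorted({n for points in series.values() for n, _ in points})
    lookup = {(w, n): s for w, points in series.items() for n, s in points}
    lines = ["numSlices\t" + "\t".join(f"width={w}" for w in widths)]
    for n in slice_counts:
        cells = [
            f"{lookup[(w, n)]:.4f}" if (w, n) in lookup else "-" for w in widths
        ]
        lines.append(f"{n}\t" + "\t".join(cells))
    return "\n".join(lines)
```

Each value of the returned dict is a list of (x, y) points, so it can be handed to a plotting library as one line per slice width. Other views, such as query time vs. density, would only need a different grouping key.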
