brianfrankcooper / YCSB

Yahoo! Cloud Serving Benchmark
Apache License 2.0

New YCSB Measurement type #745

Open · tiboratAS opened 8 years ago

tiboratAS commented 8 years ago

While running YCSB against various NoSQL databases, we added a new measurement type to the Aerospike branch of YCSB. The measurement periodically outputs a histogram of the recorded latencies at a specified interval.

To use the periodic histogram, set the measurement type:

measurementtype=periodichistogram

The bucket size (interval) of the periodic histogram is set with:

periodichistogram.bucket.interval=0.1

The number of buckets can be set with:

periodichistogram.buckets=1000
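The two settings above describe a fixed-width (linear) histogram. A minimal Python sketch of such a recorder follows, assuming that each reporting period starts with empty buckets and that samples past the last bucket go to a separate overflow counter; neither detail is specified in the proposal, and the class and method names are hypothetical.

```python
import math


class LinearPeriodicHistogram:
    """Fixed-width latency histogram that is emptied after every reporting period.

    Hypothetical sketch: bucket_interval and buckets mirror the
    periodichistogram.bucket.interval and periodichistogram.buckets settings.
    Latencies must use the same unit as bucket_interval (the unit is not
    stated in the original proposal).
    """

    def __init__(self, bucket_interval=0.1, buckets=1000):
        if bucket_interval <= 0 or buckets <= 0:
            raise ValueError("bucket_interval and buckets must be positive")
        self.bucket_interval = bucket_interval
        self.counts = [0] * buckets
        self.overflow = 0  # samples at or beyond buckets * bucket_interval

    def record(self, latency):
        # Bucket i covers [i * bucket_interval, (i + 1) * bucket_interval).
        index = max(math.floor(latency / self.bucket_interval), 0)
        if index < len(self.counts):
            self.counts[index] += 1
        else:
            self.overflow += 1

    def snapshot_and_reset(self):
        """Return ({bucket: count} for non-empty buckets, overflow), then start a new period."""
        nonzero = {i: c for i, c in enumerate(self.counts) if c}
        overflow = self.overflow
        self.counts = [0] * len(self.counts)
        self.overflow = 0
        return nonzero, overflow
```

In practice, snapshot_and_reset would be called from a timer once per reporting period, and a real implementation would need synchronization because YCSB records measurements from many client threads.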

The periodic histogram can also be used together with the HDR histogram:

measurementtype=hdrhistogram+periodichistogram
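One way to read a combined value like the one above is as a list of measurements that all receive every sample. The Python sketch below illustrates that idea under stated assumptions: the value is split on '+', and the factories mapping and SampleList stand-in are hypothetical placeholders, not YCSB's actual measurement classes.

```python
class CompositeMeasurement:
    """Fan each recorded latency out to several measurements.

    Hypothetical sketch: a value such as "hdrhistogram+periodichistogram"
    is split on '+', and every named measurement receives every sample.
    The factories mapping stands in for however the benchmark constructs
    its measurement objects; it is not YCSB's actual API.
    """

    def __init__(self, measurement_type, factories):
        names = [n.strip() for n in measurement_type.split("+") if n.strip()]
        unknown = [n for n in names if n not in factories]
        if not names or unknown:
            raise ValueError(f"bad measurementtype {measurement_type!r}")
        self.members = {n: factories[n]() for n in names}

    def record(self, latency):
        for member in self.members.values():
            member.record(latency)


class SampleList:
    """Trivial stand-in measurement that keeps every sample."""

    def __init__(self):
        self.samples = []

    def record(self, latency):
        self.samples.append(latency)
```

Because each member sees the same samples, the HDR histogram and the periodic histogram stay independent and can be enabled or disabled without affecting each other.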

The periodic measurement is output in JSON format.

JSON object format:

{ "start": "xxx.x", "end": "yyy.y", "opsec": zzz.zz, "scale": 0.001, "linear": { "0": x, "1": y, "5": z, "7": m } }
{ "start": "xxx.x", "end": "yyy.y", "opsec": zzz.zz, "scale": 0.001, "log2": { "0": x, "1": y, "4": z, "8": m } }
{ "start": "xxx.x", "end": "yyy.y", "opsec": zzz.zz, "range": { "0.001:0.002": x, "0.005:0.008": y } }

Where start and end are seconds since the epoch, expressed as floating-point numbers so that any finer granularity (such as milliseconds) can be represented.

Where scale is required by some formats. In the linear format, it is the width of each bucket in seconds, specified as a double (so 1 millisecond is 0.001).

Where the 'linear' format has a key for each bucket, each bucket is scale seconds wide, and buckets with a count of 0 are omitted.

Where the 'log2' format defines a starting point, and each element in the sub-object is a power of two.

Where the 'range' format allows arbitrary ranges to be specified. The values are in seconds from start, so no scale is needed.
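The three formats can be illustrated by building one JSON object per reporting period from raw latencies. This Python sketch is an illustration only, not the Aerospike branch's code: it assumes opsec means operations divided by the period length, emits start and end as JSON numbers (following the prose rather than the quoted placeholders), treats latencies as seconds, and uses guessed bucket rules for log2 (key k covers [2^k, 2^(k+1)) times scale) and range (non-overlapping [lo, hi) bounds), since those two formats are not yet final.

```python
import json


def period_record(start, end, ops, latencies, fmt, scale=0.001, ranges=()):
    """Build one periodic-histogram JSON object for a reporting period.

    start, end: seconds since the epoch as floats.
    ops: operations completed in the period (opsec assumed = ops / duration).
    latencies: latencies in seconds recorded during the period.
    fmt: "linear", "log2" or "range"; the log2 and range bucket rules
    below are assumptions.
    """
    duration = end - start
    record = {"start": start, "end": end,
              "opsec": round(ops / duration, 2) if duration > 0 else 0.0}
    counts = {}
    if fmt == "linear":
        record["scale"] = scale
        for x in latencies:
            key = str(int(x / scale))  # bucket i covers [i*scale, (i+1)*scale)
            counts[key] = counts.get(key, 0) + 1
    elif fmt == "log2":
        record["scale"] = scale
        for x in latencies:
            units = int(x / scale)
            # Assumed rule: key k covers [2**k, 2**(k+1)) * scale; sub-scale values go to 0.
            key = str(max(units.bit_length() - 1, 0))
            counts[key] = counts.get(key, 0) + 1
    elif fmt == "range":
        for lo, hi in ranges:  # assumed non-overlapping [lo, hi) bounds in seconds
            n = sum(1 for x in latencies if lo <= x < hi)
            if n:
                counts[f"{lo}:{hi}"] = n
    else:
        raise ValueError(f"unknown format: {fmt}")
    # Zero-count buckets never appear; order keys by their numeric lower bound.
    record[fmt] = dict(sorted(counts.items(),
                              key=lambda kv: float(kv[0].split(":")[0])))
    return json.dumps(record)
```

Keeping only non-zero buckets keeps each line short even with 1000 configured buckets, which matters when a record is written every few seconds over a long run.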

Currently only the linear format has been tested. The log2 and range formats will be added in the future.

We will be open-sourcing tools for graphing the output.

Periodic output shows how the results change over the course of a run, which makes it a powerful tool for evaluating YCSB results.

We would like to see the changes merged into the YCSB master.

nitsanw commented 8 years ago

2c: "The measurement outputs a histogram of data periodically at a specified interval." This is already supported via the HDR histogram log file. Since the logging is lossless, you can post-process the data to represent exactly any period that is larger than the logging resolution. The HDR logging interval is controlled by the status logging interval.
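Because interval histograms are just bucket counts, coarser periods can be rebuilt by summing them. The Python sketch below illustrates that post-processing step, assuming the intervals have already been decoded into (start time, bucket-to-count) pairs that share one bucketing scheme; reading real HDR log files would go through the HdrHistogram library's log reader (for example `HistogramLogReader` in the Java library), which is not reproduced here. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_intervals(intervals, period):
    """Merge per-interval histograms into coarser, aligned periods.

    intervals: iterable of (start_seconds, bucket_counts) pairs, where
    bucket_counts maps a bucket key to a count and all intervals use the
    same buckets. Because counts simply add, merging is exact when period
    is a multiple of the logging interval and no interval straddles a
    period boundary.
    """
    merged = defaultdict(Counter)
    for start, counts in intervals:
        window = int(start // period) * period  # start of the enclosing period
        merged[window].update(counts)  # Counter.update adds counts
    return {w: dict(c) for w, c in sorted(merged.items())}
```

For example, 10-second intervals can be regrouped into 20-second periods after the run, without re-running the benchmark at a different reporting interval.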

If the data you require is already in the HDR log, I'd suggest implementing your reporting as a consumer of HDR histogram logs. This would add more value, since other producers of HDR histogram logs would benefit as well, and it would avoid bloating YCSB with an extra feature.