voellm opened this issue 8 years ago:
Should we add multichase as a benchmark for PKB? It pulls out an HPL RandomAccess-like workload and is simpler to run.
https://github.com/google/multichase
Makes sense. +1 for adding this to PKB.
Seems similar to the latency benchmark (lat_mem_rd, http://www.bitmover.com/lmbench/lat_mem_rd.8.html) in LMBench, but it seems this one is multi-core 'aware'. +1
May I also suggest that before we just publish the metrics that multichase outputs, we decide which metrics we need. Off the top of my head, I would like to see metrics for these; I think multichase can provide some of them:
- Multiple percentiles, not just avg & sdev or best latency, for every scenario, because I expect memory access latencies to be distributed in a multi-modal fashion with a long tail due to the memory cache hierarchy and cache interference. Ideally, I want to be able to generate histograms or CDF charts with these metrics.
- Measure with different strides and a fixed memory size (single-threaded and multi-threaded); a minimal chase-loop sketch follows this list:
  - stride < cache line size
  - stride = cache line size
  - stride > cache line size
  - stride = page size
  - stride > page size
- Measure with different memory sizes and a fixed stride (single-threaded and multi-threaded):
  - Double the memory size for every sample until 85% of total memory size.
  - If average latency is plotted with memory size on a log-scale x-axis, it should show the different 'plateaus' of the cache hierarchy. Something like this: https://cloud.githubusercontent.com/assets/232300/13226545/75771cde-d957-11e5-8efe-106b93a6cd9d.png
- Measure the latency of dereferencing modified cache lines between cores (emulates false-sharing effects and locks).
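For concreteness, here's a minimal sketch of the kind of strided chase loop those scenarios would measure. This is illustrative only, not multichase's (or lat_mem_rd's) actual code; the buffer size and stride are arbitrary assumptions, and a real benchmark would randomize the chain to defeat hardware prefetching:

```c
/*
 * Minimal strided pointer-chase sketch -- illustrative only. Buffer size
 * and stride are arbitrary; a real benchmark would randomize the chain.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const size_t buf_size = 64u << 20; /* 64 MiB working set (assumption) */
    const size_t stride = 128;         /* > a typical 64-byte cache line  */
    const size_t slots = buf_size / stride;
    const long iters = 100000000L;     /* number of dependent loads       */

    char *buf = malloc(buf_size);
    if (buf == NULL) return 1;

    /* Link slot i to slot i+1; the last slot wraps back to the first. */
    for (size_t i = 0; i < slots; i++)
        *(void **)(buf + i * stride) = buf + ((i + 1) % slots) * stride;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    void *p = buf;
    for (long i = 0; i < iters; i++)
        p = *(void **)p;               /* each load depends on the last */

    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    /* Print p so the compiler can't optimize the chase away. */
    printf("%.3f ns/load (last pointer: %p)\n", ns / iters, p);
    free(buf);
    return 0;
}
```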
Very good points!
OK, so I forked the multichase benchmark, and I've been experimenting with using HdrHistogram to track the statistics.
Currently, multichase starts the threads executing the workload in a loop as fast as possible, and every 0.5 seconds it samples each thread's counter, which tracks the number of iterations executed during the sampling period. It then reports the average latency as time_delta / number of iterations, where time_delta is roughly 0.5 seconds.
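To make that scheme concrete, here's a minimal sketch of the sampling loop. The names and structure are illustrative, not multichase's actual code:

```c
/*
 * Sketch of the counter-sampling scheme described above. Each worker bumps
 * its own counter; the main thread samples every 0.5 s and derives the
 * average latency per iteration as time_delta / iterations_delta.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 4

/* One counter per thread, padded to a cache line to avoid false sharing. */
static struct { volatile uint64_t count; char pad[56]; } counters[NTHREADS];

static void *worker(void *arg) {
    long id = (long)arg;
    for (;;) counters[id].count++;     /* stand-in for one chase iteration */
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);

    uint64_t last[NTHREADS] = {0};
    for (int s = 0; s < 5; s++) {      /* take five 0.5 s samples */
        usleep(500000);
        for (int i = 0; i < NTHREADS; i++) {
            uint64_t now = counters[i].count;
            uint64_t delta = now - last[i];
            last[i] = now;
            if (delta > 0)             /* 5e8 ns elapsed per sample */
                printf("thread %d: %.3f ns/iter\n", i, 5e8 / (double)delta);
        }
    }
    return 0;                          /* exiting main ends the workers */
}
```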
My modification instead uses a high dynamic range histogram (HdrHistogram) to efficiently track 'running' percentiles and other aggregate statistics. The measurements are taken at the thread level after every loop iteration; each thread's histogram is then aggregated into a final histogram that shows the combined statistics.
It's only a POC; I'm not a fan of the approach I used to protect the histogram with a spinlock, but at least it doesn't seem to add much overhead.
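For anyone curious, the approach looks roughly like this. This is a sketch assuming the C port of HdrHistogram (https://github.com/HdrHistogram/HdrHistogram_c), not the actual code in the PR; it times the clock call itself just to have something to record:

```c
/*
 * Sketch: per-thread HdrHistograms merged into a shared histogram under a
 * spinlock. Not the actual PR code; assumes HdrHistogram_c is installed.
 */
#include <hdr/hdr_histogram.h>  /* include path may differ per install */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 4
#define SAMPLES  1000000

static struct hdr_histogram *global_hist;
static pthread_spinlock_t hist_lock;

static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void *worker(void *arg) {
    (void)arg;
    struct hdr_histogram *local;
    hdr_init(1, 1000000000LL, 3, &local);  /* 1 ns .. 1 s, 3 sig figs */

    for (int i = 0; i < SAMPLES; i++) {
        int64_t t0 = now_ns();
        /* Real code would time one chase iteration here; we just time
         * the clock call itself to have a value to record. */
        int64_t t1 = now_ns();
        hdr_record_value(local, t1 - t0);
    }

    /* Merge per-thread results into the shared histogram -- this is the
     * spinlock-protected step mentioned above. */
    pthread_spin_lock(&hist_lock);
    hdr_add(global_hist, local);
    pthread_spin_unlock(&hist_lock);
    hdr_close(local);
    return NULL;
}

int main(void) {
    hdr_init(1, 1000000000LL, 3, &global_hist);
    pthread_spin_init(&hist_lock, PTHREAD_PROCESS_PRIVATE);

    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    printf("mean = %.3f ns, p99.9 = %ld ns, max = %ld ns\n",
           hdr_mean(global_hist),
           (long)hdr_value_at_percentile(global_hist, 99.9),
           (long)hdr_max(global_hist));
    /* The classic percentile table in the output below comes from: */
    hdr_percentiles_print(global_hist, stdout, 5, 1.0, CLASSIC);
    return 0;
}
```

Build with something like gcc sketch.c -lhdr_histogram -lpthread, assuming HdrHistogram_c is installed.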
The PR is here https://github.com/meteorfox/multichase/pull/1
What's really cool about this (assuming I don't have a stupid mistake) is that you can see the long tail of the distribution; there are some measurements in the microsecond range, which could well be context-switching effects.
Here's the sample output (the Value column is in nanoseconds):
carl6796@t450s:~/programming/github.com/google/multichase$ ./multichase -a -n 10
66.501 <=== Average value computed when using -a flag
Value Percentile TotalCount 1/(1-Percentile)
51.999 0.000000 3 1.00
59.519 0.100000 250214 1.11
61.183 0.200000 501469 1.25
62.431 0.300000 752337 1.43
63.487 0.400000 997071 1.67
64.511 0.500000 1251315 2.00
65.023 0.550000 1375088 2.22
65.535 0.600000 1493436 2.50
66.111 0.650000 1617645 2.86
66.751 0.700000 1742183 3.33
67.519 0.750000 1872245 4.00
67.903 0.775000 1930744 4.44
68.351 0.800000 1993236 5.00
68.863 0.825000 2055409 5.71
69.439 0.850000 2117415 6.67
70.079 0.875000 2178694 8.00
70.463 0.887500 2210113 8.89
70.911 0.900000 2243169 10.00
71.359 0.912500 2272272 11.43
71.871 0.925000 2301316 13.33
72.575 0.937500 2333351 16.00
72.959 0.943750 2347802 17.78
73.407 0.950000 2363609 20.00
73.919 0.956250 2379125 22.86
74.559 0.962500 2394505 26.67
75.391 0.968750 2409715 32.00
75.967 0.971875 2417452 35.56
76.735 0.975000 2425184 40.00
77.759 0.978125 2433297 45.71
79.039 0.981250 2440808 53.33
81.279 0.984375 2448525 64.00
83.455 0.985938 2452406 71.11
86.911 0.987500 2456252 80.00
93.887 0.989062 2460127 91.43
101.695 0.990625 2463999 106.67
107.263 0.992188 2467889 128.00
110.079 0.992969 2469887 142.22
112.639 0.993750 2471823 160.00
115.135 0.994531 2473769 182.86
117.695 0.995313 2475673 213.33
122.175 0.996094 2477611 256.00
125.503 0.996484 2478579 284.44
131.583 0.996875 2479554 320.00
138.111 0.997266 2480532 365.71
145.791 0.997656 2481499 426.67
155.391 0.998047 2482466 512.00
160.511 0.998242 2482947 568.89
166.783 0.998437 2483441 640.00
174.591 0.998633 2483922 731.43
183.807 0.998828 2484409 853.33
193.791 0.999023 2484896 1024.00
198.399 0.999121 2485147 1137.78
202.367 0.999219 2485380 1280.00
207.999 0.999316 2485627 1462.86
214.655 0.999414 2485863 1706.67
223.871 0.999512 2486112 2048.00
227.071 0.999561 2486235 2275.56
229.759 0.999609 2486352 2560.00
234.495 0.999658 2486478 2925.71
239.615 0.999707 2486590 3413.33
243.199 0.999756 2486716 4096.00
245.119 0.999780 2486771 4551.11
248.319 0.999805 2486835 5120.00
250.751 0.999829 2486900 5851.43
254.463 0.999854 2486954 6826.67
257.279 0.999878 2487016 8192.00
259.839 0.999890 2487049 9102.22
260.863 0.999902 2487078 10240.00
262.911 0.999915 2487106 11702.86
268.543 0.999927 2487139 13653.33
271.615 0.999939 2487168 16384.00
275.199 0.999945 2487181 18204.44
278.783 0.999951 2487198 20480.00
286.975 0.999957 2487216 23405.71
290.559 0.999963 2487226 27306.67
306.943 0.999969 2487246 32768.00
314.111 0.999973 2487255 36408.89
322.815 0.999976 2487263 40960.00
329.215 0.999979 2487269 46811.43
344.063 0.999982 2487279 54613.33
356.863 0.999985 2487286 65536.00
356.863 0.999986 2487286 72817.78
390.143 0.999988 2487288 81920.00
420.351 0.999989 2487293 93622.86
4313.087 0.999991 2487299 109226.67
4313.087 0.999992 2487299 131072.00
5885.951 0.999993 2487307 145635.56
5885.951 0.999994 2487307 163840.00
5885.951 0.999995 2487307 187245.71
5885.951 0.999995 2487307 218453.33
23511.039 0.999996 2487317 262144.00
23511.039 1.000000 2487317 inf
#[Mean = 65.658, StdDeviation = 49.416]
#[Max = 23511.039, Total count = 2487317]
#[Buckets = 20, SubBuckets = 2048]
Here's a graphical representation of the same output: [image: percentile distribution chart]