voellm opened this issue 8 years ago:
Should we add multichase as a benchmark for PKB? It pulls out an HPL RandomAccess-like workload and is simpler to run.
https://github.com/google/multichase
Makes sense. +1 for adding this to PKB.
Seems similar to the latency benchmark (lat_mem_rd, http://www.bitmover.com/lmbench/lat_mem_rd.8.html) in LMBench, but it seems this one is multi-core 'aware'. +1
May I also suggest that before we just publish the metrics that multichase outputs, we decide which metrics we need. Off the top of my head, I would like to see metrics for these; I think multichase can provide some of them:
- Multiple percentiles, not just avg & sdev or best latency, for every scenario, because I expect memory access latencies to be distributed in a multi-modal fashion with a long tail due to the memory cache hierarchy and cache interference. Ideally, I want to be able to generate histograms or CDF charts with these metrics.
- Measure with different strides and a fixed memory size (single-threaded and multi-threaded); a minimal chase-loop sketch follows this list:
  - stride < cache line size
  - stride = cache line size
  - stride > cache line size
  - stride = page size
  - stride > page size
- Measure with different memory sizes and a fixed stride (single-threaded and multi-threaded):
  - Double the memory size for every sample until 85% of total memory size.
  - If average latency is plotted with memory size on a log-scale x-axis, it should show the different 'plateaus' of the cache hierarchy. Something like this: https://cloud.githubusercontent.com/assets/232300/13226545/75771cde-d957-11e5-8efe-106b93a6cd9d.png
- Measure the latency of dereferencing modified cache lines between cores (emulates false-sharing effects and locks).
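For concreteness, here's a minimal sketch of the kind of strided chase loop those scenarios would measure. This is illustrative only, not multichase's (or lat_mem_rd's) actual code; the buffer size and stride are arbitrary assumptions, and a real benchmark would randomize the chain to defeat hardware prefetching:

```c
/*
 * Minimal strided pointer-chase sketch -- illustrative only. Buffer size
 * and stride are arbitrary; a real benchmark would randomize the chain.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const size_t buf_size = 64u << 20; /* 64 MiB working set (assumption) */
    const size_t stride = 128;         /* > a typical 64-byte cache line  */
    const size_t slots = buf_size / stride;
    const long iters = 100000000L;     /* number of dependent loads       */

    char *buf = malloc(buf_size);
    if (buf == NULL) return 1;

    /* Link slot i to slot i+1; the last slot wraps back to the first. */
    for (size_t i = 0; i < slots; i++)
        *(void **)(buf + i * stride) = buf + ((i + 1) % slots) * stride;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    void *p = buf;
    for (long i = 0; i < iters; i++)
        p = *(void **)p;               /* each load depends on the last */

    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    /* Print p so the compiler can't optimize the chase away. */
    printf("%.3f ns/load (last pointer: %p)\n", ns / iters, p);
    free(buf);
    return 0;
}
```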
Very good points!
OK, so I forked the multichase benchmark, and I've been experimenting with using HdrHistogram to track the statistics.
Currently, multichase starts the threads executing the workload in a loop as fast as possible, and every 0.5 seconds it samples each thread's counter, which tracks the number of iterations executed during the sampling period. It then reports the average latency as time_delta / number of iterations, where time_delta is roughly 0.5 seconds.
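To make that scheme concrete, here's a minimal sketch of the sampling loop. The names and structure are illustrative, not multichase's actual code:

```c
/*
 * Sketch of the counter-sampling scheme described above. Each worker bumps
 * its own counter; the main thread samples every 0.5 s and derives the
 * average latency per iteration as time_delta / iterations_delta.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 4

/* One counter per thread, padded to a cache line to avoid false sharing. */
static struct { volatile uint64_t count; char pad[56]; } counters[NTHREADS];

static void *worker(void *arg) {
    long id = (long)arg;
    for (;;) counters[id].count++;     /* stand-in for one chase iteration */
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);

    uint64_t last[NTHREADS] = {0};
    for (int s = 0; s < 5; s++) {      /* take five 0.5 s samples */
        usleep(500000);
        for (int i = 0; i < NTHREADS; i++) {
            uint64_t now = counters[i].count;
            uint64_t delta = now - last[i];
            last[i] = now;
            if (delta > 0)             /* 5e8 ns elapsed per sample */
                printf("thread %d: %.3f ns/iter\n", i, 5e8 / (double)delta);
        }
    }
    return 0;                          /* exiting main ends the workers */
}
```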
My modification instead uses a high dynamic range histogram (HdrHistogram) to efficiently track 'running' percentiles and other aggregate statistics. The measurements are taken at the thread level after every loop iteration; each thread's histogram is then aggregated into a final histogram that shows the combined statistics.
It's only a POC; I'm not a fan of the approach I used to protect the histogram with a spinlock, but at least it doesn't seem to add much overhead.
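For anyone curious, the approach looks roughly like this. This is a sketch assuming the C port of HdrHistogram (https://github.com/HdrHistogram/HdrHistogram_c), not the actual code in the PR; it times the clock call itself just to have something to record:

```c
/*
 * Sketch: per-thread HdrHistograms merged into a shared histogram under a
 * spinlock. Not the actual PR code; assumes HdrHistogram_c is installed.
 */
#include <hdr/hdr_histogram.h>  /* include path may differ per install */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 4
#define SAMPLES  1000000

static struct hdr_histogram *global_hist;
static pthread_spinlock_t hist_lock;

static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void *worker(void *arg) {
    (void)arg;
    struct hdr_histogram *local;
    hdr_init(1, 1000000000LL, 3, &local);  /* 1 ns .. 1 s, 3 sig figs */

    for (int i = 0; i < SAMPLES; i++) {
        int64_t t0 = now_ns();
        /* Real code would time one chase iteration here; we just time
         * the clock call itself to have a value to record. */
        int64_t t1 = now_ns();
        hdr_record_value(local, t1 - t0);
    }

    /* Merge per-thread results into the shared histogram -- this is the
     * spinlock-protected step mentioned above. */
    pthread_spin_lock(&hist_lock);
    hdr_add(global_hist, local);
    pthread_spin_unlock(&hist_lock);
    hdr_close(local);
    return NULL;
}

int main(void) {
    hdr_init(1, 1000000000LL, 3, &global_hist);
    pthread_spin_init(&hist_lock, PTHREAD_PROCESS_PRIVATE);

    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    printf("mean = %.3f ns, p99.9 = %ld ns, max = %ld ns\n",
           hdr_mean(global_hist),
           (long)hdr_value_at_percentile(global_hist, 99.9),
           (long)hdr_max(global_hist));
    /* The classic percentile table in the output below comes from: */
    hdr_percentiles_print(global_hist, stdout, 5, 1.0, CLASSIC);
    return 0;
}
```

Build with something like gcc sketch.c -lhdr_histogram -lpthread, assuming HdrHistogram_c is installed.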
The PR is here https://github.com/meteorfox/multichase/pull/1
What's really cool about this (assuming I don't have a stupid mistake) is that you can see the long tail of the distribution; there are some measurements in the microsecond range, which could well be context-switching effects.
Here's the sample output (the Value column is in nanoseconds):
carl6796@t450s:~/programming/github.com/google/multichase$ ./multichase -a -n 10
66.501 <=== Average value computed when using -a flag
Value Percentile TotalCount 1/(1-Percentile)
51.999 0.000000 3 1.00
59.519 0.100000 250214 1.11
61.183 0.200000 501469 1.25
62.431 0.300000 752337 1.43
63.487 0.400000 997071 1.67
64.511 0.500000 1251315 2.00
65.023 0.550000 1375088 2.22
65.535 0.600000 1493436 2.50
66.111 0.650000 1617645 2.86
66.751 0.700000 1742183 3.33
67.519 0.750000 1872245 4.00
67.903 0.775000 1930744 4.44
68.351 0.800000 1993236 5.00
68.863 0.825000 2055409 5.71
69.439 0.850000 2117415 6.67
70.079 0.875000 2178694 8.00
70.463 0.887500 2210113 8.89
70.911 0.900000 2243169 10.00
71.359 0.912500 2272272 11.43
71.871 0.925000 2301316 13.33
72.575 0.937500 2333351 16.00
72.959 0.943750 2347802 17.78
73.407 0.950000 2363609 20.00
73.919 0.956250 2379125 22.86
74.559 0.962500 2394505 26.67
75.391 0.968750 2409715 32.00
75.967 0.971875 2417452 35.56
76.735 0.975000 2425184 40.00
77.759 0.978125 2433297 45.71
79.039 0.981250 2440808 53.33
81.279 0.984375 2448525 64.00
83.455 0.985938 2452406 71.11
86.911 0.987500 2456252 80.00
93.887 0.989062 2460127 91.43
101.695 0.990625 2463999 106.67
107.263 0.992188 2467889 128.00
110.079 0.992969 2469887 142.22
112.639 0.993750 2471823 160.00
115.135 0.994531 2473769 182.86
117.695 0.995313 2475673 213.33
122.175 0.996094 2477611 256.00
125.503 0.996484 2478579 284.44
131.583 0.996875 2479554 320.00
138.111 0.997266 2480532 365.71
145.791 0.997656 2481499 426.67
155.391 0.998047 2482466 512.00
160.511 0.998242 2482947 568.89
166.783 0.998437 2483441 640.00
174.591 0.998633 2483922 731.43
183.807 0.998828 2484409 853.33
193.791 0.999023 2484896 1024.00
198.399 0.999121 2485147 1137.78
202.367 0.999219 2485380 1280.00
207.999 0.999316 2485627 1462.86
214.655 0.999414 2485863 1706.67
223.871 0.999512 2486112 2048.00
227.071 0.999561 2486235 2275.56
229.759 0.999609 2486352 2560.00
234.495 0.999658 2486478 2925.71
239.615 0.999707 2486590 3413.33
243.199 0.999756 2486716 4096.00
245.119 0.999780 2486771 4551.11
248.319 0.999805 2486835 5120.00
250.751 0.999829 2486900 5851.43
254.463 0.999854 2486954 6826.67
257.279 0.999878 2487016 8192.00
259.839 0.999890 2487049 9102.22
260.863 0.999902 2487078 10240.00
262.911 0.999915 2487106 11702.86
268.543 0.999927 2487139 13653.33
271.615 0.999939 2487168 16384.00
275.199 0.999945 2487181 18204.44
278.783 0.999951 2487198 20480.00
286.975 0.999957 2487216 23405.71
290.559 0.999963 2487226 27306.67
306.943 0.999969 2487246 32768.00
314.111 0.999973 2487255 36408.89
322.815 0.999976 2487263 40960.00
329.215 0.999979 2487269 46811.43
344.063 0.999982 2487279 54613.33
356.863 0.999985 2487286 65536.00
356.863 0.999986 2487286 72817.78
390.143 0.999988 2487288 81920.00
420.351 0.999989 2487293 93622.86
4313.087 0.999991 2487299 109226.67
4313.087 0.999992 2487299 131072.00
5885.951 0.999993 2487307 145635.56
5885.951 0.999994 2487307 163840.00
5885.951 0.999995 2487307 187245.71
5885.951 0.999995 2487307 218453.33
23511.039 0.999996 2487317 262144.00
23511.039 1.000000 2487317 inf
#[Mean = 65.658, StdDeviation = 49.416]
#[Max = 23511.039, Total count = 2487317]
#[Buckets = 20, SubBuckets = 2048]
Here's a graphical representation of the same output: [image: percentile distribution chart]