geopm / geopm

Global Extensible Open Power Manager
https://geopm.github.io
BSD 3-Clause "New" or "Revised" License

Run nekbone with no markup and compare balancer and governor performance #1179

Closed cmcantalupo closed 4 years ago

cmcantalupo commented 4 years ago

estimate 3

First run with manual markup and check that the balancer works. Then modify the existing nekbone scripts on mcfly to use the edit distance filter with no user markup of the application. Run twice under a power cap that is a similar fraction of TDP to the caps used in our good KNL experiments: once with the governor and once with the balancer. Inspect the reports and determine whether the balancer provides a benefit with no markup.

Done criteria:

Parent story: #1180

cmcantalupo commented 4 years ago

[cmcantal@mcfly:~/output/43973_nekbone]$ grep -A10 Epoch *180*report
nekbone_power_balancer_pwr_180.0_0.report:Epoch Totals:
nekbone_power_balancer_pwr_180.0_0.report-    runtime (sec): 467.834
nekbone_power_balancer_pwr_180.0_0.report-    sync-runtime (sec): 467.829
nekbone_power_balancer_pwr_180.0_0.report-    package-energy (joules): 87144.7
nekbone_power_balancer_pwr_180.0_0.report-    dram-energy (joules): 11294.7
nekbone_power_balancer_pwr_180.0_0.report-    power (watts): 186.275
nekbone_power_balancer_pwr_180.0_0.report-    frequency (%): 93.7494
nekbone_power_balancer_pwr_180.0_0.report-    frequency (Hz): 1.96874e+09
nekbone_power_balancer_pwr_180.0_0.report-    network-time (sec): 75.4461
nekbone_power_balancer_pwr_180.0_0.report-    count: 2000
nekbone_power_balancer_pwr_180.0_0.report-    epoch-runtime-ignore (sec): 0
--
nekbone_power_balancer_pwr_180.0_0.report:Epoch Totals:
nekbone_power_balancer_pwr_180.0_0.report-    runtime (sec): 467.831
nekbone_power_balancer_pwr_180.0_0.report-    sync-runtime (sec): 467.828
nekbone_power_balancer_pwr_180.0_0.report-    package-energy (joules): 80625
nekbone_power_balancer_pwr_180.0_0.report-    dram-energy (joules): 12533.4
nekbone_power_balancer_pwr_180.0_0.report-    power (watts): 172.339
nekbone_power_balancer_pwr_180.0_0.report-    frequency (%): 85.2738
nekbone_power_balancer_pwr_180.0_0.report-    frequency (Hz): 1.79075e+09
nekbone_power_balancer_pwr_180.0_0.report-    network-time (sec): 72.5406
nekbone_power_balancer_pwr_180.0_0.report-    count: 2000
nekbone_power_balancer_pwr_180.0_0.report-    epoch-runtime-ignore (sec): 0
--
nekbone_power_governor_pwr_180_0.report:Epoch Totals:
nekbone_power_governor_pwr_180_0.report-    runtime (sec): 456.508
nekbone_power_governor_pwr_180_0.report-    sync-runtime (sec): 456.503
nekbone_power_governor_pwr_180_0.report-    package-energy (joules): 81931.6
nekbone_power_governor_pwr_180_0.report-    dram-energy (joules): 11257.1
nekbone_power_governor_pwr_180_0.report-    power (watts): 179.476
nekbone_power_governor_pwr_180_0.report-    frequency (%): 89.2044
nekbone_power_governor_pwr_180_0.report-    frequency (Hz): 1.87329e+09
nekbone_power_governor_pwr_180_0.report-    network-time (sec): 54.5483
nekbone_power_governor_pwr_180_0.report-    count: 2000
nekbone_power_governor_pwr_180_0.report-    epoch-runtime-ignore (sec): 0
--
nekbone_power_governor_pwr_180_0.report:Epoch Totals:
nekbone_power_governor_pwr_180_0.report-    runtime (sec): 456.512
nekbone_power_governor_pwr_180_0.report-    sync-runtime (sec): 456.506
nekbone_power_governor_pwr_180_0.report-    package-energy (joules): 81932.1
nekbone_power_governor_pwr_180_0.report-    dram-energy (joules): 12337.2
nekbone_power_governor_pwr_180_0.report-    power (watts): 179.476
nekbone_power_governor_pwr_180_0.report-    frequency (%): 90.3925
nekbone_power_governor_pwr_180_0.report-    frequency (Hz): 1.89824e+09
nekbone_power_governor_pwr_180_0.report-    network-time (sec): 72.1127
nekbone_power_governor_pwr_180_0.report-    count: 2000
nekbone_power_governor_pwr_180_0.report-    epoch-runtime-ignore (sec): 0

cmcantalupo commented 4 years ago

Because the balancer does not balance power between the sockets within a single node, it is not very effective at mitigating manufacturing variation on dual-socket Xeon systems.
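
For anyone repeating the comparison, here is a minimal sketch of how the Epoch Totals from the `grep -A10 Epoch` output above could be summarized per agent. It is not part of GEOPM: the function names `parse_epoch_totals` and `mean_by_agent` are hypothetical, the parser assumes the `file.report-    key: value` layout shown in this thread, and only three of the reported fields are copied into the sample text.

```python
import re
from collections import defaultdict

# Values copied from the 180 W reports quoted in this issue
# (runtime, package energy, and power for each Epoch Totals block).
GREP_OUTPUT = """\
nekbone_power_balancer_pwr_180.0_0.report:Epoch Totals:
nekbone_power_balancer_pwr_180.0_0.report-    runtime (sec): 467.834
nekbone_power_balancer_pwr_180.0_0.report-    package-energy (joules): 87144.7
nekbone_power_balancer_pwr_180.0_0.report-    power (watts): 186.275
--
nekbone_power_balancer_pwr_180.0_0.report:Epoch Totals:
nekbone_power_balancer_pwr_180.0_0.report-    runtime (sec): 467.831
nekbone_power_balancer_pwr_180.0_0.report-    package-energy (joules): 80625
nekbone_power_balancer_pwr_180.0_0.report-    power (watts): 172.339
--
nekbone_power_governor_pwr_180_0.report:Epoch Totals:
nekbone_power_governor_pwr_180_0.report-    runtime (sec): 456.508
nekbone_power_governor_pwr_180_0.report-    package-energy (joules): 81931.6
nekbone_power_governor_pwr_180_0.report-    power (watts): 179.476
--
nekbone_power_governor_pwr_180_0.report:Epoch Totals:
nekbone_power_governor_pwr_180_0.report-    runtime (sec): 456.512
nekbone_power_governor_pwr_180_0.report-    package-energy (joules): 81932.1
nekbone_power_governor_pwr_180_0.report-    power (watts): 179.476
"""

# Matches "<file>.report:<key>:" header lines and "<file>.report-  <key>: <value>" lines.
LINE_RE = re.compile(
    r"^(?P<file>\S+\.report)[:-]\s*(?P<key>[^:]+?):\s*(?P<value>\S*)\s*$"
)


def parse_epoch_totals(text):
    """Return one dict per 'Epoch Totals' block found in grep -A output."""
    blocks = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skips the "--" separators grep prints between groups
        if match["key"] == "Epoch Totals":
            blocks.append({"file": match["file"]})
        elif blocks:
            blocks[-1][match["key"]] = float(match["value"])
    return blocks


def mean_by_agent(blocks, key):
    """Average one field over all blocks, grouped by agent name in the file name."""
    values = defaultdict(list)
    for block in blocks:
        agent = re.search(r"power_(balancer|governor)", block["file"]).group(1)
        values[agent].append(block[key])
    return {agent: sum(v) / len(v) for agent, v in values.items()}


if __name__ == "__main__":
    blocks = parse_epoch_totals(GREP_OUTPUT)
    runtime = mean_by_agent(blocks, "runtime (sec)")
    energy = mean_by_agent(blocks, "package-energy (joules)")
    print(f"mean runtime: {runtime}")
    print(f"mean package energy: {energy}")
    print(f"balancer runtime vs governor: {runtime['balancer'] / runtime['governor'] - 1:+.1%}")
    print(f"balancer energy vs governor: {energy['balancer'] / energy['governor'] - 1:+.1%}")
```

With the numbers quoted above, the balancer run has an epoch runtime about 2.5% longer than the governor run and uses about 2.4% more package energy, which is consistent with the balancer giving no benefit here. This is a single pair of runs, so the difference should be read as indicative rather than statistically established.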