Closed: bra-fsn closed this 5 months ago
Added a new --sync-start option. This required quite a lot of synchronization rework.
Thanks!
I thought that having synchronised workers would give more consistent results regardless of the test runtime. However, I can see quite the opposite (these are on a 192-core machine):
for i in $(seq 20); do echo -n "$i "; nice -n -20 /tmp/stress-ng --sync-start --metrics --cpu $(nproc) --cpu-method div16 -t $i | awk '/metrc.*cpu/ {print $9" "$11}'; done
1 477869.73 80.43
2 554823.79 93.46
3 520433.70 87.50
4 565221.56 95.04
5 556384.03 93.87
6 566873.13 95.28
7 584442.37 98.21
8 559508.90 94.05
9 586184.74 98.53
10 575939.72 96.80
11 577198.74 97.00
12 588203.51 98.85
13 592025.10 99.51
14 586132.33 98.51
15 577482.38 97.06
16 580788.74 97.62
17 580224.94 97.52
18 589357.12 99.11
19 582089.07 97.84
20 589228.90 99.02
Without the new option:
for i in $(seq 20); do echo -n "$i "; nice -n -20 /tmp/stress-ng --metrics --cpu $(nproc) --cpu-method div16 -t $i | awk '/metrc.*cpu/ {print $9" "$11}'; done
1 585719.85 98.49
2 593026.71 99.68
3 594105.76 99.85
4 594626.59 99.93
5 594701.02 99.94
6 594629.94 99.92
7 594876.16 99.96
8 594485.13 99.92
9 594928.30 99.97
10 594936.09 99.97
11 594941.18 99.98
12 594759.50 99.98
13 594608.77 99.97
14 594314.40 99.97
15 594053.59 99.97
16 594358.35 99.97
17 594617.06 99.98
18 594737.07 99.97
19 594230.50 99.98
20 593840.85 99.94
For reference, this is the full output of a 192-core run:
stress-ng: info: [11777] setting to a 20 secs run per stressor
stress-ng: info: [11777] dispatching hogs: 192 cpu
stress-ng: metrc: [11777] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
stress-ng: metrc: [11777] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
stress-ng: metrc: [11777] cpu 11900410 20.00 3838.93 0.14 595019.41 3099.82 99.98 1536
stress-ng: info: [11777] skipped: 0
stress-ng: info: [11777] passed: 192: cpu (192)
stress-ng: info: [11777] failed: 0
stress-ng: info: [11777] metrics untrustworthy: 0
stress-ng: info: [11777] successful run completed in 20.07 secs
And this is with 1 core only:
stress-ng: info: [11970] setting to a 20 secs run per stressor
stress-ng: info: [11970] dispatching hogs: 1 cpu
stress-ng: metrc: [11970] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
stress-ng: metrc: [11970] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
stress-ng: metrc: [11970] cpu 62030 20.00 20.00 0.00 3101.49 3101.48 100.00 1536
stress-ng: info: [11970] skipped: 0
stress-ng: info: [11970] passed: 1: cpu (1)
stress-ng: info: [11970] failed: 0
stress-ng: info: [11970] metrics untrustworthy: 0
stress-ng: info: [11970] successful run completed in 20.00 secs
What did I get wrong? 🤔
Currently the stress-ng parent starts the required number of stressors (specified with the --cpu X option) and those processes start working immediately. This means that a machine with a large number of CPUs will see an imbalanced load: there is an interval at the start of the test where not all stressors are running yet (some are still starting up, some have already begun working, and others have not been started at all), and at the end the opposite happens: some stressors have already finished while others are still working.

I propose a different option: