Open jwnimmer-tri opened 1 year ago
@RussTedrake does the machine have this file: `/sys/devices/system/cpu/cpufreq/boost`? Apparently that is the modern, architecture-agnostic kernel knob. Alternatively, `/sys/devices/system/cpu/cpufreq/policyX/` (X is the CPU ID number) is an older, AMD-specific interface.
https://www.kernel.org/doc/html/v5.19/admin-guide/pm/cpufreq.html#frequency-boost-support
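For whoever picks this up, here is a minimal sketch of reading and writing the global boost knob. It assumes the file holds `1` (boost enabled) or `0` (boost disabled), as the kernel page linked above describes. The helper names `read_boost` and `write_boost` are hypothetical and are not part of `benchmark_tool`.

```python
from pathlib import Path

# Path taken from the comment above. The "0"/"1" contents are an assumption
# based on the linked kernel documentation.
BOOST_KNOB = Path("/sys/devices/system/cpu/cpufreq/boost")


def read_boost(knob: Path = BOOST_KNOB):
    """Return True/False for the boost state, or None if the knob is absent."""
    if not knob.is_file():
        return None
    return knob.read_text().strip() == "1"


def write_boost(enabled: bool, knob: Path = BOOST_KNOB) -> None:
    """Write the boost state. On a real machine this needs root privileges."""
    knob.write_text("1" if enabled else "0")
```

A wrapper could call `read_boost()` before the run, call `write_boost(False)` for the duration of the benchmark, and restore the saved value afterwards. Taking the path as a parameter keeps the helpers testable against a temporary file.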
Whoever works on fixing this will need administrator access to a relevant machine for testing.
Yes, I have both `/sys/devices/system/cpu/cpufreq/boost` and the `.../policyX` directories.
Testing thought: the goal of this work is to reduce variance in benchmark results. Does that mean we have to run a bunch of benchmarks with and without boost suppression? Will we be able to see a reduction in variance?
Is there any doubt that a varying clock rate would affect a benchmark? I don't see any need to prove that hypothesis.
I suppose the question is whether the boost suppression code we write is actually suppressing boost (i.e., if we've found the right knobs). That should show up in the mean (-30%), rather than the variance.
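One way to run that check is to collect the same benchmark metric with boost on and with boost off, then compare the means. The sketch below is only an illustration. The function names are hypothetical, and the sign of the change depends on whether the metric is a time or a throughput.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of measurements."""
    return statistics.mean(samples), statistics.stdev(samples)


def relative_mean_change(boost_on, boost_off):
    """Relative change in the mean when going from boost on to boost off.

    A value of -0.3 means the boost-off mean is 30% lower than the
    boost-on mean for whatever metric was measured.
    """
    mean_on = statistics.mean(boost_on)
    mean_off = statistics.mean(boost_off)
    return (mean_off - mean_on) / mean_on
```

If the suppression code has found the right knob, `relative_mean_change` should be clearly non-zero. A value near zero would suggest the knob is not actually taking effect.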
Found another variation, not yet supported. My new Puget (AMD Threadripper) has `/sys/devices/system/cpu/cpufreq/policyX/` (seen before but not supported) and `/sys/devices/system/cpu/amd_pstate/status`. Still investigating what to do with those.
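To survey which of these variations a given machine exposes, a small probe along the following lines might help. It only reports what is present and makes no claim about how each knob should be driven. The function name `find_boost_knobs` and the dictionary keys are hypothetical.

```python
from pathlib import Path


def find_boost_knobs(sysfs_cpu: Path = Path("/sys/devices/system/cpu")):
    """Report which of the sysfs entries mentioned in this thread exist."""
    cpufreq = sysfs_cpu / "cpufreq"
    boost = cpufreq / "boost"
    status = sysfs_cpu / "amd_pstate" / "status"

    policy_dirs = []
    if cpufreq.is_dir():
        # Lexicographic sort, so policy10 lists before policy2.
        policy_dirs = sorted(
            p for p in cpufreq.glob("policy[0-9]*") if p.is_dir()
        )

    return {
        "global_boost": boost if boost.is_file() else None,
        "policy_dirs": policy_dirs,
        # Raw file contents, with no interpretation attempted here.
        "amd_pstate_status": (
            status.read_text().strip() if status.is_file() else None
        ),
    }
```

Running `find_boost_knobs()` on the Intel and AMD workstations and comparing the results would show which combinations the tool has to handle.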
See https://github.com/RobotLocomotion/drake/issues/17369#issuecomment-1455146760.
A non-trivial number of TRI workstations use AMD chips now, not Intel. We should teach `benchmark_tool` how to govern the turbo state on those chips as well.
In the meantime, the work-around is either to find an Intel machine, or run the benchmark binary directly instead of through the tool wrapper.