RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Support for AMD chips in benchmark_tool #18942

Open jwnimmer-tri opened 1 year ago

jwnimmer-tri commented 1 year ago

See https://github.com/RobotLocomotion/drake/issues/17369#issuecomment-1455146760.

A non-trivial number of TRI workstations use AMD chips now, not Intel. We should teach benchmark_tool how to govern the turbo state on those chips as well.

In the meantime, the workaround is either to find an Intel machine or to run the benchmark binary directly instead of through the tool wrapper.

rpoyner-tri commented 1 year ago

@RussTedrake does the machine have this file: /sys/devices/system/cpu/cpufreq/boost? Apparently that is the modern, architecture-agnostic kernel knob. Alternatively, /sys/devices/system/cpu/cpufreq/policyX/ (where X is the CPU ID number) is an older, AMD-specific interface.

https://www.kernel.org/doc/html/v5.19/admin-guide/pm/cpufreq.html#frequency-boost-support
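
For anyone picking this up, here is a minimal sketch of reading and toggling the global boost knob mentioned above. It is not what benchmark_tool currently does. The function names and the `cpu_root` parameter are hypothetical, added so the logic can be exercised against a scratch directory, and writing to the real file requires root.

```python
from pathlib import Path

# Default sysfs location; cpu_root is a hypothetical override for testing.
CPU_ROOT = "/sys/devices/system/cpu"


def read_global_boost(cpu_root=CPU_ROOT):
    """Return True/False for the boost state, or None if the knob is absent."""
    knob = Path(cpu_root) / "cpufreq" / "boost"
    if not knob.is_file():
        return None
    return knob.read_text().strip() == "1"


def set_global_boost(enabled, cpu_root=CPU_ROOT):
    """Write 1 (boost on) or 0 (boost off) to the global knob."""
    knob = Path(cpu_root) / "cpufreq" / "boost"
    if not knob.is_file():
        raise FileNotFoundError(f"no global boost knob at {knob}")
    knob.write_text("1" if enabled else "0")
```

A wrapper would presumably record the value from `read_global_boost()` before the run and restore it afterwards, so the machine is left as it was found.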

Whoever works on fixing this will need administrator access to a relevant machine for testing.

RussTedrake commented 1 year ago

Yes. I have both the /sys/devices/system/cpu/cpufreq/boost and the .../policyX directories.

rpoyner-tri commented 1 year ago

Testing thought: the goal of this work is to reduce variance in benchmark results. Does that mean we have to run a bunch of benchmarks with and without boost suppression? Will we be able to see a reduction in variance?

jwnimmer-tri commented 1 year ago

Is there any doubt that a varying clock rate would affect a benchmark? I don't see any need to prove that hypothesis.

I suppose the question is whether the boost suppression code we write is actually suppressing boost (i.e., if we've found the right knobs). That should show up in the mean (-30%), rather than the variance.
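
A rough sketch of that check, assuming we collect the same benchmark measurement once with boost enabled and once with our suppression applied. The helper name is hypothetical, and the sign of the result depends on whether the samples are times or rates.

```python
from statistics import mean


def relative_mean_change(boost_on, boost_off):
    """Fractional change in the sample mean when going from boost-on to boost-off."""
    baseline = mean(boost_on)
    return (mean(boost_off) - baseline) / baseline
```

If the result is close to zero, the knob we are writing to probably is not the one that governs boost on that machine.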

rpoyner-tri commented 1 year ago

https://github.com/RobotLocomotion/drake/pull/18964

rpoyner-tri commented 6 months ago

Found another variation, not yet supported. My new Puget (AMD Threadripper) has /sys/devices/system/cpu/cpufreq/policyX/ (seen before but not supported) and /sys/devices/system/cpu/amd_pstate/status. Still investigating what to do with those.
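
To help compare machines while investigating, here is a small sketch that reports which of the interfaces mentioned in this thread are present. The function name, the returned keys, and the `cpu_root` parameter are hypothetical. It only reads files and does not try to interpret the amd_pstate status string.

```python
from pathlib import Path


def survey_cpufreq_knobs(cpu_root="/sys/devices/system/cpu"):
    """Report which boost-related sysfs entries exist under cpu_root."""
    root = Path(cpu_root)
    cpufreq = root / "cpufreq"
    status = root / "amd_pstate" / "status"
    # Count the per-policy directories (policy0, policy1, ...), if any.
    policies = []
    if cpufreq.is_dir():
        policies = [p for p in cpufreq.glob("policy*") if p.is_dir()]
    return {
        "global_boost": (cpufreq / "boost").is_file(),
        "policy_count": len(policies),
        "amd_pstate_status": (
            status.read_text().strip() if status.is_file() else None
        ),
    }


if __name__ == "__main__":
    print(survey_cpufreq_knobs())
```

Running it on each kind of workstation (Intel, older AMD, the new Threadripper) would give a quick table of which knobs a fix needs to handle.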