Closed by da-ekchajzer 1 year ago
Hello @da-ekchajzer, we used stress-ng on our side and found it quite flexible for simulating different kinds of workloads and, most importantly, for targeting a specific CPU load, which isn't always easy with other stress tools. To reflect average usage, we tested several options.
To run on many platforms, it should be coupled with power measurement (software-based or physical wattmeters). By default, what was missing on our side for CPU was support for AMD machines, but that's easily doable (though without memory consumption on AMD). The same goes for GPUs when looking at ML-oriented hardware (Nvidia and AMD provide tooling for this).
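For the software-based side, here is a minimal sketch of how an average power figure could be derived from two readings of an energy counter. It assumes a Linux machine exposing RAPL through the powercap sysfs interface; the `RAPL_DIR` path and the function names `avg_watts` and `measure_package_power` are illustrative assumptions, not something taken from turbostress or Energizta.

```shell
#!/bin/bash
# Sketch: average power from two readings of a RAPL energy counter.
# ASSUMPTION: the counter is exposed through the Linux powercap sysfs
# interface; the path below is the usual package-0 location and is not
# taken from this thread.
RAPL_DIR="${RAPL_DIR:-/sys/class/powercap/intel-rapl:0}"

# avg_watts START_UJ END_UJ SECONDS [MAX_RANGE_UJ]
# Prints the average power in watts between two energy_uj readings
# (microjoules). If the counter wrapped around (END < START),
# MAX_RANGE_UJ is added back to the difference.
avg_watts() {
    awk -v s="$1" -v e="$2" -v t="$3" -v m="${4:-0}" 'BEGIN {
        d = e - s
        if (d < 0) d += m
        printf "%.2f\n", d / 1e6 / t
    }'
}

# measure_package_power SECONDS
# Samples the counter, sleeps, samples again, then prints the average watts.
measure_package_power() {
    local seconds="$1" start end max
    max=$(cat "$RAPL_DIR/max_energy_range_uj")
    start=$(cat "$RAPL_DIR/energy_uj")
    sleep "$seconds"
    end=$(cat "$RAPL_DIR/energy_uj")
    avg_watts "$start" "$end" "$seconds" "$max"
}
```

Calling `measure_package_power 5` would print the average package power over 5 seconds. Reading `energy_uj` may require root, and a physical wattmeter would need a different reader behind the same `avg_watts` arithmetic.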
Thank you for the resource.
About power measurement, I'll let you give your opinion or advice here: https://github.com/Boavizta/Energizta/issues/2
See https://github.com/teads/turbostress for an implementation of stress-ng in a similar context.
I think we should provide a complete example based on stress-ng, but the list of stress tests could be left to the user. We should just load a list of commands from a file.
Something like this could work fine:
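#!/bin/bash
# We don't want to leave a stress test running after this script exits.
trap 'pids=$(jobs -p); [ -z "$pids" ] || kill $pids 2>/dev/null' EXIT

INTERVAL=5   # seconds between two measurements
WARMUP=20    # seconds to wait before the first measurement
TIMEOUT=60   # total duration of each stress test, in seconds

# In the final version, this list could be loaded from a config file.
# --cpu-load only applies to --cpu workers; --cpu 0 starts one worker per online CPU.
stresstests="
stress-ng --cpu 1
stress-ng --cpu 4
stress-ng --cpu 0 --cpu-load 100
"

# Feed the loop with a here-string rather than a pipe, so that it runs in the
# current shell and the EXIT trap can see the background jobs.
while IFS= read -r stresstest; do
    if [ -n "$stresstest" ]; then
        echo "Running $stresstest"
        $stresstest > /dev/null 2>&1 &
        testpid=$!
        i=0
        while [ $((INTERVAL * i)) -lt "$TIMEOUT" ]; do
            if [ $((INTERVAL * i)) -ge "$WARMUP" ]; then
                uptime # Here we call the complete "get_states" function to get power consumption and various metrics.
            fi
            sleep "$INTERVAL"
            i=$((i+1))
        done
        kill "$testpid"
        wait "$testpid" 2>/dev/null
    fi
done <<< "$stresstests"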
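The comment in the script about loading the list from a config file could be sketched as follows. The function name `load_stresstests` and the file format (one command per line, `#` for comments) are assumptions made for illustration only.

```shell
#!/bin/bash
# load_stresstests FILE
# Prints one command per line, skipping blank lines and lines starting
# with '#'. ASSUMPTION: the config format is one command per line.
load_stresstests() {
    local line
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Strip leading whitespace.
        line="${line#"${line%%[![:space:]]*}"}"
        case "$line" in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$1"
}
```

Its output could then feed the `while ... read` loop in place of the inline `stresstests` variable, for example with a hypothetical `stresstests.conf` kept next to the script.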
Hey there!
Just had a great discussion with Arne from Green Coding. They have very valuable insights and also a tool for this, including very important variables like hyper-threading, Turbo Boost, etc. We should synchronize before heading in a direction on our own!
We should have a look at https://github.com/green-coding-berlin/tools and synchronize with them.
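Independently of what the Green Coding tooling does (not checked here), a minimal sketch of recording the hyper-threading and turbo settings next to each measurement could look like this. The sysfs paths are the usual Linux locations and are an assumption; `no_turbo` only exists when the `intel_pstate` driver is in use. The function names are hypothetical.

```shell
#!/bin/bash
# read_setting FILE
# Prints the content of a sysfs file, or "unknown" if it cannot be read.
read_setting() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unknown"
    fi
}

# ASSUMPTION: usual Linux sysfs locations, not taken from this thread.
SMT_FILE="/sys/devices/system/cpu/smt/active"                  # 1 = SMT / hyper-threading on
NO_TURBO_FILE="/sys/devices/system/cpu/intel_pstate/no_turbo"  # 1 = turbo disabled

# platform_settings
# Prints a one-line summary that can be stored with each measurement.
platform_settings() {
    echo "smt_active=$(read_setting "$SMT_FILE") no_turbo=$(read_setting "$NO_TURBO_FILE")"
}
```

Storing this line with every sample would at least make results comparable across machines with different settings.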
I guess we should have an interface to easily implement different types of tests.
Now that we have decided to use stress-ng, I think we should continue the discussion in #19.
Problem
We need to automate the collection of point-in-time power consumption measurements per component at different workload levels.
Example for an Intel Xeon Platinum CPU
We need to control the workload level of each component involved in the evaluation.
Solution
I propose to use a stress module to control the workload level. We can start with steps of 10%: 0%, 10%, 20%, ..., 100%.
@github-benjamin-davy, you have already developed this kind of module. Would you have any advice or resources that could be useful?
Article: https://medium.com/teads-engineering/estimating-aws-ec2-instances-power-consumption-c9745e347959
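As a rough illustration of the 10% steps, the sketch below generates one stress command per load level. It assumes stress-ng as the stress module and uses its `--cpu`, `--cpu-load` and `--timeout` options; the function names `load_levels` and `stress_commands` are hypothetical.

```shell
#!/bin/bash
# load_levels STEP
# Prints the CPU load percentages from 0 to 100 in increments of STEP.
load_levels() {
    seq 0 "$1" 100
}

# stress_commands STEP DURATION_SECONDS
# Prints one stress-ng command per load level.
# --cpu 0 starts one worker per online CPU, --cpu-load sets the target load
# percentage for those workers, --timeout stops the run after the duration.
# ASSUMPTION: stress-ng is the stress module; only the CPU is covered here.
stress_commands() {
    local load
    for load in $(load_levels "$1"); do
        echo "stress-ng --cpu 0 --cpu-load $load --timeout ${2}s"
    done
}
```

`stress_commands 10 60` prints eleven commands, one per level from 0% to 100%, each running for 60 seconds. Other components (memory, disk, GPU) would need their own stressors.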