Boavizta / Energizta

:electric_plug: A collaborative project to collect and report open-data on the energy consumption of servers.
https://boavizta.github.io/Energizta/

stress_test: Control the workload level of each component #1

Closed da-ekchajzer closed 1 year ago

da-ekchajzer commented 2 years ago

Problem

We need to automate the collection of point-in-time power consumption measurements per component at different workload levels.

Example for an Intel Xeon Platinum CPU:

[Image: 168419401-f07653f2-066f-43b4-a3a9-340636a00c9a]

We need to control the workload level of each component involved in the evaluation.

Solution

I propose using a stress module to control the workload level. We can start with steps of 10%: 0%, 10%, 20%, ..., 100%.
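
As a rough illustration of the stepped workload idea, the sketch below prints one command per load level. It assumes stress-ng is the stress module (the thread has not settled on a tool at this point), and the helper name `load_level_commands`, the step, and the duration are hypothetical choices rather than project decisions. It only prints the commands and does not run them.

```shell
#!/bin/bash

# Hypothetical helper: print one command per CPU load level, from 0% to 100%.
# $1 = step in percent (default 10), $2 = duration of each level in seconds (default 60).
load_level_commands() {
    local step="${1:-10}" duration="${2:-60}" load
    for load in $(seq 0 "$step" 100); do
        if [ "$load" -eq 0 ]; then
            # 0% is the idle baseline: no stressor, just wait.
            echo "sleep $duration"
        else
            # --cpu 0 starts one worker per online CPU;
            # --cpu-load sets the target load of each worker in percent.
            echo "stress-ng --cpu 0 --cpu-load $load --timeout ${duration}s"
        fi
    done
}

# Print the 11 commands for 0%, 10%, ..., 100%, 60 seconds each.
load_level_commands 10 60
```

Each printed line could then be executed in turn while power measurements are taken for that level.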

@github-benjamin-davy, you have already developed this kind of module. Do you have any advice or resources that could be useful?

Article : https://medium.com/teads-engineering/estimating-aws-ec2-instances-power-consumption-c9745e347959

github-benjamin-davy commented 2 years ago

Hello @da-ekchajzer, we used stress-ng on our side. We found it pretty flexible for simulating different kinds of workloads and, most importantly, for targeting a specific CPU load, which isn't always easy with other stress tools. To reflect average usage, we tested several options.

To be able to run on many platforms, it should be coupled with power measurement (software-based or physical wattmeters). What was missing by default on our side for CPU was support for AMD machines, but that's easily doable (without memory consumption for AMD, however). The same goes for GPUs when looking at ML-oriented hardware (Nvidia and AMD provide tooling for this).
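
To make the software-based option concrete, here is a minimal sketch. It assumes a Linux host that exposes the Intel RAPL energy counter through the powercap sysfs interface, which is an assumption of this example and not something stated above. The path may differ or be missing on a given machine, reading it may require root, and the helper names `average_watts` and `measure_watts` are hypothetical. Counter wrap-around is not handled.

```shell
#!/bin/bash

# Assumed sysfs path for the first CPU package; it may differ or be absent.
RAPL_ENERGY_FILE="/sys/class/powercap/intel-rapl:0/energy_uj"

# Convert two energy readings (in microjoules) taken $3 seconds apart
# into an average power in watts. Counter wrap-around is not handled.
average_watts() {
    local start_uj="$1" end_uj="$2" seconds="$3"
    awk -v s="$start_uj" -v e="$end_uj" -v t="$seconds" \
        'BEGIN { printf "%.2f\n", (e - s) / 1000000 / t }'
}

# Sample the counter twice and print the average power over the interval.
measure_watts() {
    local seconds="${1:-5}" start_uj end_uj
    start_uj="$(cat "$RAPL_ENERGY_FILE")" || return 1
    sleep "$seconds"
    end_uj="$(cat "$RAPL_ENERGY_FILE")" || return 1
    average_watts "$start_uj" "$end_uj" "$seconds"
}
```

For example, `measure_watts 5` would report the average package power over five seconds on a machine where the counter is readable. A physical wattmeter would replace the two `cat` calls with readings from the device.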

da-ekchajzer commented 2 years ago

Thank you for the resource.

About power measurement, I'll let you give your opinion or advice here: https://github.com/Boavizta/Energizta/issues/2

da-ekchajzer commented 2 years ago

To-do

da-ekchajzer commented 2 years ago

See https://github.com/teads/turbostress for an example of stress-ng used in a similar context.

maethor commented 2 years ago

I think we should provide a complete example based on stress-ng, but the list of stress tests could be left to the user. We should just load a list of commands from a file.

Something like this could work fine:

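#!/bin/bash

# We don't want to leave a stress test running after this script
trap 'kill $(jobs -p) 2>/dev/null' EXIT

INTERVAL=5   # seconds between two measurements
WARMUP=20    # seconds to wait before the first measurement
TIMEOUT=60   # duration of each stress test, in seconds

# In the final version, this list could be loaded from a config file.
# --cpu-load only applies to the cpu stressor, so --cpu is required;
# --cpu 0 starts one worker per online CPU.
stresstests="
stress-ng --cpu 1
stress-ng --cpu 4
stress-ng --cpu 0 --cpu-load 100
"

# Read from a here-string rather than a pipe, so that the loop runs in the
# current shell and the EXIT trap can see the background jobs
while IFS= read -r stresstest ; do
   if [ -n "$stresstest" ]; then
       echo "Running $stresstest"
       $stresstest > /dev/null 2>&1 &
       testpid=$!

       i=0
       while [ $((INTERVAL * i)) -lt "$TIMEOUT" ]; do
           if [ $((INTERVAL * i)) -gt "$WARMUP" ]; then
               uptime # Here we call the complete "get_states" function to get power consumption and various metrics.
           fi
           sleep "$INTERVAL"
           i=$((i+1))
       done

       # Stop the stress test and wait for it to exit before starting the next one
       kill "$testpid"
       wait "$testpid" 2>/dev/null
   fi
done <<< "$stresstests"
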
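
The idea of loading the list of commands from a file could be sketched as follows. The helper name `load_stresstests` and the file format (one command per line, `#` for comments) are assumptions made for illustration, not a format the project has agreed on.

```shell
#!/bin/bash

# Hypothetical helper: print the stress test commands listed in a file,
# one per line, skipping empty lines and lines starting with '#'.
load_stresstests() {
    local file="$1" line
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$file"
}
```

The output of `load_stresstests` on a user-provided file could then replace the hard-coded list, so that users choose their own stress tests without editing the script.
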
bpetit commented 2 years ago

Hey there !

Just had a great discussion with Arne from Green Coding. They have very valuable insights, and also a tool for this, including very important variables like hyper-threading, turbo boost, etc. We should synchronize with them before moving in a direction on our own!

We should have a look at https://github.com/green-coding-berlin/tools and synchronize with them.

da-ekchajzer commented 2 years ago

I guess we should have an interface to easily implement different types of tests.

maethor commented 1 year ago

Now that we have decided to use stress-ng, I think we should continue the discussion here: #19