amd / amd_energy

36 stars 16 forks source link

Same package energy reading with dual socket #5

Open NiccoloTosato opened 11 months ago

NiccoloTosato commented 11 months ago

Problem: Under certain conditions, the readings related to package energy are identical.

The conditions that lead to this issue are:

Test example:

When utilizing a dual-socket machine equipped with EPYC 7H12 CPUs, configured with 4 NUMA regions per socket, the provided code snippet will designate the scpu as number 16 when attempting to read the package energy from the second socket (sock=1).

scpu = cpumask_first_and(cpu_online_mask,
                         cpumask_of_node(sock));

The assumption of one NUMA node per socket is incorrect. I recommend the following correction:

scpu = cpumask_first_and(cpu_online_mask,
                         topology_die_cpumask
                         ((data->nr_cpus/data->nr_socks) * sock));
sumachidanand commented 6 months ago

Hi NiccoloTosato,

Thank you for your findings on energy reading on multiple NUMA nodes per socket system. Have you tested your code snippet? Can you share the logs capturing energy readings on different sockets?

Thank you,

NiccoloTosato commented 5 months ago

Yes, for sure.

The following script, measures for each socket the energy, then sleeps for 5 seconds and read again the energy, repeat for 10 times and calculate the delta between the energy consumed for each socket.

debug.sh:

#!/usr/bin/bash
AVERAGE=0
for i in {1..10}
do
    S0_init=$(cat /sys/class/hwmon/hwmon2/energy129_input)
    S1_init=$(cat /sys/class/hwmon/hwmon2/energy130_input)

    sleep 5

    S0_end=$(cat /sys/class/hwmon/hwmon2/energy129_input)
    S1_end=$(cat /sys/class/hwmon/hwmon2/energy130_input)

    S0_delta=$((S0_end-S0_init))
    S1_delta=$((S1_end-S1_init))

    DELTA=$(( S1_delta - S0_delta ))
    echo "Delta reading: $DELTA"
    AVERAGE=$(( AVERAGE + DELTA ))

done

#print a sample of reading
echo "Initial reading: $(cat /sys/class/hwmon/hwmon2/energy129_label) ${S0_init}"
echo "Initial reading: $(cat /sys/class/hwmon/hwmon2/energy130_label) ${S1_init}"

echo "Final reading: $(cat /sys/class/hwmon/hwmon2/energy129_label) ${S0_end}"
echo "Final reading: $(cat /sys/class/hwmon/hwmon2/energy130_label) ${S1_end}"

echo "Average Energy difference S1-S0 $( echo "$AVERAGE / 10" | bc )"

With the original version, we get an average difference near to zero, we can see also from the initial and final reading that the values are more or less identical.

$ ./debug.sh
Delta reading: 1099
Delta reading: 244
Delta reading: -1968
Delta reading: -2716
Delta reading: -1403
Delta reading: -534
Delta reading: -504
Delta reading: -15
Delta reading: -550
Delta reading: -3661
Initial reading: Esocket0 95075230804
Initial reading: Esocket1 95075321975
Final reading: Esocket0 95523233016
Final reading: Esocket1 95523320526
Average Energy difference S1-S0 -1000

With the fixed version the reading are more reasonable:

$ ./debug.sh
Delta reading: 1720962
Delta reading: 1759064
Delta reading: 1642212
Delta reading: 1669174
Delta reading: 2078536
Delta reading: 1749863
Delta reading: 1644867
Delta reading: 1445480
Delta reading: 1664337
Delta reading: 1772965
Initial reading: Esocket0 36358811248
Initial reading: Esocket1 54468877883
Final reading: Esocket0 36806677627
Final reading: Esocket1 54918517227
Average Energy difference S1-S0 1714746

Moreover, I provide the following debug ouput from dmesg:

Editing the function amd_create_sensor in order to print the core related to each socket:

$ diff amd_energy.c  ../amd_energy/amd_energy.c
264,279c263,267
<     s_config[i] = config;
<     if (i < cpus) {
<       scnprintf(label_l[i], 10, "Ecore%03u", i);
<     }
<     else {
<       scnprintf(label_l[i], 10, "Esocket%u", (i - cpus));
<       int core_id,core_id_fix;
<       core_id = cpumask_first_and(cpu_online_mask,
<                   cpumask_of_node(i - data->nr_cpus));
<               
<       core_id_fix=cpumask_first_and(cpu_online_mask,
<                       topology_die_cpumask((data->nr_cpus/data->nr_socks) * (i - data->nr_cpus) ));
<       printk("DEBUG: Socket %d, core =%d\n",(i - cpus),core_id);
<       printk("DEBUG: Socket %d, core_fixed=%d\n",(i - cpus),core_id_fix);
<     }
<       
---

We get:

[596248.696402] DEBUG: Socket 0, core =0
[596248.696407] DEBUG: Socket 0, core_fixed=0
[596248.696409] DEBUG: Socket 1, core =16
[596248.696410] DEBUG: Socket 1, core_fixed=64

Tested on 2 EPYC 7H12 configured with 4 numa node each (NPS4).

sumachidanand commented 5 months ago

Thanks NiccoloTosato for sharing the detailed analysis. I will try out this script and core id printing at my end.

sumachidanand commented 4 months ago

Hi NiccoloTosato,

We have also observed the same package energy reading issue in the energy driver when multiple numa nodes are enabled and with the fix you provided, issue is not seen. We have accepted and pulled the fix you provided into our energy driver. We would like to thank you for your findings and the solution.