canonical / checkbox

Checkbox is a testing framework used to validate device compatibility with Ubuntu Linux. It’s the testing tool developed for the purposes of the Ubuntu Certification program.
https://checkbox.readthedocs.io
GNU General Public License v3.0

Support NUMA aware disk I/O performance #552

Open tai271828 opened 1 year ago

tai271828 commented 1 year ago

Enhancement Proposal

In order to ensure optimal disk I/O performance on machines that support Non-Uniform Memory Access (NUMA), it's vital that we bind the CPU and memory usage of the benchmark process to the same NUMA node. For instance, certain systems such as the 2P Altra and Altra Max have been observed to exhibit issues with PCIe ordering when inter-socket transfers are in operation. This results in potentially unstable performance (sometimes low enough to fail the test) if this binding isn't correctly enforced.

It's worth noting that a related approach has already been successfully employed to improve network performance, as illustrated in commit 11c047ba955488ee8f394db41c04b84cfc231632. We propose a similar implementation to enhance the stability of disk I/O performance on NUMA-supporting machines.
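
For reference, the NUMA topology of the machine can be inspected before choosing which node to bind to. A minimal sketch, assuming numactl and util-linux are installed (node numbers are examples only):

# List the NUMA nodes together with the CPUs and memory that belong to each,
# so we know which node the benchmark process should be pinned to.
numactl --hardware
# Cross-check with util-linux: NUMA node count and the CPU ranges per node.
lscpu | grep -i numa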

kissiel commented 1 year ago

So you want to set cpu affinity for the disk benchmark tests?

tai271828 commented 1 year ago

> So you want to set cpu affinity for the disk benchmark tests?

Yes, exactly. Setting CPU affinity for disk benchmark tests is one part of the proposed solution, but it's a little bit more nuanced.

The key objective here is to make sure that both the CPU usage and memory usage of the benchmark process are bound to the same Non-Uniform Memory Access (NUMA) node. This will help prevent issues we've observed on certain systems that cause performance instability due to PCIe ordering problems when inter-socket transfers are happening.

In addition to setting CPU affinity, we also need to ensure the memory used by the benchmark process is allocated on the same NUMA node to guarantee optimal disk I/O performance. This approach builds on the proven success of a similar implementation we've used for enhancing network performance.
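
To make the intent concrete, the binding can be sanity-checked like this (a rough sketch, assuming numactl is installed and using node 0 only as an example):

# Run a child process with both memory (-m) and CPUs (-N) bound to node 0 and
# print the policy the child actually sees; cpubind, nodebind and membind
# should all report node 0 only.
numactl -m 0 -N 0 numactl --show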

Possible Implementation

I have verified that a prototype patch like the following makes the performance test results stable on Ampere Altra/Altra Max SoCs:

[^_^]─[~/work/checkbox-project/providers/plainbox-provider-checkbox] tai271828@syakaro: 230411-15:36:33 9 file 100Kb
$ git diff
diff --git a/bin/disk_read_performance_test.sh b/bin/disk_read_performance_test.sh
index f416466..39872e5 100755
--- a/bin/disk_read_performance_test.sh
+++ b/bin/disk_read_performance_test.sh
@@ -81,7 +81,7 @@ for disk in "$@"; do
   echo "---------------------------------------------------"

   for iteration in $(seq 1 10); do
-    speed=$(hdparm -t /dev/"$disk" 2>/dev/null | grep "Timing buffered disk reads" | awk -F"=" '{print $2}' | awk '{print $1}')
+    speed=$(numactl -m 0 -N 0 hdparm -t /dev/"$disk" 2>/dev/null | grep "Timing buffered disk reads" | awk -F"=" '{print $2}' | awk '{print $1}')
     echo "INFO: Iteration $iteration: Detected speed is $speed MB/sec"

     if [ -z "$speed" ]; then

However, we don't want to hard-code the values for -m and -N, since they vary from machine to machine depending on its NUMA topology.
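
One way to avoid the hard-coded node would be to derive it from sysfs per disk. The following is only a sketch under assumptions (the numa_node attribute is typically exposed by the disk's PCI controller, -1 means no NUMA information, and falling back to node 0 in that case is an arbitrary choice); it is not the helper used in the referenced commit:

# Sketch: find the NUMA node the disk hangs off by walking up its sysfs
# device hierarchy until a numa_node attribute is found.
disk="${1:-sda}"                  # block device name, e.g. sda or nvme0n1
dev=$(readlink -f "/sys/block/$disk/device")
node=-1
while [ "$dev" != "/" ] && [ "$dev" != "/sys" ]; do
    if [ -r "$dev/numa_node" ]; then
        node=$(cat "$dev/numa_node")
        break
    fi
    dev=$(dirname "$dev")
done
# A value of -1 means the platform reports no NUMA information for the
# device; fall back to node 0 in that case.
[ "$node" -ge 0 ] || node=0
# Bind both memory (-m) and CPUs (-N) of the benchmark to the node found above.
numactl -m "$node" -N "$node" hdparm -t "/dev/$disk"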

In commit https://github.com/canonical/checkbox/commit/11c047ba955488ee8f394db41c04b84cfc231632, functions like find_numa and find_cores look like something we could re-use or translate for the disk performance jobs.

I believe we don't need an implementation like https://github.com/canonical/checkbox/blob/main/providers/base/bin/cpufreq_test.py, since numactl may already take care of the CPU affinity (for example, see https://github.com/numactl/numactl/blob/master/affinity.c).
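
As a rough check of that assumption (numactl installed, node 0 used only as an example), the child process already runs with its CPU mask restricted to the chosen node, so no extra sched_setaffinity handling should be needed on top of numactl:

# The grep below runs as a child of numactl, so the Cpus_allowed_list it
# prints for its own process should only contain the CPUs of node 0.
numactl -N 0 grep Cpus_allowed_list /proc/self/status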