CAST can enhance the system management of cluster-wide resources. It consists of the open source tools: cluster system management (CSM) and burst buffer.
Eclipse Public License 1.0
HCDiag: `chk-ib-pcispeed` Test Fails to Detect Mellanox HCAs Speed / Width Values on RHEL 8.4 Node #1017
**Describe the bug**
HCDiag `chk-ib-pcispeed` test fails to detect proper speed and width values from the Mellanox adapters on a RHEL 8.4 node.
**To Reproduce**
Steps to reproduce the behavior:
Run the `chk-ib-pcispeed` HCDiag test on a RHEL 8.4 node:
$ hcdiag_run.py --target "r311n09-adm" --test "chk-ib-pcispeed"
INFO: xcat seems to be installed in /opt/xcat/bin. Running in Management mode
Health Check Diagnostics version 1.8.3., running on Linux 4.14.0-115.14.1.el7a.ppc64le, p3xcatmn-adm machine.
Using configuration file /data_local/sw/cast/1.8.3/hcdiag/etc/hcdiag.properties.
Using tests configuration file /data_local/sw/cast/1.8.3/hcdiag/etc/test.properties.
Health Check Diagnostics, run id 211213181602789110, initializing...
Validating command argument test.
Validating command argument target.
Test should fail with the standard error message:
Preparing to run chk-ib-pcispeed.
Executable: /data_local/sw/cast/1.8.3/hcdiag/tests/chk-ib-pcispeed/chk-ib-pcispeed.sh exists on remote node(s).
chk-ib-pcispeed started on 1 node(s) at 2021-12-13 18:16:05.252119. It might take up to 10s.
.
chk-ib-pcispeed ended on 1 node(s) at 2021-12-13 18:16:09.409955, rc= 1, elapsed time: 0:00:04.157836
chk-ib-pcispeed FAIL on node r311n09-adm, serial number: 78875BA, rc= 8. (details in /tmp/211213181602789110/chk-ib-pcispeed/r311n09-adm-2021-12-13-18_16_07.output)
chk-ib-pcispeed.sh test PASS, rc=8
Remote_command_rc = 8
**Expected behavior**
Both speed and width settings should be properly parsed from the `lspci` command output.
**Environment (please complete the following information):**
- RHEL 8.4 Environment
- CAST 1.8.3
**Additional context**
Results summary from the same run:
=============================== Results summary ===============================
18:16:05 =======================================================================
chk-ib-pcispeed FAIL on 1 node(s):
r311n09-adm
================================================================================
Health Check Diagnostics ended, exit code 100.

Test output reported for the node:
Running chk-ib-pcispeed.sh on r311n09, machine type 8335-GTX.
Adapter: 0003:01:00.0, 16GT/, Widt.
Error, expecting: 16GT/s, got: 16GT/
Error, expecting: x8, got: Widt
Adapter: 0003:01:00.1, 16GT/, Widt.
Error, expecting: 16GT/s, got: 16GT/
Error, expecting: x8, got: Widt
Adapter: 0033:01:00.0, 16GT/, Widt.
Error, expecting: 16GT/s, got: 16GT/
Error, expecting: x8, got: Widt
Adapter: 0033:01:00.1, 16GT/, Widt.
Error, expecting: 16GT/s, got: 16GT/
Error, expecting: x8, got: Widt
Found 4 Mellanox adapters.

Suggested fix for the issue:
* Original code:
speed=`echo ${line} | awk '{print substr($3,1,length($3)-1)}'`
width=`echo ${line} | awk '{print substr($5,1,length($5)-1)}'`
* Suggested code:
speed="$(echo "${line}" | awk 'match($0, /Speed\s([0-9]+GT\/s)/, a) {print a[1]}')"
width="$(echo "${line}" | awk 'match($0, /Width\s(x[0-9]+)/, a) {print a[1]}')"
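
To illustrate why the positional parsing breaks, here is a minimal sketch that runs both approaches against two sample `LnkSta` lines. The sample lines are assumptions inferred from the error output above (fields 3 and 5 come back as `16GT/` and `Widt`), not captured from the failing node. The function names `parse_positional` and `parse_pattern` are hypothetical, and the `sed` variant is an alternative to the awk fix, not the code used in CAST.

```shell
#!/bin/sh
# Assumed sample lines; the exact lspci text on a given node may differ.
old_line='LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+'
new_line='LnkSta: Speed 16GT/s (ok), Width x8 (ok)'

# Positional parsing as in the original code: takes fields 3 and 5
# and drops the last character of each (a trailing comma in old_line).
parse_positional() {
    speed=$(echo "$1" | awk '{print substr($3,1,length($3)-1)}')
    width=$(echo "$1" | awk '{print substr($5,1,length($5)-1)}')
    echo "${speed} ${width}"
}

# Pattern-based parsing with POSIX sed: extracts the values that follow
# the "Speed" and "Width" keywords, regardless of field position.
parse_pattern() {
    speed=$(echo "$1" | sed -n 's/.*Speed \([0-9.]*GT\/s\).*/\1/p')
    width=$(echo "$1" | sed -n 's/.*Width \(x[0-9]*\).*/\1/p')
    echo "${speed} ${width}"
}

parse_positional "$old_line"   # prints: 8GT/s x8
parse_positional "$new_line"   # prints: 16GT/ Widt
parse_pattern "$old_line"      # prints: 8GT/s x8
parse_pattern "$new_line"      # prints: 16GT/s x8
```

With the extra `(ok),` token in the line, field 5 becomes the word `Width` and field 3 no longer ends in a comma, which matches the `16GT/` and `Widt` values in the failure output. Note that the three-argument `match()` and `\s` in the suggested awk code are gawk extensions; the `sed` form may be useful only where `awk` is not gawk.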