IBM / CAST

CAST can enhance the system management of cluster-wide resources. It consists of the open source tools: cluster system management (CSM) and burst buffer.
Eclipse Public License 1.0

Fixed hcdiag chk-ib-pcispeed test for RHEL 8.4 #1017 #1022

Closed by besawn 2 years ago

besawn commented 2 years ago

This PR contains a fix and a testcase enhancement for #1017.

I was able to reproduce the problem described in the original issue. I could not reproduce it on RHEL 7.6-Alt, so it was introduced somewhere between RHEL 7.6-Alt and RHEL 8.4. I validated that the fix works on both RHEL 7.6-Alt and RHEL 8.4.

Unit test of chk-ib-pcispeed on RHEL 8.4, before the fix (failure case):

[root@c650f99p06 ~]# xdsh c650f99p26 "cat /etc/redhat-release"
c650f99p26: Red Hat Enterprise Linux release 8.4 (Ootpa)

[root@c650f99p06 ~]# /opt/ibm/csm/hcdiag/bin/hcdiag_run.py --target "c650f99p26" --test "chk-ib-pcispeed" 2>&1 | grep chk-ib-pcispeed
Preparing to run chk-ib-pcispeed.
Executable: /opt/ibm/csm/hcdiag/tests/chk-ib-pcispeed/chk-ib-pcispeed.sh exists on remote node(s).
chk-ib-pcispeed started on 1 node(s) at 2022-02-17 09:53:10.167208. It might take up to 10s.
chk-ib-pcispeed ended on 1 node(s) at 2022-02-17 09:53:12.409103, rc= 1, elapsed time: 0:00:02.241895
chk-ib-pcispeed FAIL on node c650f99p26, serial number: 787C48A, rc= 8. (details in /tmp/220217095307659403/chk-ib-pcispeed/c650f99p26-2022-02-17-09_53_11.output)

Unit test of chk-ib-pcispeed on RHEL 8.4, after the fix (verifies the fix):

[root@c650f99p06 ~]# xdsh c650f99p26 "cat /etc/redhat-release"
c650f99p26: Red Hat Enterprise Linux release 8.4 (Ootpa)

[root@c650f99p06 ~]# /opt/ibm/csm/hcdiag/bin/hcdiag_run.py --target "c650f99p26" --test "chk-ib-pcispeed" 2>&1 | grep chk-ib-pcispeed
Preparing to run chk-ib-pcispeed.
Executable: /opt/ibm/csm/hcdiag/tests/chk-ib-pcispeed/chk-ib-pcispeed.sh exists on remote node(s).
chk-ib-pcispeed started on 1 node(s) at 2022-02-17 15:08:29.195167. It might take up to 10s.
chk-ib-pcispeed ended on 1 node(s) at 2022-02-17 15:08:31.397210, rc= 0, elapsed time: 0:00:02.202043
chk-ib-pcispeed PASS on node c650f99p26, serial number: 787C48A.

Unit test of chk-ib-pcispeed on RHEL 7.6-Alt, before the fix (issue does not occur):

[root@c650mnp06 ~]# xdsh c650f02p13 "cat /etc/redhat-release"
c650f02p13: Red Hat Enterprise Linux Server release 7.6 (Maipo)

[root@c650mnp06 ~]# /opt/ibm/csm/hcdiag/bin/hcdiag_run.py --target "c650f02p13" --test "chk-ib-pcispeed" 2>&1 | grep chk-ib-pcispeed
Preparing to run chk-ib-pcispeed.
Executable: /opt/ibm/csm/hcdiag/tests/chk-ib-pcispeed/chk-ib-pcispeed.sh exists on remote node(s).
chk-ib-pcispeed started on 1 node(s) at 2022-02-17 09:49:00.683997. It might take up to 10s.
chk-ib-pcispeed ended on 1 node(s) at 2022-02-17 09:49:02.284784, rc= 0, elapsed time: 0:00:01.600787
chk-ib-pcispeed PASS on node c650f02p13, serial number: 787C54A.

Unit test of chk-ib-pcispeed on RHEL 7.6-Alt, after the fix (verifies no regression introduced by the fix):

[root@c650mnp06 ~]# xdsh c650f02p13 "cat /etc/redhat-release"
c650f02p13: Red Hat Enterprise Linux Server release 7.6 (Maipo)

[root@c650mnp06 ~]# /opt/ibm/csm/hcdiag/bin/hcdiag_run.py --target "c650f02p13" --test "chk-ib-pcispeed" 2>&1 | grep chk-ib-pcispeed
Preparing to run chk-ib-pcispeed.
Executable: /opt/ibm/csm/hcdiag/tests/chk-ib-pcispeed/chk-ib-pcispeed.sh exists on remote node(s).
chk-ib-pcispeed started on 1 node(s) at 2022-02-17 09:51:05.112206. It might take up to 10s.
chk-ib-pcispeed ended on 1 node(s) at 2022-02-17 09:51:06.714854, rc= 0, elapsed time: 0:00:01.602648
chk-ib-pcispeed PASS on node c650f02p13, serial number: 787C54A.
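
The per-node summary lines in the four runs above have a regular shape, so the before/after comparison could be checked mechanically. The following is a minimal Python sketch and is not part of this PR: the function name `parse_hcdiag_results` and the regular expression are hypothetical, modeled only on the output shown here, and other hcdiag versions may format these lines differently.

```python
import re

# Hypothetical pattern, modeled only on the PASS/FAIL summary lines
# shown in the transcripts above.
RESULT_RE = re.compile(
    r"^(?P<test>\S+) (?P<status>PASS|FAIL) on node (?P<node>[^,]+), "
    r"serial number: (?P<serial>\w+)"
    r"(?:, rc= ?(?P<rc>\d+))?"
)


def parse_hcdiag_results(output):
    """Return one dict per per-node PASS/FAIL summary line in `output`."""
    results = []
    for line in output.splitlines():
        match = RESULT_RE.match(line.strip())
        if match is None:
            continue  # skip "Preparing", "started", "ended" and similar lines
        record = match.groupdict()
        # In the transcripts, rc appears on this line only for failures;
        # treating a missing rc as 0 is an assumption of this sketch.
        record["rc"] = int(record["rc"] or 0)
        results.append(record)
    return results
```

Feeding it the captured output of `hcdiag_run.py ... 2>&1` would yield a list such as one record with `status` equal to `"FAIL"` and `rc` equal to `8` for the RHEL 8.4 run before the fix.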

In addition to unit testing the hcdiag chk-ib-pcispeed test, I added a new FVT testcase so that this feature is covered during regular regression testing.

Unit test of FVT hcdiag test case on RHEL 8.4, before the fix (shows the test failing):

[root@c650f99p06 buckets]# basic/hcdiag.sh

[root@c650f99p06 buckets]# grep chk-ib-pcispeed /test/results/buckets/basic/hcdiag.log
[2022-02-17 12:11:43.4863] Test Case 28: chk-ib-pcispeed:                                                                                   FAILED

Unit test of FVT hcdiag test case on RHEL 8.4, after the fix (verifies the test is successful):

[root@c650f99p06 buckets]# ./basic/hcdiag.sh 

[root@c650f99p06 buckets]# grep chk-ib-pcispeed /test/results/buckets/basic/hcdiag.log
[2022-02-17 15:13:59.5427] Test Case 28: chk-ib-pcispeed:                                                                                     PASS
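
The `grep` check against `hcdiag.log` could also be scripted. Below is a minimal Python sketch, not part of this PR, that parses one FVT result line; the function name `parse_fvt_line` and the pattern are hypothetical and based only on the two log lines shown above, including the fact that failures are reported as `FAILED` while successes are reported as `PASS`.

```python
import re

# Hypothetical pattern, modeled only on the two hcdiag.log lines shown above.
FVT_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+Test Case (?P<number>\d+):\s+"
    r"(?P<name>\S+):\s+(?P<status>\S+)\s*$"
)


def parse_fvt_line(line):
    """Parse one FVT result line; return None if the line does not match."""
    match = FVT_RE.match(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "number": int(match.group("number")),
        "name": match.group("name"),
        # The log prints "PASS" on success; anything else counts as a failure.
        "passed": match.group("status") == "PASS",
    }
```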