avocado-framework-tests / avocado-misc-tests

Community maintained Avocado tests repository

After running rdma_tests.py on RHEL 8.2 with Avocado (version 82.0), the test reports "FAIL: Client cmd: ib_atomic_bw -F" #1892

Closed Gene-Lo closed 3 years ago

Gene-Lo commented 4 years ago

After running rdma_tests.py on RHEL 8.2 with Avocado (version 82.0), the test reports "FAIL: Client cmd: ib_atomic_bw -F".

【Test Step】
Step 1. Prepare 2 systems, each equipped with 1 network card, then connect the 2 network cards to each other.
※Network card: Mellanox_2-PORT 10Gb NIC&ROCE ConnectX-4Lx SR/Cu PCIe 3.0 LP CAPABLE ADAPTER

Step 2. Edit the yaml file: /root/tests/tests/avocado-misc-tests/io/net/infiniband/rdma_tests.py.data/ib_read_bw_basic_infiniband.yaml
[image]

Step 3. Run rdma_tests.py with the command: avocado run rdma_tests.py -m rdma_tests.py.data/ib_atomic_bw_basic_infiniband.yaml
[image]

【Test log】: job log & Manual-Test-log Test-log.zip

【Section of job.log】
《INIT 01-rdma_tests.py:RDMA.test》
2020-09-19 00:45:15,582 process L0604 INFO | Running '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -q 192.168.10.2 'timeout 600 cat /tmp/ib_log && rm -rf /tmp/ib_log''
2020-09-19 00:45:15,634 process L0416 DEBUG| [stdout] IB device mlx5_core not found
2020-09-19 00:45:15,635 process L0686 INFO | Command '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -q 192.168.10.2 'timeout 600 cat /tmp/ib_log && rm -rf /tmp/ib_log'' finished with 0 after 0.050020694732666016s
2020-09-19 00:45:15,638 stacktrace L0039 ERROR|
2020-09-19 00:45:15,638 stacktrace L0042 ERROR| Reproduced traceback from: /usr/local/lib/python3.6/site-packages/avocado_framework-82.0-py3.6.egg/avocado/core/test.py:767
2020-09-19 00:45:15,638 stacktrace L0045 ERROR| Traceback (most recent call last):
2020-09-19 00:45:15,638 stacktrace L0045 ERROR| File "/root/tests/tests/avocado-misc-tests/io/net/infiniband/rdma_tests.py", line 161, in test
2020-09-19 00:45:15,638 stacktrace L0045 ERROR| self.fail("Client cmd: %s %s" % (self.tool_name, self.test_op))
2020-09-19 00:45:15,638 stacktrace L0045 ERROR| File "/usr/local/lib/python3.6/site-packages/avocado_framework-82.0-py3.6.egg/avocado/core/test.py", line 953, in fail
2020-09-19 00:45:15,638 stacktrace L0045 ERROR| raise exceptions.TestFail(message)
2020-09-19 00:45:15,639 stacktrace L0045 ERROR| avocado.core.exceptions.TestFail: Client cmd: ib_atomic_bw -F
2020-09-19 00:45:15,639 stacktrace L0046 ERROR|
2020-09-19 00:45:15,639 test L0772 DEBUG| Local variables:
2020-09-19 00:45:15,674 test L0775 DEBUG| -> self <class 'rdma_tests.RDMA'>: 01-rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-1500-test_opt--F-053f

《INIT 02-rdma_tests.py:RDMA.test》
2020-09-19 00:48:13,354 process L0604 INFO | Running '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -q 192.168.10.2 'timeout 600 cat /tmp/ib_log && rm -rf /tmp/ib_log''
2020-09-19 00:48:13,407 process L0416 DEBUG| [stdout] IB device mlx5_core not found
2020-09-19 00:48:13,408 process L0686 INFO | Command '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -q 192.168.10.2 'timeout 600 cat /tmp/ib_log && rm -rf /tmp/ib_log'' finished with 0 after 0.05049562454223633s
2020-09-19 00:48:13,410 stacktrace L0039 ERROR|
2020-09-19 00:48:13,410 stacktrace L0042 ERROR| Reproduced traceback from: /usr/local/lib/python3.6/site-packages/avocado_framework-82.0-py3.6.egg/avocado/core/test.py:767
2020-09-19 00:48:13,410 stacktrace L0045 ERROR| Traceback (most recent call last):
2020-09-19 00:48:13,410 stacktrace L0045 ERROR| File "/root/tests/tests/avocado-misc-tests/io/net/infiniband/rdma_tests.py", line 161, in test
2020-09-19 00:48:13,411 stacktrace L0045 ERROR| self.fail("Client cmd: %s %s" % (self.tool_name, self.test_op))
2020-09-19 00:48:13,411 stacktrace L0045 ERROR| File "/usr/local/lib/python3.6/site-packages/avocado_framework-82.0-py3.6.egg/avocado/core/test.py", line 953, in fail
2020-09-19 00:48:13,411 stacktrace L0045 ERROR| raise exceptions.TestFail(message)
2020-09-19 00:48:13,411 stacktrace L0045 ERROR| avocado.core.exceptions.TestFail: Client cmd: ib_atomic_bw -F
2020-09-19 00:48:13,411 stacktrace L0046 ERROR|
2020-09-19 00:48:13,411 test L0772 DEBUG| Local variables:
2020-09-19 00:48:13,447 test L0775 DEBUG| -> self <class 'rdma_tests.RDMA'>: 02-rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-2000-test_opt--F-b3ec

《INIT 03-rdma_tests.py:RDMA.test》
2020-09-19 00:50:54,831 process L0604 INFO | Running '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -q 192.168.10.2 'timeout 600 cat /tmp/ib_log && rm -rf /tmp/ib_log''
2020-09-19 00:50:54,887 process L0416 DEBUG| [stdout] IB device mlx5_core not found
2020-09-19 00:50:54,888 process L0686 INFO | Command '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -q 192.168.10.2 'timeout 600 cat /tmp/ib_log && rm -rf /tmp/ib_log'' finished with 0 after 0.05382061004638672s
2020-09-19 00:50:54,890 stacktrace L0039 ERROR|
2020-09-19 00:50:54,890 stacktrace L0042 ERROR| Reproduced traceback from: /usr/local/lib/python3.6/site-packages/avocado_framework-82.0-py3.6.egg/avocado/core/test.py:767
2020-09-19 00:50:54,890 stacktrace L0045 ERROR| Traceback (most recent call last):
2020-09-19 00:50:54,890 stacktrace L0045 ERROR| File "/root/tests/tests/avocado-misc-tests/io/net/infiniband/rdma_tests.py", line 161, in test
2020-09-19 00:50:54,890 stacktrace L0045 ERROR| self.fail("Client cmd: %s %s" % (self.tool_name, self.test_op))
2020-09-19 00:50:54,891 stacktrace L0045 ERROR| File "/usr/local/lib/python3.6/site-packages/avocado_framework-82.0-py3.6.egg/avocado/core/test.py", line 953, in fail
2020-09-19 00:50:54,891 stacktrace L0045 ERROR| raise exceptions.TestFail(message)
2020-09-19 00:50:54,891 stacktrace L0045 ERROR| avocado.core.exceptions.TestFail: Client cmd: ib_atomic_bw -m 1024
2020-09-19 00:50:54,891 stacktrace L0046 ERROR|
2020-09-19 00:50:54,891 test L0772 DEBUG| Local variables:
2020-09-19 00:50:54,926 test L0775 DEBUG| -> self <class 'rdma_tests.RDMA'>: 03-rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-1500-test_opt--m_1024-5488

【Configuration】
《SUT6》
[Rhel8.2 Kernel] 4.18.0-193.14.3.el8_2.ppc64le

[FW config]
BMC: op940.00.mih-5-0-g86f9791c2
PNOR: OP9-v2.4-4.37-prod

[HW config]
CPU DD2.3 20core 2
Micron (MTA18ASF2G72PZ-2G9E1) 16G 16
Samsung PM985 960GB 1
PSU ACBEL 2000w 2
Slot1: Network2 - Mellanox 2-PORT EDR 100Gb IB CONNECTX-5 GEN4 PCIe x16 CAPI CAPABLE LP ADAPTER
Slot2: Network7 - Marvell 2-PORT E'NET (2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER 10GBase-T)
Slot3: Network5 - Mellanox 2-PORT 10Gb NIC&ROCE ConnectX-4Lx SR/Cu PCIe 3.0 LP CAPABLE ADAPTER
Slot4: Network6 - Mellanox 2-PORT 25/10Gb NIC&ROCE SR/Cu PCIe 3.0 (25/10Gb EVERGLADES EN)
Slot5: Network10 - Broadcom 5719 QP 1G (1G/100M/10M) Network Interface Card PCIe x4 LP
Slot6: Network3 - Mellanox 2-PORT 100Gb ROCE EN CONNECTX-5 GEN4 PCIe x16 LP CAPABLE ADAPTER
Slot7: Network9 - Marvell QUAD E'NET (2X1 + 2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER SFP+ SR COPPER)

《SUT8》
[Rhel8.2 Kernel] 4.18.0-193.19.1.el8_2.ppc64le

[FW config]
BMC: op940.00.mih-5-0-g86f9791c2
PNOR: OP9-v2.4-4.37-prod

[HW config]
CPU DD2.3 12core 2
Micron (MTA18ASF2G72PZ-2G9E1) 16G 16
Samsung PM985 960GB 1
PSU ACBEL 2000w 2
Slot1: Network2 - Mellanox 2-PORT EDR 100Gb IB CONNECTX-5 GEN4 PCIe x16 CAPI CAPABLE LP ADAPTER
Slot2: Network7 - Marvell 2-PORT E'NET (2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER 10GBase-T)
Slot3: Network5 - Mellanox 2-PORT 10Gb NIC&ROCE ConnectX-4Lx SR/Cu PCIe 3.0 LP CAPABLE ADAPTER
Slot4: Network6 - Mellanox 2-PORT 25/10Gb NIC&ROCE SR/Cu PCIe 3.0 (25/10Gb EVERGLADES EN)
Slot5: Network10 - Broadcom 5719 QP 1G (1G/100M/10M) Network Interface Card PCIe x4 LP
Slot6: Network3 - Mellanox 2-PORT 100Gb ROCE EN CONNECTX-5 GEN4 PCIe x16 LP CAPABLE ADAPTER
Slot7: Network9 - Marvell QUAD E'NET (2X1 + 2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER SFP+ SR COPPER)

manvanthar commented 3 years ago

This is a user configuration error made while editing the yaml file: the inputs are incorrect.

CA_NAME and PEER_CA are incorrect. Their values have to be taken from the output of the ibstat command; for example, they look like mlx5_0. Here the driver name (mlx5_core) was entered instead.

CA_NAME, PORT_NUM, PEERCA, and PEERPORT all need to be set based on the ibstat output on both the SUT and the peer box.
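As a rough illustration of picking these values from ibstat, the sketch below parses ibstat-style text into a mapping of CA names to port numbers and checks a candidate CA/port pair against it. The sample text, its device names, and the helper names `parse_ibstat` and `check_ca` are assumptions made for this example; they are not part of rdma_tests.py or of Avocado.

```python
import re

# Illustrative ibstat-style output; device names and values are made up.
SAMPLE_IBSTAT = """\
CA 'mlx5_0'
    CA type: MT4119
    Number of ports: 1
    Port 1:
        State: Active
        Physical state: LinkUp
        Port GUID: 0x0000000000000001
        Link layer: InfiniBand
CA 'mlx5_1'
    CA type: MT4119
    Number of ports: 1
    Port 1:
        State: Down
        Physical state: Disabled
        Port GUID: 0x0000000000000002
        Link layer: InfiniBand
"""


def parse_ibstat(text):
    """Map each CA name to the list of port numbers listed under it."""
    cas = {}
    current = None
    for line in text.splitlines():
        ca_match = re.match(r"\s*CA '([^']+)'", line)
        # Requires digits after "Port", so "Port GUID:" lines are skipped.
        port_match = re.match(r"\s*Port (\d+):", line)
        if ca_match:
            current = ca_match.group(1)
            cas[current] = []
        elif port_match and current is not None:
            cas[current].append(int(port_match.group(1)))
    return cas


def check_ca(ca_name, port_num, cas):
    """Return None if the CA/port pair exists, else a short explanation."""
    if ca_name not in cas:
        return "CA %r not listed by ibstat; known CAs: %s" % (ca_name, sorted(cas))
    if int(port_num) not in cas[ca_name]:
        return "port %s not found on %s" % (port_num, ca_name)
    return None


cas = parse_ibstat(SAMPLE_IBSTAT)
print(check_ca("mlx5_core", 1, cas))  # driver name, so a message is printed
print(check_ca("mlx5_0", 1, cas))     # valid pair, so None is printed
```

On a real system the text could come from `subprocess.check_output(["ibstat"], universal_newlines=True)`, run once on the SUT and once on the peer, with the results compared against the CA and port values in the yaml file.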

Gene-Lo commented 3 years ago

After we edited the yaml file and ran rdma_tests.py again, the result became PASS.

【Test Step】
ibstat
[image]

vim rdma_tests.py.data/ib_atomic_bw_basic_infiniband.yaml
[image]

avocado run rdma_tests.py -m rdma_tests.py.data/ib_atomic_bw_basic_infiniband.yaml

JOB ID : c67f72cf0f036bc446805ee8fcf73ce6a46ef8c2
JOB LOG : /root/tests/results/job-2020-10-20T18.32-c67f72c/job.log
(01/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-1500-test_opt--F-fdcf: PASS (9.14 s)
(02/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-2000-test_opt--F-159b: PASS (28.43 s)
(03/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-1500-test_opt--m_1024-7225: PASS (5.13 s)
(04/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-2000-test_opt--m_1024-67fb: PASS (43.63 s)
(05/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-1500-test_opt--n_10000-d587: PASS (5.26 s)
(06/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-2000-test_opt--n_10000-e7b1: PASS (44.24 s)
(07/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-1500-test_opt--S_2-890a: PASS (5.15 s)
(08/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-2000-test_opt--S_2-c769: PASS (28.01 s)
(09/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-1500-test_opt--t_1024-5363: PASS (5.09 s)
(10/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-2000-test_opt--t_1024-11bc: PASS (27.18 s)
(11/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-1500-test_opt--p_18200-3e3e: PASS (5.18 s)
(12/12) rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-2000-test_opt--p_18200-0181: PASS (27.70 s)
RESULTS : PASS 12 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML : /root/tests/results/job-2020-10-20T18.32-c67f72c/results.html
JOB TIME : 235.72 s

【job.log】 job.log

Many thanks for your support!