avocado-framework-tests / avocado-misc-tests

Community maintained Avocado tests repository

When using rdma_tests.py of Avocado (version: 90.0) to test MLNX Network-Card in Rhel8.4, it shows “FAIL: Client cmd: ib_atomic_bw -F”. #2196

Closed Gene-Lo closed 1 year ago

Gene-Lo commented 2 years ago

When we run rdma_tests.py with Avocado (version 90.0) to test a Mellanox (MLNX) network card on RHEL 8.4, the test fails with “FAIL: Client cmd: ib_atomic_bw -F”.

※For MLNX Network-Card: (00WT175) 2-PORT EDR 100Gb IB CONNECTX-5 GEN4 PCIe x16 CAPI CAPABLE LP ADAPTER

《SOL-Log》 Before running rdma_tests.py, we confirmed that the SUT can connect to its peer: SOL-Log

《Avocado-Log》 Because the full set of logs is too large, we uploaded only job.log: job.log

《FAIL-message in job.log》
INFO | Running 'ip link set enP48p1s0f0 mtu 1500'
INFO | Command 'ip link set enP48p1s0f0 mtu 1500' finished with 0 after 0.0010233410030195955s
INFO | Running 'ip -4 -j address show enP48p1s0f0'
DEBUG| [stdout] [{"ifindex":28,"ifname":"enP48p1s0f0","flags":["BROADCAST","MULTICAST","UP","LOWER_UP"],"mtu":1500,"qdisc":"mq","operstate":"UP","group":"default","txqlen":1000,"addr_info":[{"family":"inet","local":"192.168.10.1","prefixlen":24,"scope":"global","label":"enP48p1s0f0","valid_life_time":4294967295,"preferred_life_time":4294967295}]}]
INFO | Command 'ip -4 -j address show enP48p1s0f0' finished with 0 after 0.0004977909993613139s
INFO | Running 'ip -4 -j address show enP48p1s0f0'
DEBUG| [stdout] [{"ifindex":28,"ifname":"enP48p1s0f0","flags":["BROADCAST","MULTICAST","UP","LOWER_UP"],"mtu":1500,"qdisc":"mq","operstate":"UP","group":"default","txqlen":1000
INFO | Command 'ip -4 -j address show enP48p1s0f0' finished with 0 after 0.000788223998824833s
DEBUG| [stdout] ,"addr_info":[{"family":"inet","local":"192.168.10.1","prefixlen":24,"scope":"global","label":"enP48p1s0f0","valid_life_time":4294967295,"preferred_life_time":4294967295}]}]
INFO | Running 'ip -4 -j address show enP48p1s0f0'
DEBUG| [stdout] [{"ifindex":28,"ifname":"enP48p1s0f0","flags":["BROADCAST","MULTICAST","UP","LOWER_UP"],"mtu":1500,"qdisc":"mq","operstate":"UP","group":"default","txqlen":1000,"addr_info":[{"family":"inet","local":"192.168.10.1","prefixlen":24,"scope":"global","label":"enP48p1s0f0","valid_life_time":4294967295,"preferred_life_time":4294967295}]}]
INFO | Command 'ip -4 -j address show enP48p1s0f0' finished with 0 after 0.0005337000002327841s
INFO | Running '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -p 22 -q 192.168.10.2 'sudo ip link set enP48p1s0f0 mtu 1500''
INFO | Command '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -p 22 -q 192.168.10.2 'sudo ip link set enP48p1s0f0 mtu 1500'' finished with 0 after 0.05666455200116616s
INFO | Running '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -p 22 -q 192.168.10.2 'ip -4 -j address show enP48p1s0f0''
DEBUG| [stdout] [{"ifindex":32,"ifname":"enP48p1s0f0","flags":["BROADCAST","MULTICAST","UP","LOWER_UP"],"mtu":1500,"qdisc":"mq","operstate":"UP","group":"default","txqlen":1000,"addr_info":[{"family":"inet","local":"192.168.10.2","prefixlen":24,"broadcast":"192.168.10.255","scope":"global","noprefixroute":true,"label":"enP48p1s0f0","valid_life_time":4294967295,"preferred_life_time":4294967295}]}]
INFO | Command '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -p 22 -q 192.168.10.2 'ip -4 -j address show enP48p1s0f0'' finished with 0 after 0.040614601002744166s
INFO | Running '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -p 22 -q 192.168.10.2 'ip -4 -j address show enP48p1s0f0''
DEBUG| [stdout] [{"ifindex":32,"ifname":"enP48p1s0f0","flags":["BROADCAST","MULTICAST","UP","LOWER_UP"],"mtu":1500,"qdisc":"mq","operstate":"UP","group":"default","txqlen":1000,"addr_info":[{"family":"inet","local":"192.168.10.2","prefixlen":24,"broadcast":"192.168.10.255","scope":"global","noprefixroute":true,"label":"enP48p1s0f0","valid_life_time":4294967295,"preferred_life_time":4294967295}]}]
INFO | Command '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -p 22 -q 192.168.10.2 'ip -4 -j address show enP48p1s0f0'' finished with 0 after 0.0398553510021884s
INFO | Running '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -p 22 -q 192.168.10.2 'ip -4 -j address show enP48p1s0f0''
DEBUG| [stdout] [{"ifindex":32,"ifname":"enP48p1s0f0","flags":["BROADCAST","MULTICAST","UP","LOWER_UP"],"mtu":1500,"qdisc":"mq","operstate":"UP","group":"default","txqlen":1000,"addr_info":[{"family":"inet","local":"192.168.10.2","prefixlen":24,"broadcast":"192.168.10.255","scope":"global","noprefixroute":true,"label":"enP48p1s0f0","valid_life_time":4294967295,"preferred_life_time":4294967295}]}]
INFO | Command '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -p 22 -q 192.168.10.2 'ip -4 -j address show enP48p1s0f0'' finished with 0 after 0.03861141599918483s
INFO | Running '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -p 22 -O exit 192.168.10.2 '''
DEBUG| [stderr] Exit request sent.
INFO | Command '/bin/ssh -o 'StrictHostKeyChecking=no' -o 'UpdateHostKeys=no' -o 'ControlPath=~/.ssh/avocado-master-%r@%h:%p' -l root -p 22 -O exit 192.168.10.2 ''' finished with 0 after 0.0029009680001763627s
DEBUG| DATA (filename=output.expected) => NOT FOUND (data sources: variant, test, file)
DEBUG| DATA (filename=stdout.expected) => NOT FOUND (data sources: variant, test, file)
DEBUG| DATA (filename=stderr.expected) => NOT FOUND (data sources: variant, test, file)
INFO | Running 'kill -19 802398'
INFO | Command 'kill -19 802398' finished with 0 after 0.0016200770005525555s
INFO | Running 'kill -9 802398'
INFO | Command 'kill -9 802398' finished with 0 after 0.0016642740010865964s
INFO | Running 'kill -18 802398'
INFO | Command 'kill -18 802398' finished with 0 after 0.0013775459992757533s
INFO | Running 'kill -19 802403'
INFO | Command 'kill -19 802403' finished with 0 after 0.0005288539978209883s
INFO | Running 'kill -9 802403'
INFO | Command 'kill -9 802403' finished with 0 after 0.001140234999184031s
INFO | Running 'kill -18 802403'
INFO | Command 'kill -18 802403' finished with 0 after 0.0005043589990236796s
ERROR| FAIL 01-rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-1500-test_opt--F-4fab -> TestFail: Client cmd: ib_atomic_bw -F
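For anyone triaging a similar job.log, the failing entry can be pulled out of a long excerpt like the one above with a few lines of Python. This is only a rough sketch: the helper names (`split_entries`, `find_failures`) are made up for illustration, and it assumes entries start with the `INFO |`, `DEBUG|` or `ERROR|` markers exactly as pasted here. A real job.log may carry timestamps or other level names that this does not handle.

```python
import re

# Entry boundaries: whitespace followed by one of the level markers seen in
# the pasted excerpt. Other levels or timestamp prefixes are NOT handled.
ENTRY_START = re.compile(r"\s+(?=(?:INFO |DEBUG|ERROR)\|)")

# Shape of the failure entry as it appears in the excerpt:
#   ERROR| FAIL <test id> -> <exception name>: <reason>
FAIL_ENTRY = re.compile(
    r"^ERROR\|\s*FAIL\s+(?P<test_id>\S+)\s+->\s+(?P<kind>\w+):\s+(?P<reason>.*)$",
    re.DOTALL,
)


def split_entries(text):
    """Split a flattened or multi-line log excerpt into individual entries."""
    return [e.strip() for e in ENTRY_START.split(text.strip()) if e.strip()]


def find_failures(text):
    """Return (test_id, kind, reason) tuples for every 'ERROR| FAIL' entry."""
    failures = []
    for entry in split_entries(text):
        match = FAIL_ENTRY.match(entry)
        if match:
            # Collapse internal whitespace in case the entry was line-wrapped.
            reason = " ".join(match["reason"].split())
            failures.append((match["test_id"], match["kind"], reason))
    return failures


# Tail of the excerpt from this issue, flattened onto one line.
SAMPLE = (
    "INFO | Running 'kill -18 802403' "
    "INFO | Command 'kill -18 802403' finished with 0 after 0.0005043589990236796s "
    "ERROR| FAIL 01-rdma_tests.py:RDMA.test;run-ib_atomic_bw_basic-mtu-1500-test_opt--F-4fab "
    "-> TestFail: Client cmd: ib_atomic_bw -F"
)

for test_id, kind, reason in find_failures(SAMPLE):
    print(f"{kind}: {reason} ({test_id})")
```

The test id also records the variant that was running (ib_atomic_bw_basic, mtu 1500, test option -F), which helps when several variants appear in one job.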

《Test Step》

  1. Install one MLNX network card in each of the 2 SUTs, and check that the two cards can ping each other.

  2. Find the interface names of the card's ports (e.g. enP48p1s0f0 and enP48p1s0f1) using the command ibdev2netdev: ibdev2netdev

  3. Edit the YAML file as shown here: Yaml

  4. Run rdma_tests.py with the command: avocado run rdma_tests.py -m rdma_tests.py.data/ib_atomic_bw_basic_infiniband.yaml
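As a cross-check on step 1, the `ip -4 -j address show` output that appears in job.log can be parsed to confirm that the host and peer interfaces sit on the same IPv4 subnet. The sketch below is illustrative only: the function names are hypothetical, the two JSON strings are trimmed from the log in this issue, and a shared subnet does not by itself prove that the InfiniBand link is usable.

```python
import ipaddress
import json


def ipv4_interfaces(ip_json):
    """Parse `ip -4 -j address show <dev>` output into IPv4Interface objects."""
    interfaces = []
    for link in json.loads(ip_json):
        for addr in link.get("addr_info", []):
            if addr.get("family") == "inet":
                interfaces.append(
                    ipaddress.ip_interface(f"{addr['local']}/{addr['prefixlen']}")
                )
    return interfaces


def same_subnet(host_json, peer_json):
    """True if any host address shares a network with any peer address."""
    host_nets = {iface.network for iface in ipv4_interfaces(host_json)}
    peer_nets = {iface.network for iface in ipv4_interfaces(peer_json)}
    return bool(host_nets & peer_nets)


# Trimmed copies of the [stdout] JSON from job.log (host, then peer).
HOST_JSON = (
    '[{"ifindex":28,"ifname":"enP48p1s0f0","mtu":1500,"operstate":"UP",'
    '"addr_info":[{"family":"inet","local":"192.168.10.1","prefixlen":24}]}]'
)
PEER_JSON = (
    '[{"ifindex":32,"ifname":"enP48p1s0f0","mtu":1500,"operstate":"UP",'
    '"addr_info":[{"family":"inet","local":"192.168.10.2","prefixlen":24,'
    '"broadcast":"192.168.10.255","noprefixroute":true}]}]'
)

print("same subnet:", same_subnet(HOST_JSON, PEER_JSON))
```

In this issue both sides report operstate UP, MTU 1500 and addresses in 192.168.10.0/24, so basic IP configuration does not look like the cause of the ib_atomic_bw failure.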

《Manual-Test-Log》 Manual-Test-Log.log

※For other MLNX Network-Cards: when we use rdma_tests.py to test the MLNX network cards below, all results are PASS:

  1. (01FT740) 2-PORT 100Gb ROCE EN CONNECTX-5 GEN4 PCIe x16 LP CAPABLE ADAPTER
  2. (01FT751) 2-PORT 10Gb NIC&ROCE ConnectX-4Lx SR/Cu PCIe 3.0 LP CAPABLE ADAPTER

Since we were previously able to use rdma_tests.py to test "(00WT175) 2-PORT EDR 100Gb IB CONNECTX-5 GEN4 PCIe x16 CAPI CAPABLE LP ADAPTER" on RHEL 8.2, please check whether this is a limitation of the card on RHEL 8.4.

※Configuration

【SUT4】
[Kernel] 4.18.0-305.el8.ppc64le

[FW Config]
BMC: op940.22.mih-1-0-g41157d8d2e
Pnor: OP9_v2.4.1-4.31-prod

[HW Config]
CPU DD2.3 16 core 2
Micron Technology(36ASF4G72PZ-2G6D1)32GiB x32 Unknown NVMe 1: SAMSUNG MZ1LB960HAJQ-00007
PSU ACBEL 2000w 2
Slot1: Mellanox 2-PORT EDR 100Gb IB CONNECTX-5 GEN4 PCIe x16 CAPI CAPABLE LP ADAPTER
Slot2: Emulex LPE16002B-M6-O 2-port 16Gb Fibre Channel card PCIe3 x8 LP
Slot3: Mellanox 2-PORT 10Gb NIC&ROCE ConnectX-4Lx SR/Cu PCIe 3.0 LP CAPABLE ADAPTER
Slot4: Marvell QUAD E'NET (2X1 + 2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER SFP+ SR COPPER)
Slot5: Broadcom (LSI) MegaRAID 9361-8i SAS3 Controller w/ 8 internal ports
Slot6: Mellanox 2-PORT 100Gb ROCE EN CONNECTX-5 GEN4 PCIe x16 LP CAPABLE ADAPTER
Slot7: Marvell 2-PORT E'NET (2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER 10GBase-T)
Slot8: Mellanox 2-PORT 25/10Gb NIC&ROCE SR/Cu PCIe 3.0 (25/10Gb EVERGLADES EN)
Slot9: Broadcom 5719 QP 1G (1G/100M/10M) Network Interface Card PCIe x4 LP
Slot10: Broadcom 9305-16i SAS/SATA HBA PCIe Gen3 x8 LP
00VN497 - Seagate Skybolt - HDD 2400GB SAS 8
00VN628 - Micron 5100 PRO - SSD 1920GB SATA 8
00VN629 - Micron 5100 PRO - SSD 3840GB SATA *8

【SUT3】
[Kernel] 4.18.0-305.el8.ppc64le

[FW Config]
BMC: op940.22.mih-1-0-g41157d8d2e
Pnor: OP9_v2.4.1-4.31-prod

[HW Config]
CPU DD2.3 20 core 2
SK Hynix(HMA82GR7CJR4N-VK)16GiB x16 Unknown NVMe 1: XP400HE30002
PSU ACBEL 2000w 2
Slot1: Mellanox 2-PORT EDR 100Gb IB CONNECTX-5 GEN4 PCIe x16 CAPI CAPABLE LP ADAPTER
Slot2: Mellanox 2-PORT 10Gb NIC&ROCE ConnectX-4Lx SR/Cu PCIe 3.0 LP CAPABLE ADAPTER
Slot3: Mellanox 2-PORT 10Gb NIC&ROCE ConnectX-4Lx SR/Cu PCIe 3.0 LP CAPABLE ADAPTER
Slot4: Marvell QUAD E'NET (2X1 + 2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER SFP+ SR COPPER)
Slot5: Broadcom 9305-16i SAS/SATA HBA PCIe Gen3 x8 LP
Slot6: Mellanox 2-PORT 100Gb ROCE EN CONNECTX-5 GEN4 PCIe x16 LP CAPABLE ADAPTER
Slot7: Marvell 2-PORT E'NET (2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER 10GBase-T)
Slot8: Mellanox 2-PORT 25/10Gb NIC&ROCE SR/Cu PCIe 3.0 (25/10Gb EVERGLADES EN)
Slot9: Broadcom 5719 QP 1G (1G/100M/10M) Network Interface Card PCIe x4 LP
Slot10: Broadcom (LSI) MegaRAID 9361-8i SAS3 Controller w/ 8 internal ports
00VN500 - Seagate Skybolt - HDD 600GB SAS *1

Naresh-ibm commented 1 year ago

Closing, as it is working fine now.