Closed: tbugfinder closed this issue 4 years ago
If you use the Azure Marketplace image CentOS-HPC 7.6 or 7.7 (preferred), the Mellanox OFED drivers are already installed, along with several MPI libraries (Open MPI, HPC-X, Intel MPI, and MVAPICH2). This works well with HPC VMs such as HB, HC, and HBv2, all of which support InfiniBand via SR-IOV. You can access the different MPI libraries with module commands (use "module av" to see which MPI libraries are available).
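As a rough sketch of the "module av" step, the helper below filters a module listing down to entries that look like MPI libraries. The function name `list_mpi_modules` and the name patterns are assumptions for illustration; the actual module names on a given image may differ.

```shell
#!/bin/sh
# Hypothetical helper: read a module listing on stdin (one name per line)
# and keep only entries that look like MPI libraries.
# The patterns are guesses based on the libraries named above.
list_mpi_modules() {
    grep -i -E 'mpi|hpcx|mvapich' || true
}

# Usage on a live node (assumes the `module` command is available; terse
# listings are often written to stderr, hence the 2>&1):
#   module -t avail 2>&1 | list_mpi_modules
```

Once you know the name, `module load <name>` puts that MPI library on your path.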
I'm using CentOS-HPC 7.7, but I don't see an ib0 interface. What am I missing here?
It would be interesting to see whether the device shows up. Can you get output from running 'lspci | grep -i Mell'? I assume 'ifconfig' does not show ib0. This may help determine whether it's a software issue.
Which VM size are you running on?
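The device check suggested above could be wrapped in a small function, sketched here. `check_mellanox` is a made-up name; it reads `lspci` output on stdin and only reports whether a Mellanox device is listed, nothing more.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci` output on stdin and report whether any
# Mellanox device is listed. It does not check drivers or interfaces.
check_mellanox() {
    if grep -qi 'mellanox'; then
        echo "present"
    else
        echo "absent"
    fi
}

# Usage on a live node:
#   lspci | check_mellanox
```

If this prints "absent", the VM size probably does not expose an InfiniBand device at all; if it prints "present", the problem is more likely on the software side.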
This is HB60rs:
# lspci
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
0750:00:02.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
So the InfiniBand card is present. Does 'lsmod | grep ib' show that the InfiniBand modules are loaded?
This is on Standard_NC24rs_v3:
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
0001:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0002:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0003:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0004:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
7cb1:00:02.0 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
lsmod:
# lsmod|grep -i ib|sort
devlink 60067 2 mlx4_ib,mlx4_core
ib_core 255469 2 mlx4_ib,ib_uverbs
ib_uverbs 102208 1 mlx4_ib
libata 243133 3 pata_acpi,ata_generic,ata_piix
libcrc32c 12644 2 xfs,nf_conntrack
mlx4_core 315896 1 mlx4_ib
mlx4_ib 179001 0
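One thing worth checking in a listing like the one above: ib0 is an IP-over-InfiniBand interface, which is normally provided by the ib_ipoib kernel module, and ib_ipoib does not appear here. The sketch below lists expected modules that are absent from `lsmod` output. The function name `missing_ib_modules` and the default module list are assumptions for illustration, not a definitive list of what an Azure VM needs.

```shell
#!/bin/sh
# Hypothetical helper: read `lsmod` output on stdin and print each expected
# module that is not loaded. With no arguments it checks an assumed default
# set (ib_core, ib_uverbs, ib_ipoib).
missing_ib_modules() {
    [ "$#" -gt 0 ] || set -- ib_core ib_uverbs ib_ipoib
    # The first column of lsmod output is the module name.
    loaded=$(awk '{ print $1 }')
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}

# Usage on a live node:
#   lsmod | missing_ib_modules
```

A missing ib_ipoib would be consistent with having no ib0 interface, though whether loading it is enough depends on the VM size and driver stack.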
I switched to the HC44rs SKU, and ib0 now has an IP address.
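To confirm that ib0 has an address without reading through `ifconfig` output, something like the following could work. `iface_ipv4` is a hypothetical helper; it assumes the one-line-per-address format produced by `ip -o -4 addr show`.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# the IPv4 address (without the prefix length) of the interface named in $1.
# Prints nothing if the interface has no IPv4 address.
iface_ipv4() {
    awk -v ifc="$1" '$2 == ifc && $3 == "inet" { sub(/\/.*/, "", $4); print $4 }'
}

# Usage on a live node:
#   ip -o -4 addr show | iface_ipv4 ib0
```

Empty output means ib0 either does not exist or has no IPv4 address assigned.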
Closing this issue now, but please reopen it if you still have any problems.
Hi, I'm wondering if there's a common module for setting up the ib0 interface, especially for SR-IOV VM types.
Thanks.