Azure / azurehpc

This repository provides easy automation scripts for building an HPC environment in Azure. It also includes examples for building an end-to-end environment and running some of the key HPC benchmarks and applications.
MIT License

ib0 ip address setup #167

Closed tbugfinder closed 4 years ago

tbugfinder commented 4 years ago

Hi, I'm wondering if there's a common module for setting up the ib0 interface, especially for SR-IOV VM types.

Thanks.

garvct commented 4 years ago

If you use the Azure Marketplace image CentOS-HPC 7.6 or 7.7 (preferred), it comes with the Mellanox OFED drivers installed and a number of MPI libraries preinstalled (openmpi, hpcx, intel mpi and mvapich2). This works well with HPC VMs like HB, HC and HBv2 (all support InfiniBand via SR-IOV). You can access the different MPI libraries with module commands (use "module av" to see which MPI libraries are available).

tbugfinder commented 4 years ago

I'm using CentOS-HPC 7.7; however, I don't see an ib0 interface. What am I missing here?

chadnar2 commented 4 years ago

It would be interesting to see if the device shows up. Are you able to get output from running 'lspci | grep -i Mell'? I assume that 'ifconfig' does not show ib0. This may help determine whether it's a software issue.
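
The two checks suggested above can be sketched as a small script. This is only an illustration, not part of the azurehpc repository: the helper names `has_mellanox` and `has_iface` are hypothetical, and the script assumes that `lspci` is installed and that network interfaces are listed under `/sys/class/net`.

```shell
# Hypothetical helper: succeed if lspci-style text on stdin mentions a Mellanox device.
has_mellanox() {
    grep -qi 'mellanox'
}

# Hypothetical helper: succeed if the named network interface exists.
# The optional second argument overrides the sysfs directory (useful for testing).
has_iface() {
    [ -e "${2:-/sys/class/net}/$1" ]
}

# Step 1: is the InfiniBand device visible on the PCI bus?
if command -v lspci >/dev/null 2>&1 && lspci | has_mellanox; then
    echo "Mellanox device: present"
else
    echo "Mellanox device: not found (or lspci unavailable)"
fi

# Step 2: does the ib0 interface exist?
if has_iface ib0; then
    echo "ib0: present"
else
    echo "ib0: missing"
fi
```

If the device is present but ib0 is missing, that points toward the driver or module side rather than the hardware.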

xpillons commented 4 years ago

On which VM size are you running?

(Sent by email on Wednesday, 19 February 2020, 20:32.)

tbugfinder commented 4 years ago

This is HB60rs:

# lspci
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
0750:00:02.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]

chadnar2 commented 4 years ago

So the InfiniBand card is present. Does 'lsmod |grep ib' show the InfiniBand modules are loaded?
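
A bare `grep ib` also matches unrelated module names such as libata, so a stricter check can compare the first column of the `lsmod` output exactly. The sketch below is an illustration only: the helper name `module_loaded` is hypothetical, and the list of modules is an assumption that should be adjusted to the driver stack in use (`ib_ipoib` is the kernel module that provides IP-over-InfiniBand interfaces such as ib0).

```shell
# Hypothetical helper: succeed if the named module appears in the
# first column of lsmod-style text on stdin (exact match, not substring).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Assumed list of modules of interest; adjust for your driver stack.
for mod in ib_core ib_uverbs ib_ipoib; do
    if command -v lsmod >/dev/null 2>&1 && lsmod | module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded (or lsmod unavailable)"
    fi
done
```

The exact match keeps the output focused on the InfiniBand stack and avoids false positives from names that merely contain "ib".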

tbugfinder commented 4 years ago

This is on Standard_NC24rs_v3:

0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
0001:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0002:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0003:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0004:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
7cb1:00:02.0 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

lsmod:

# lsmod|grep -i ib|sort
devlink                60067  2 mlx4_ib,mlx4_core
ib_core               255469  2 mlx4_ib,ib_uverbs
ib_uverbs             102208  1 mlx4_ib
libata                243133  3 pata_acpi,ata_generic,ata_piix
libcrc32c              12644  2 xfs,nf_conntrack
mlx4_core             315896  1 mlx4_ib
mlx4_ib               179001  0

tbugfinder commented 4 years ago

I switched to the HC44rs SKU, and ib0 now has an IP address.

edwardsp commented 4 years ago

Closing this issue now, but please reopen it if you still have any problems.