charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

support multiple InfiniBand cards per node #1040

Open jcphill opened 8 years ago

jcphill commented 8 years ago

Original issue: https://charm.cs.illinois.edu/redmine/issues/1040


The Crest Power8/GPU cluster at OLCF has two InfiniBand cards per node: https://www.olcf.ornl.gov/kb_articles/crest-user-information/

Charm++ currently attaches to the first interface of the first card it finds. For better performance, Charm++ processes (nodes) should distribute themselves across the active interfaces. Ideally, each process would pick the card closest to the cores to which its PEs are bound, but in the meantime the user could specify which network interface to use with a runtime option like "+netmap 0,1". (Please don't use +devices, as NAMD uses this for GPUs and Xeon Phi coprocessors.)
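
A minimal sketch of the proposed "+netmap" behavior, assuming each process knows its rank among the processes on the same host (a hypothetical `rank_on_host`). "+netmap" is only the option name proposed in this issue, not an existing Charm++ flag, and `parse_netmap` and `interface_for_process` are illustrative helpers:

```python
from typing import List


def parse_netmap(arg: str) -> List[int]:
    """Parse a proposed "+netmap" value such as "0,1" into interface indices."""
    indices = [int(tok) for tok in arg.split(",") if tok.strip()]
    if not indices:
        raise ValueError("+netmap needs at least one interface index")
    if any(i < 0 for i in indices):
        raise ValueError("interface indices must be non-negative")
    return indices


def interface_for_process(rank_on_host: int, netmap: List[int]) -> int:
    """Round-robin: the k-th process on a host uses netmap[k % len(netmap)]."""
    return netmap[rank_on_host % len(netmap)]


if __name__ == "__main__":
    netmap = parse_netmap("0,1")
    # Four processes on one host alternate between interfaces 0 and 1.
    print([interface_for_process(k, netmap) for k in range(4)])
```

Round-robin keeps the mapping predictable without any topology information, but it does not guarantee that a process lands on the card nearest to its PEs.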

jcphill commented 5 years ago

Original date: 2017-02-18 18:47:28


Target platform is now Summitdev: https://www.olcf.ornl.gov/kb_articles/summitdev-quickstart/#Hardware

If we end up using PAMI on Summit this will not be necessary.

jcphill commented 5 years ago

Original date: 2017-02-18 19:12:38


Confirming that both HCAs connect to a single network (rather than two parallel networks):

[jimp`summitdev-r0c0n13 ~/summitdev]$ ibtracert 40 122
From ca {0x248a07030047a6aa} portnum 1 lid 40-40 "summitdev-r0c0n13 HCA-2"
[1] -> switch port {0x7cfe9003009600f0}[13] lid 3-3 "MF0;summitdev-ibleaf-r0c0b:MSB7700/U1"
[28] -> switch port {0xe41d2d030051e9a0}[6] lid 19-19 "MF0;summitdev-ibcore3:MSB7700/U1"
[19] -> switch port {0x7cfe9003009600b0}[31] lid 56-56 "MF0;summitdev-ibleaf-r0c1b:MSB7700/U1"
[1] -> ca port {0x248a07030047a586}[1] lid 122-122 "summitdev-r0c1n01 HCA-2"
To ca {0x248a07030047a586} portnum 1 lid 122-122 "summitdev-r0c1n01 HCA-2"
[jimp`summitdev-r0c0n13 ~/summitdev]$ ibtracert 40 124
From ca {0x248a07030047a6aa} portnum 1 lid 40-40 "summitdev-r0c0n13 HCA-2"
[1] -> switch port {0x7cfe9003009600f0}[13] lid 3-3 "MF0;summitdev-ibleaf-r0c0b:MSB7700/U1"
[22] -> switch port {0x7cfe900300a44b50}[9] lid 17-17 "MF0;summitdev-ibcore1:MSB7700/U1"
[12] -> switch port {0x7cfe9003009774e0}[20] lid 38-38 "MF0;summitdev-ibleaf-r0c1a:MSB7700/U1"
[1] -> ca port {0x248a07030047a5b2}[1] lid 124-124 "summitdev-r0c1n01 HCA-1"
To ca {0x248a07030047a5b2} portnum 1 lid 124-124 "summitdev-r0c1n01 HCA-1"
[jimp`summitdev-r0c0n13 ~/summitdev]$ ibtracert 42 122
From ca {0x248a07030047a9ca} portnum 1 lid 42-42 "summitdev-r0c0n13 HCA-1"
[1] -> switch port {0x7cfe900300a44b70}[13] lid 117-117 "MF0;summitdev-ibleaf-r0c0a:MSB7700/U1"
[28] -> switch port {0xe41d2d030051e9a0}[1] lid 19-19 "MF0;summitdev-ibcore3:MSB7700/U1"
[19] -> switch port {0x7cfe9003009600b0}[31] lid 56-56 "MF0;summitdev-ibleaf-r0c1b:MSB7700/U1"
[1] -> ca port {0x248a07030047a586}[1] lid 122-122 "summitdev-r0c1n01 HCA-2"
To ca {0x248a07030047a586} portnum 1 lid 122-122 "summitdev-r0c1n01 HCA-2"
[jimp`summitdev-r0c0n13 ~/summitdev]$ ibtracert 42 120
From ca {0x248a07030047a9ca} portnum 1 lid 42-42 "summitdev-r0c0n13 HCA-1"
[1] -> switch port {0x7cfe900300a44b70}[13] lid 117-117 "MF0;summitdev-ibleaf-r0c0a:MSB7700/U1"
[29] -> switch port {0xe41d2d030051e9a0}[2] lid 19-19 "MF0;summitdev-ibcore3:MSB7700/U1"
[6] -> switch port {0x7cfe9003009600f0}[28] lid 3-3 "MF0;summitdev-ibleaf-r0c0b:MSB7700/U1"
[15] -> ca port {0x248a07030047a57e}[1] lid 120-120 "summitdev-r0c0n15 HCA-2"
To ca {0x248a07030047a57e} portnum 1 lid 120-120 "summitdev-r0c0n15 HCA-2"
[jimp`summitdev-r0c0n13 ~/summitdev]$ ibtracert 40 42
From ca {0x248a07030047a6aa} portnum 1 lid 40-40 "summitdev-r0c0n13 HCA-2"
[1] -> switch port {0x7cfe9003009600f0}[13] lid 3-3 "MF0;summitdev-ibleaf-r0c0b:MSB7700/U1"
[33] -> switch port {0x7cfe900300bcee50}[6] lid 7-7 "MF0;summitdev-ibcore4:MSB7700/U1"
[2] -> switch port {0x7cfe900300a44b70}[34] lid 117-117 "MF0;summitdev-ibleaf-r0c0a:MSB7700/U1"
[13] -> ca port {0x248a07030047a9ca}[1] lid 42-42 "summitdev-r0c0n13 HCA-1"
To ca {0x248a07030047a9ca} portnum 1 lid 42-42 "summitdev-r0c0n13 HCA-1"
[jimp`summitdev-r0c0n13 ~/summitdev]$ ibtracert 122 124
From ca {0x248a07030047a586} portnum 1 lid 122-122 "summitdev-r0c1n01 HCA-2"
[1] -> switch port {0x7cfe9003009600b0}[1] lid 56-56 "MF0;summitdev-ibleaf-r0c1b:MSB7700/U1"
[23] -> switch port {0x7cfe9003009601d0}[16] lid 5-5 "MF0;summitdev-ibcore2:MSB7700/U1"
[15] -> switch port {0x7cfe9003009774e0}[27] lid 38-38 "MF0;summitdev-ibleaf-r0c1a:MSB7700/U1"
[1] -> ca port {0x248a07030047a5b2}[1] lid 124-124 "summitdev-r0c1n01 HCA-1"
To ca {0x248a07030047a5b2} portnum 1 lid 124-124 "summitdev-r0c1n01 HCA-1"
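
The traces above can also be checked mechanically. A small sketch, assuming the `ibtracert` text format shown in this comment (each relevant line ends with a quoted node description); `parse_ibtracert` is a hypothetical helper. It extracts the source CA, the switches traversed, and the destination CA. Because `ibtracert` finds a route between the two HCAs of the same node (lid 40 to lid 42), both HCAs sit on one fabric:

```python
import re
from typing import List, Optional, Tuple

# Each relevant ibtracert line ends with a quoted node description.
QUOTED = re.compile(r'"([^"]*)"')


def parse_ibtracert(text: str) -> Tuple[Optional[str], List[str], Optional[str]]:
    """Return (source CA, switches on the route, destination CA)."""
    src: Optional[str] = None
    dst: Optional[str] = None
    switches: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        match = QUOTED.search(line)
        if match is None:
            continue  # lines without a quoted name, e.g. shell prompts
        name = match.group(1)
        if line.startswith("From ca"):
            src = name
        elif line.startswith("To ca"):
            dst = name
        elif "-> switch port" in line:
            switches.append(name)
    return src, switches, dst


# Copied from the "ibtracert 40 42" run above.
TRACE_40_TO_42 = """\
From ca {0x248a07030047a6aa} portnum 1 lid 40-40 "summitdev-r0c0n13 HCA-2"
[1] -> switch port {0x7cfe9003009600f0}[13] lid 3-3 "MF0;summitdev-ibleaf-r0c0b:MSB7700/U1"
[33] -> switch port {0x7cfe900300bcee50}[6] lid 7-7 "MF0;summitdev-ibcore4:MSB7700/U1"
[2] -> switch port {0x7cfe900300a44b70}[34] lid 117-117 "MF0;summitdev-ibleaf-r0c0a:MSB7700/U1"
[13] -> ca port {0x248a07030047a9ca}[1] lid 42-42 "summitdev-r0c0n13 HCA-1"
To ca {0x248a07030047a9ca} portnum 1 lid 42-42 "summitdev-r0c0n13 HCA-1"
"""

if __name__ == "__main__":
    src, switches, dst = parse_ibtracert(TRACE_40_TO_42)
    print(src, "->", dst)
    for sw in switches:
        print("  via", sw)
    # A route exists between the two local HCAs, so they share one fabric.
    print("same fabric:", dst is not None)
```
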
PhilMiller commented 5 years ago

Original date: 2017-10-11 20:16:22


Nitin, please work out how critical and feasible this is, and whether it should be a target to complete for 6.9, with an intended preview release by SC.

jcphill commented 5 years ago

Original date: 2017-10-20 17:54:39


If we're using PAMI, then it's not so critical, unless we want to compare PAMI to Verbs.

ericjbohm commented 5 years ago

Original date: 2017-12-13 21:27:19


We expect to use pamilrts on Summitdev, so this is not urgent.

evan-charmworks commented 5 years ago

Original date: 2019-03-14 19:44:27


Jim Phillips wrote:

Charm++ currently attaches to the first interface of the first card it finds.

After this change was merged, it now queries all devices, eliminates inactive ones, and chooses the fastest active device to use. It still uses only one device. https://charm.cs.illinois.edu/gerrit/c/charm/+/4474 https://github.com/UIUC-PPL/charm/commit/254898aa13cf60bdf2f1709de355f505bcf2ff93
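
A sketch of the selection policy described here (skip inactive ports, keep the fastest), written over already-decoded port data so it runs without InfiniBand hardware. In a real verbs layer the inputs would come from `ibv_get_device_list` and `ibv_query_port` (port state `IBV_PORT_ACTIVE`, link width and speed), but the `PortInfo` type, its fields, and `pick_fastest_active` are hypothetical and are not the code from the linked commit:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PortInfo:
    device: str            # device name, e.g. "mlx5_0" (illustrative)
    port: int              # port number on that device
    active: bool           # True when the port state is ACTIVE
    lanes: int             # decoded link width, e.g. 4 for a 4X link
    gbps_per_lane: float   # decoded per-lane rate in Gb/s

    @property
    def bandwidth_gbps(self) -> float:
        return self.lanes * self.gbps_per_lane


def pick_fastest_active(ports: List[PortInfo]) -> Optional[PortInfo]:
    """Drop inactive ports, then return the one with the highest bandwidth."""
    active = [p for p in ports if p.active]
    if not active:
        return None
    # max() returns the first maximal element, so ties keep enumeration order.
    return max(active, key=lambda p: p.bandwidth_gbps)


if __name__ == "__main__":
    ports = [
        PortInfo("mlx5_0", 1, False, 4, 25.0),
        PortInfo("mlx5_1", 1, True, 4, 25.0),
        PortInfo("mlx4_0", 1, True, 4, 10.0),
    ]
    chosen = pick_fastest_active(ports)
    print(chosen.device if chosen else "no active port")
```

Among equally fast active ports the first one enumerated wins, and, like the merged change, the result is still a single device.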

nitbhat commented 5 years ago

Original date: 2019-04-03 15:46:27


From browsing the code of the PAMI and PAMILRTS machine layers, I couldn't determine whether PAMI uses multiple InfiniBand cards internally. I'm checking with Bilge to see if she knows more about it.

nitbhat commented 5 years ago

Original date: 2019-04-11 19:23:33


I contacted Hui-fang Wen from IBM and heard back that PAMI internally uses multiple InfiniBand cards, i.e., each process uses the card closest to its core. For that reason, I'm retargeting this issue. In the future, we can evaluate whether it is relevant for Verbs/UCX.
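
If this is revisited for Verbs or UCX, one way to approximate the "closest card" policy is sketched below. It assumes the NUMA node of each HCA is known (on Linux this is typically exposed in sysfs, e.g. `/sys/class/infiniband/<device>/device/numa_node`, where -1 means unknown), along with the NUMA nodes of the cores the process's PEs are bound to. `closest_hca` and its arguments are hypothetical, and this is not how PAMI implements it:

```python
from collections import Counter
from typing import Dict, List


def closest_hca(hca_numa: Dict[str, int], pe_numa_nodes: List[int],
                rank_on_host: int) -> str:
    """Pick an HCA on the NUMA node hosting most of this process's PEs.

    Falls back to round-robin over all HCAs when the NUMA location is
    unknown (-1) or no HCA shares a NUMA node with the process.
    """
    if not hca_numa:
        raise ValueError("no HCAs available")
    names = sorted(hca_numa)  # same order in every process on the host
    if pe_numa_nodes:
        # Most common NUMA node among this process's PEs.
        home, _ = Counter(pe_numa_nodes).most_common(1)[0]
        if home >= 0:
            local = [n for n in names if hca_numa[n] == home]
            if local:
                # Several HCAs on one socket: spread processes across them.
                return local[rank_on_host % len(local)]
    return names[rank_on_host % len(names)]


if __name__ == "__main__":
    hcas = {"mlx5_0": 0, "mlx5_1": 1}
    print(closest_hca(hcas, [1, 1, 1, 1], rank_on_host=0))
    print(closest_hca(hcas, [0, 0], rank_on_host=3))
```

The round-robin fallback keeps processes spread over all cards when topology information is missing, matching the interim behavior suggested at the top of this issue.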