COSMIC-SETI / FrontPage

Collection point for documents, specifications, notes...

Infrastructure: GPU NIC subnets are exclusionary #5

Open radonnachie opened 2 years ago

radonnachie commented 2 years ago
5: enp97s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether b8:ce:f6:a6:42:a1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.64.100/24 brd 192.168.64.255 scope global enp97s0f1
7: enp225s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether b8:ce:f6:a6:41:89 brd ff:ff:ff:ff:ff:ff
    inet 192.168.65.100/24 brd 192.168.65.255 scope global enp225s0f1

The /24 subnets and the single IP of the F-engine packets mean that one of the NICs has a less direct route. This does not seem to be an issue in anything other than Python scripts, but perhaps a neater architecture should be implemented anyhow.
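To make the point concrete, here is a minimal sketch that checks which NIC's configured subnet contains a given peer address. The interface addresses are copied from the `ip addr` output above; the F-engine address `192.168.64.10` and the helper name `on_link_nics` are assumptions made up for illustration, since the actual F-engine IP is not given in this thread.

```python
import ipaddress

# Interface addresses copied from the `ip addr` output in this issue.
NICS = {
    "enp97s0f1": ipaddress.ip_interface("192.168.64.100/24"),
    "enp225s0f1": ipaddress.ip_interface("192.168.65.100/24"),
}


def on_link_nics(peer_ip):
    """Return the names of NICs whose configured subnet contains peer_ip."""
    peer = ipaddress.ip_address(peer_ip)
    return [name for name, iface in NICS.items() if peer in iface.network]


# Hypothetical F-engine source address (an assumption, not from the issue).
FENGINE_IP = "192.168.64.10"

# Only one NIC sees this peer as directly attached; the other would need
# a route (or a different netmask) to reach it.
print(on_link_nics(FENGINE_IP))
```

With any single F-engine address, at most one of the two /24 interfaces is on-link, which is presumably the "less direct route" described above.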

jack-h commented 2 years ago

This is an annoying thing to make "right". Really, the F-engines and both NICs are on the same network (192.168.64.0/23). However, this has a tendency to upset Linux, because it doesn't know which interface to use to transmit packets on this net. The cheat we use here is to put the two NICs on different /24 networks, and use the fact that IBVerbs will happily receive packets on an interface regardless of whether the source of the packets is on the right network.

One solution involves multicasting everything. Another probably involves putting both NICs on the same /23 network but getting down and dirty with Linux gateway config to make sure that things behave. Another probably involves bonding the interfaces in Linux.

I think the only practical downside with the current configuration is that "normal" socket-based programs won't be able to see packets on the interface which is on a different subnet to the F-engines (this can be worked around by temporarily messing with the F-engine IP, or server IPs).
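As a rough illustration of the addressing described here, the sketch below shows that the 192.168.64.0/23 network contains both NIC addresses, and that splitting it into /24s gives exactly the two subnets the NICs are configured on. The helper names `split_into_24s` and `nics_inside` are invented for this example; only the addresses come from the thread.

```python
import ipaddress

# The single network that the F-engines and both NICs really share.
FLAT_NET = ipaddress.ip_network("192.168.64.0/23")

# NIC addresses from the `ip addr` output earlier in this issue.
NIC_ADDRS = {
    "enp97s0f1": "192.168.64.100",
    "enp225s0f1": "192.168.65.100",
}


def split_into_24s(net):
    """Split a network into its /24 subnets."""
    return list(net.subnets(new_prefix=24))


def nics_inside(net):
    """Return the names of NICs whose address falls inside net."""
    return [name for name, addr in NIC_ADDRS.items()
            if ipaddress.ip_address(addr) in net]


# Both NICs sit inside the /23, so Linux would have two candidate
# interfaces for any destination on that network.
print(FLAT_NET, nics_inside(FLAT_NET))

# Each /24 half contains exactly one NIC, which removes the ambiguity.
for half in split_into_24s(FLAT_NET):
    print(half, nics_inside(half))
```

This only models the address arithmetic. It says nothing about how IBVerbs or the kernel actually handle the packets.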