Mellanox / docker-sriov-plugin

Docker networking plugin for SRIOV and passthrough interfaces
Apache License 2.0

Unable to create bridge on same subnet of VF #10

Closed: psaini79 closed this issue 5 years ago

psaini79 commented 5 years ago

I tried to create a bridge using the sriov plugin, but it kept failing on a RoCE ConnectX-5 (CX5) card. I tried to use the same subnet that is available on the host devices, i.e. 192.168.10.0/24, but I get the following error:

```
docker network create -d sriov --subnet=192.168.10.0/24 -o netdevice=re6 -o mode=passthrough mynet1
Error response from daemon: Pool overlaps with other one on this address space
```

I am able to create the bridge if I use a different subnet, but for my use case the bridge must be on the same subnet so that I can reach other nodes running on that subnet. Also, can the RDMA stack work inside the container? Will rds-ping work inside the container?

paravmellanox commented 5 years ago

@psaini79

  1. mode should be sriov (mode=sriov).
  2. rds-ping won't work, as it is currently unsupported. Other applications such as rping should work, depending on which kernel version you are running; kernel 4.19/4.20 is required for RoCE.
  3. This plugin doesn't create any bridge.
  4. You can have your PF netdevice in the same subnet as that of the container VFs.

Here is an example:

```
ifconfig ens1f0
ens1f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 194.168.1.36  netmask 255.255.255.0  broadcast 194.168.1.255
        inet6 fe80::268a:7ff:fe55:4660  prefixlen 64  scopeid 0x20
        ether 24:8a:07:55:46:60  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)  RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 648 (648.0 B)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
```

```
$ docker network create -d sriov --subnet=194.168.1.0/24 -o netdevice=ens1f0 mynet
```

This will allow you to reach other nodes in the same subnet on other systems, and it allows VF-to-PF communication as well.
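For illustration, attaching a container to that network could then look like the following sketch (the image name and container IP are placeholders, not taken from this thread):

```
# attach a container to the sriov network created above;
# the IP address and image name are illustrative placeholders
docker run --net=mynet --ip=194.168.1.50 -it centos:7 bash
```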

psaini2018 commented 5 years ago

Thanks for the quick reply. I tried to create the network using SR-IOV on the same subnet, but it didn't work.

```
ifconfig re1
re1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2300
        inet 192.168.10.10  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 50:6b:4b:df:17:1f  txqueuelen 1000  (Ethernet)
        RX packets 24737  bytes 1706634 (1.6 MiB)  RX errors 1692  dropped 0  overruns 0  frame 1692
        TX packets 6911  bytes 465960 (455.0 KiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

docker network create -d sriov --subnet=192.168.10.0/24 -o netdevice=re1 mynet
Error response from daemon: Pool overlaps with other one on this address space
```

Kernel version on host:

```
uname -a
Linux scaqaj01adm02.us.oracle.com 4.14.35-1902.0.12.el7uek.x86_64 #2 SMP Sat Mar 23 10:27:18 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux
```

Kernel version in the container:

```
uname -a
Linux racnode6 4.14.35-1902.0.12.el7uek.x86_64 #2 SMP Sat Mar 23 10:27:18 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux
```

So the RDMA stack is supported inside the container from kernel version 4.19? I am asking because when I create a macvlan bridge on the RoCE interface and assign it to a container, I am able to ping all the other IPs, but /proc/sys/net/rds does not appear inside the container, so rds-ping fails. I understand rds-ping is not supported, but can rping work, and is RDMA for RoCE supported from 4.19?
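For reference, the macvlan setup mentioned above is typically created along these lines (a rough sketch; the parent device, subnet, and image name are assumptions based on the host configuration shown below):

```
# macvlan network on the RoCE interface, for comparison with the sriov driver
# (parent device, subnet, and image name are illustrative assumptions)
docker network create -d macvlan --subnet=192.168.10.0/24 -o parent=re1 macnet
docker run --net=macnet --ip=192.168.10.60 -it centos:7 bash
```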

lspci output

```
lspci | grep Mel
54:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
54:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
54:03.2 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:03.3 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:03.4 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:03.5 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:03.6 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:03.7 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.2 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.3 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.4 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.5 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.6 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:04.7 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.2 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.3 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.4 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.5 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.6 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:05.7 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:06.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
54:06.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
74:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
74:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
94:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
94:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
b4:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
b4:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
```

ifconfig output

```
ifconfig
bondeth0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 172.16.1.15  netmask 255.255.255.0  broadcast 10.31.213.255
        ether b0:26:28:2f:42:00  txqueuelen 1000  (Ethernet)
        RX packets 1941744  bytes 2171090611 (2.0 GiB)  RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 497208  bytes 59359447 (56.6 MiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:7f:c0:04:da  txqueuelen 0  (Ethernet)
        RX packets 129235  bytes 6897362 (6.5 MiB)  RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 302056  bytes 505027213 (481.6 MiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 297267  bytes 30985391 (29.5 MiB)  RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 297267  bytes 30985391 (29.5 MiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo:1: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.2  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)

lo:2: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.3  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)

lo:3: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.4  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)

re0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2300
        inet 192.168.10.9  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 50:6b:4b:df:17:1e  txqueuelen 1000  (Ethernet)
        RX packets 30956  bytes 2173450 (2.0 MiB)  RX errors 514  dropped 0  overruns 0  frame 514
        TX packets 12153  bytes 874026 (853.5 KiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

re1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2300
        inet 192.168.10.10  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 50:6b:4b:df:17:1f  txqueuelen 1000  (Ethernet)
        RX packets 24762  bytes 1708514 (1.6 MiB)  RX errors 1692  dropped 0  overruns 0  frame 1692
        TX packets 6936  bytes 467840 (456.8 KiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

re2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2300
        inet 192.168.10.11  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 50:6b:4b:df:17:26  txqueuelen 1000  (Ethernet)
        RX packets 28286  bytes 1919878 (1.8 MiB)  RX errors 514  dropped 0  overruns 0  frame 514
        TX packets 9786  bytes 637700 (622.7 KiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

re3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2300
        inet 192.168.10.12  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 50:6b:4b:df:17:27  txqueuelen 1000  (Ethernet)
        RX packets 23159  bytes 1589838 (1.5 MiB)  RX errors 1690  dropped 0  overruns 0  frame 1690
        TX packets 5559  bytes 362762 (354.2 KiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

re4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2300
        inet 192.168.10.13  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 50:6b:4b:df:17:2e  txqueuelen 1000  (Ethernet)
        RX packets 39767  bytes 2586470 (2.4 MiB)  RX errors 514  dropped 0  overruns 0  frame 514
        TX packets 61958  bytes 5282890 (5.0 MiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

re5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2300
        inet 192.168.10.14  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 50:6b:4b:df:17:2f  txqueuelen 1000  (Ethernet)
        RX packets 30595  bytes 2169872 (2.0 MiB)  RX errors 1690  dropped 0  overruns 0  frame 1690
        TX packets 11031  bytes 738886 (721.5 KiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

re6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2300
        inet 192.168.10.15  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 50:6b:4b:df:17:16  txqueuelen 1000  (Ethernet)
        RX packets 26861  bytes 1834644 (1.7 MiB)  RX errors 514  dropped 0  overruns 0  frame 514
        TX packets 9953  bytes 652546 (637.2 KiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

re7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2300
        inet 192.168.10.16  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 50:6b:4b:df:17:17  txqueuelen 1000  (Ethernet)
        RX packets 29980  bytes 2132858 (2.0 MiB)  RX errors 1690  dropped 0  overruns 0  frame 1690
        TX packets 11153  bytes 749550 (731.9 KiB)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:e6:d3:af  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)  RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
```

paravmellanox commented 5 years ago

@psaini79 @psaini2018 I just checked the patches; kernel 4.20 or higher is required. Yes, rping and rdma should work. You should follow this post, which is not too different from what you are doing, just for reference: https://community.mellanox.com/s/article/docker-rdma-sriov-networking-with-connectx4-connectx5
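For reference, the usual VF-enable step in that kind of setup looks roughly like the following sketch; the PF device name and VF count here are assumptions, not values taken from this thread:

```
# enable 4 virtual functions on the PF via the standard sysfs interface
# (ens1f0 and the VF count are illustrative assumptions)
echo 4 > /sys/class/net/ens1f0/device/sriov_numvfs

# verify the VFs appeared
lspci | grep -i "Virtual Function"
ip link show ens1f0
```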

For the duplicate subnet IP, please share the Docker version and the sriov plugin logs. I suspect you are hitting this issue because you have multiple netdevices in the same subnet. This is not an issue with the sriov plugin; it appears to be a failure from Docker.
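One way to see which existing network already claims the overlapping pool is to list the subnets of all Docker networks (a minimal sketch; network names and IDs will differ on your system):

```
# print each network's name and its IPAM subnets;
# any entry already covering 192.168.10.0/24 explains the overlap error
docker network ls
docker network inspect -f '{{.Name}}: {{range .IPAM.Config}}{{.Subnet}} {{end}}' $(docker network ls -q)
```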

psaini79 commented 5 years ago

Thanks, you are right. I am able to create the network on the same subnet and can ping the target from the container.

```
[root@b1936cddf1d3 mofed_installer]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.10.50  netmask 255.255.255.0  broadcast 192.168.10.255
        ether 8e:d4:4b:dc:dd:dd  txqueuelen 1000  (Ethernet)
        RX packets 5  bytes 376 (376.0 B)  RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5  bytes 376 (376.0 B)  TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
```

The ping command to the target works:

```
[root@b1936cddf1d3 mofed_installer]# ping 192.168.10.17
PING 192.168.10.17 (192.168.10.17) 56(84) bytes of data.
64 bytes from 192.168.10.17: icmp_seq=1 ttl=64 time=0.120 ms
64 bytes from 192.168.10.17: icmp_seq=2 ttl=64 time=0.078 ms
```

However, the rping command is failing. I executed the following command on the server, i.e. on 192.168.10.17:

```
rping -s -C -a 192.168.10.17 -v
```

and the following command inside the container:

```
[root@b1936cddf1d3 mofed_installer]# rping -c -a 192.168.10.17 -v
cma event RDMA_CM_EVENT_ADDR_ERROR, error -19
waiting for addr/route resolution state 1
```

I created the container using the following command:

```
docker_rdma_sriov run --net=mynet --ip=192.168.10.50 -it mellanox/mlnx_ofed_linux-4.4-1.0.0.0-centos7.4 bash
```
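A quick sanity check before running rping is to confirm that the container actually sees an RDMA device (a hedged sketch; ibv_devices and ibv_devinfo come from libibverbs-utils and may not be present in every image):

```
# inside the container: is an RDMA device exposed at all?
ls /sys/class/infiniband

# list verbs devices and their details (requires libibverbs-utils)
ibv_devices
ibv_devinfo -v
```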

paravmellanox commented 5 years ago

@psaini79 Can you please share the output when running `rping -d ...`?

We also likely need to see kernel ftraces if it doesn't work.

We haven't tried the MOFED user space with an upstream kernel. Usually, with an upstream kernel, the upstream rdma-core (any version) should be used.

With the MOFED kernel, the MOFED user space should be used.

So after this, you might want to create an rdma-core based container image.

Please also share the docker run command that you ran. Did you follow the post I shared previously, listed below? https://community.mellanox.com/s/article/docker-rdma-sriov-networking-with-connectx4-connectx5
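A minimal rdma-core based image could be built roughly like this (a sketch only; the base image and package names are assumptions for a CentOS 7 base and are not taken from the Mellanox post):

```
# build a small test image with the upstream rdma-core user space
# (base image and package names are illustrative assumptions)
cat > Dockerfile <<'EOF'
FROM centos:7
RUN yum install -y rdma-core libibverbs-utils librdmacm-utils iproute && \
    yum clean all
EOF
docker build -t rdma-core-test .
```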

psaini2018 commented 5 years ago

Yes, I followed the steps given in https://community.mellanox.com/s/article/docker-rdma-sriov-networking-with-connectx4-connectx5; however, I only executed the steps from step 5 onward. Also, the following command exits the container without any error:

```
docker run --net=host -v /usr/bin:/tmp rdma/container_tools_installer
```

Please find the output of rping -d below:

```
[root@b1936cddf1d3 mofed_installer]# rping -d -c -a 192.168.10.17 -v
client
verbose
created cm_id 0x97f1c0
cma_event type RDMA_CM_EVENT_ADDR_ERROR cma_id 0x97f1c0 (parent)
cma event RDMA_CM_EVENT_ADDR_ERROR, error -19
waiting for addr/route resolution state 1
destroy cm_id 0x97f1c0
```

I have one more question: is there any ETA for rds-ping to work inside the container?

paravmellanox commented 5 years ago

@psaini2018 please share the output of:

  1. `ibdev2netdev` in the container
  2. `show_gids` in the container
  3. `uname -a` on the host

What is the command you ran to run this container?

Please talk to Mellanox support for rds-ping.

psaini2018 commented 5 years ago

The first two commands failed inside the container:

```
[root@9098b076cc9a mofed_installer]# ibdev2netdev
bash: ibdev2netdev: command not found
[root@9098b076cc9a mofed_installer]# show_gids
bash: show_gids: command not found
```

```
uname -a
Linux rdma-setup 4.14.35-1902.0.12.el7uek.x86_64 #2 SMP Sat Mar 23 10:27:18 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux
```
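If the MOFED helper scripts are not installed in the image, roughly equivalent information can be read from sysfs or the verbs utilities (a hedged sketch; these paths are only populated when an RDMA device is actually exposed to the container):

```
# device-to-netdev mapping without ibdev2netdev: walk sysfs
ls /sys/class/infiniband
ls /sys/class/infiniband/*/device/net 2>/dev/null

# GIDs without show_gids (per device/port/index)
cat /sys/class/infiniband/*/ports/*/gids/0 2>/dev/null

# or, with libibverbs-utils installed
ibv_devinfo -v
```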

paravmellanox commented 5 years ago

@psaini2018 as we discussed yesterday in this thread, you need kernel 4.20. Please upgrade to it.

psaini2018 commented 5 years ago

Do I need to upgrade the host kernel to 4.20? I just want to make sure, to avoid any rework. Also, for rds-ping, does an SR need to be opened through the Mellanox support login, or is there a GitHub repo where I can open the issue?

paravmellanox commented 5 years ago

@psaini2018, yes 4.20 or higher. 5.1 is even better. :-)

rds-ping is owned by Oracle; you should first resolve the question of supporting rds-ping with Oracle before opening a Mellanox support case.

psaini79 commented 5 years ago

Ok and thanks a lot for your quick reply.

As per the following link, the Ethernet card inside the container is made available using IPoIB: https://community.mellanox.com/s/article/docker-rdma-sriov-networking-with-connectx4-connectx5

I have a question: what is the difference between an IPoIB device and a VM IPoIB device? Are they technically the same?

paravmellanox commented 5 years ago

@psaini2018 yes. Can you please close this issue? If you like the plugin you can also star it. :-)