Mellanox / libvma

Linux user space library for network socket acceleration based on RDMA compatible network adaptors
https://www.mellanox.com/products/software/accelerator-software/vma?mtag=vma
Other

802.3ad bonding mode issue #49

Open galuha opened 8 years ago

galuha commented 8 years ago

Greetings. Environment: CentOS 6.7, ConnectX-3 EN, OFED 3.2-2.0.0.0. I am running the sockperf TCP ping-pong test and trying to use VMA with a VLAN over 802.3ad bonded interfaces. It works when the server and client are connected directly with two cables (both are ConnectX-3 Pro). It fails when the path is server (vlan over bond) <-> router configured for 802.3ad <-> internet <-> client. The same path works on the kernel stack, hence the network configuration is fine.

sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: No messages were received from the server. Is the server down?

I have written a test ping-pong app very similar to sockperf. Its log shows that the TCP connection is established, but no message reaches the server.
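For readers who want a comparable test without sockperf, here is a minimal TCP ping-pong sketch. It is not the reporter's application (which was not posted); the function names, the 64-byte message size, and the 5-second timeout are all assumptions chosen for illustration. The demo runs over loopback only to show that the code works; reproducing the issue needs the server and client halves on separate hosts.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes, or raise ConnectionError if the peer closes."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def serve_once(listener, msg_size):
    """Accept one client and echo fixed-size messages until it disconnects."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        try:
            while True:
                conn.sendall(recv_exact(conn, msg_size))
        except ConnectionError:
            pass  # client finished and closed the socket


def ping_pong(host, port, count=100, msg_size=64, timeout=5.0):
    """Send `count` messages, wait for each echo, and return round-trip times in seconds."""
    payload = b"x" * msg_size
    rtts = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            start = time.perf_counter()
            sock.sendall(payload)
            recv_exact(sock, msg_size)  # raises a timeout error if no echo arrives
            rtts.append(time.perf_counter() - start)
    return rtts


def run_loopback_demo(count=100, msg_size=64):
    """Run server and client in one process over 127.0.0.1 (sanity check only)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener, msg_size), daemon=True)
    server.start()
    rtts = ping_pong("127.0.0.1", port, count, msg_size)
    server.join(timeout=5.0)
    listener.close()
    return rtts


if __name__ == "__main__":
    times = run_loopback_demo()
    print(f"{len(times)} round trips, mean {sum(times) / len(times) * 1e6:.1f} us")
```

VMA is normally enabled by preloading the library, for example LD_PRELOAD=libvma.so with VMA_TRACELEVEL=4 in the environment. Whether sockets opened by the Python interpreter are offloaded in the same way as those of a C program is an assumption that should be checked in the VMA log. With the failure described in this thread, the client would be expected to connect and then hit the timeout on the first echo.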

here are the logs with VMA_TRACELEVEL=4

vmalog_failed_802.3ad_vlan_ping-pong.txt vmalog_failed_802.3ad_vlan_sockperf.txt

Is this a bug or am I doing something wrong?

The network is configured through /etc/sysconfig/network-scripts, not manually.

Turning off 802.3ad on the server (keeping the vlan over eth0) makes it work.

rosenbaumalex commented 8 years ago

We'll have to take a deeper look at this and will update you. Thanks for reporting.

MohammadQurt commented 8 years ago

Hi, regarding the issue you found, we have not yet been able to reproduce it. Based on the description and the logs you attached, we did the following to try to reproduce it (please tell us if we missed anything):

· Server-Client Configurations:

  1. Used OFED 3.2-2.0.0.0 with ConnectX3 on both server and client.
  2. Configured the server with vlan over 802.3ad bonded interface.
  3. Set the server and client on different networks.
  4. Configured a server to behave as a router to route packets between server and client.

· 802.3ad bonding configurations (tried both immediate and permanent configurations with the following parameter combinations):

  1. mode=4 miimon=100
  2. mode=4 miimon=100 fail_over_mac=0
  3. mode=4 miimon=100 fail_over_mac=1
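As a rough illustration of what a permanent configuration of this kind can look like on CentOS 6, the sketch below renders ifcfg-style files for a bond, its slaves, and a VLAN on top of the bond. The interface name eth1, the helper names, and the exact set of keys are assumptions for illustration; the files actually used in this thread are the ones in the attached archives and are not reproduced here.

```python
def render_ifcfg(settings):
    """Render a dict of KEY -> value as ifcfg-style KEY=value lines."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def bond_vlan_files(bond="bond0", slaves=("eth0", "eth1"), vlan_id=380,
                    bonding_opts="mode=4 miimon=100"):
    """Return {file name: contents} for a VLAN over an 802.3ad bond.

    Addressing (IPADDR, NETMASK, GATEWAY) is deliberately left out,
    because it is site specific.
    """
    files = {}

    # The bond master carries the bonding driver options (mode=4 is 802.3ad).
    files[f"ifcfg-{bond}"] = render_ifcfg({
        "DEVICE": bond,
        "ONBOOT": "yes",
        "BOOTPROTO": "none",
        "BONDING_OPTS": f'"{bonding_opts}"',
    })

    # Each physical port is enslaved to the bond.
    for slave in slaves:
        files[f"ifcfg-{slave}"] = render_ifcfg({
            "DEVICE": slave,
            "ONBOOT": "yes",
            "BOOTPROTO": "none",
            "MASTER": bond,
            "SLAVE": "yes",
        })

    # The VLAN interface sits on top of the bond, e.g. bond0.380.
    vlan_dev = f"{bond}.{vlan_id}"
    files[f"ifcfg-{vlan_dev}"] = render_ifcfg({
        "DEVICE": vlan_dev,
        "ONBOOT": "yes",
        "BOOTPROTO": "none",
        "VLAN": "yes",
    })
    return files


if __name__ == "__main__":
    for name, body in bond_vlan_files().items():
        print(f"# {name}\n{body}")
```

The other two combinations in the list can be rendered by passing, for example, bonding_opts="mode=4 miimon=100 fail_over_mac=1".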

· Server-Client Commands:

  1. Server: Sockperf TCP, 1 socket, non-blocking.
  2. Client: Sockperf TCP, Ping-Pong, 1 socket (non-blocking or blocking).

Also, could you please provide the following:

  1. Both logs seem to come from the server side, so can you attach VMA_TRACELEVEL=4 logs for the client side?
  2. Can you please run sysinfo (https://mellanox.my.salesforce.com/sfc/p/#500000007heg/a/50000000Xab4/VneS.zpLith9.GWZ.XthrStGzhuRBH9SZS_DmBENbfI) on every VMA machine and send us the output?
  3. You said: "It fails when server (vlan over bond)<-> router 802.3ad configured<-> internet <-> client. This works on kernel stack, hence net config is fine." What commands did you use for the router 802.3ad configuration?
  4. Is the issue reproducible every time?

Thanks in advance.

galuha commented 8 years ago

Hello, here is the sysinfo file (remove the trailing .zip from the file name): sysinfo-snapshot-v3.1.7-ofed-20160504-1945.tgz.zip

I will be able to get the client-side log later.

"· Server-Client Configurations:

  1. Used OFED 3.2-2.0.0.0 with ConnectX3 on both server and client.
  2. Configured the server with vlan over 802.3ad bonded interface.
  3. Set the server and client on a different networks.
  4. Configured a server to behave as a router to route packets between server and client."

This is not correct. I have an actual router between the server and the client, configured for LACP (802.3ad).

"What are the commands you used for router 802.3 configurations": network-scripts.zip

route add default gw 172.25.24.1 device bond0.380

MohammadQurt commented 8 years ago

Hi galuha,

Thanks for the update.

Can you please retry with the latest VMA version here on GitHub: https://github.com/Mellanox/libvma

Thanks

mellanoxer commented 8 years ago

I have the same issue. VMA 7.0.14 does not work correctly for TCP connections when a VLAN over bonding is used. Without libvma the TCP connection is successful.

mellanoxer commented 8 years ago

I've checked VMA 8.0.3. TCP connect() does not work when using a VLAN over a bond. The created socket is marked as offloaded, but the program (for instance telnet or my own program) never returns from connect() and freezes. All routes are correct. Without VMA everything is fine: the connection is established, etc.
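To tell a hanging connect() apart from a refused or failed one without freezing the test program, a small probe with a timeout can help. This is only a sketch: the function name, the 3-second default, and the result strings are made up for illustration, and whether a timed connect behaves the same way when the socket is offloaded by VMA is an assumption that has not been verified here.

```python
import errno
import socket


def probe_connect(host, port, timeout=3.0):
    """Try a TCP connect and classify the outcome.

    Returns "connected", "refused", "timeout", or "error: <errno name>".
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # bound the wait instead of blocking forever
    try:
        sock.connect((host, port))
        return "connected"
    except socket.timeout:
        # No SYN-ACK was processed within the timeout window.
        return "timeout"
    except ConnectionRefusedError:
        # The peer answered with a reset: the path works, but nothing listens.
        return "refused"
    except OSError as exc:
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Loopback self-check; replace host and port with the real server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    print(probe_connect("127.0.0.1", listener.getsockname()[1]))
    listener.close()
```

Running the same probe with and without the library preloaded, against the same server, gives a quick side-by-side of the two stacks. A "timeout" result only with VMA would match the freeze described above.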

mellanoxer commented 8 years ago

Is there any hope of solving this issue? What additional tests should I perform to help the libvma developers?

galuha commented 8 years ago

I guess there is no hope unless the devs can reproduce the problem. MohammadQurt, I am offering a TeamViewer session to the server with the issue. Please let me know if you are willing to look at it.

mellanoxer commented 8 years ago

@rosenbaumalex, to get more details, maybe you could add some additional log lines to the sources, based on the logs I previously posted. I will run again with your patch and send you more detailed logs.

According to my logs, the interfaces are offloadable, the socket is offloadable, and the SYN was sent. As I said previously, there was a SYN-ACK in reply, but it was visible only with tcpdump and not inside libvma. So we need more detailed logging at the point where the queues are drained and packets are received.

rosenbaumalex commented 8 years ago

We're still having trouble reproducing this. We're looking at the updated log we received from your setup, but we still have no understanding of the root cause.

PS: it would be best to open a ticket with support@mellanox.com to get full tracking of this incident.

OphirMunk commented 8 years ago

@galuha We are not able to reproduce your case. Several engineers and a Field Application Engineer have tried. Is your TeamViewer session offer still relevant? Please let us know.

galuha commented 8 years ago

Yes, it is. Skype me for more details. I have just sent you my Skype ID via email.

OphirMunk commented 8 years ago

@galuha An FAE will get back to you within 1-2 weeks.