Mellanox / libvma

Linux user space library for network socket acceleration based on RDMA compatible network adaptors
https://www.mellanox.com/products/software/accelerator-software/vma?mtag=vma

VMA TCP connect() call takes much longer than OS #1017

Open Fed3n opened 1 year ago

Fed3n commented 1 year ago

VMA TCP connect() call takes much longer than OS

Configuration:

I'm testing VMA flow completion time for a TCP flow against the OS stack, and I notice that a blocking connect() call on VMA takes about 1.5-2 ms, while the same measurement on the OS stack is about 40 us. VMA runs on both hosts with VMA_SPEC=latency and is compiled with --enable-tso.

To see where the bottleneck is, I added some measurements inside the VMA stack and found that, in the connect() path, sockinfo_tcp::prepare_dst_to_send and sockinfo::attach_as_uc_receiver take 1.5-2 ms combined, while the lwip tcp_connect call after them only takes around 20 us. Once the connection is established, send/recv latencies are much lower than on the OS stack. Is this setup time for a new connection a known limitation of VMA, or might there be something wrong with my setup?
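For context, the timing is done roughly along the lines of the sketch below (the server address and port are placeholders, and the real code logs to a file). The same binary is run once under `LD_PRELOAD=libvma.so` and once on the plain OS stack.

```c
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Elapsed time in nanoseconds between two timespecs. */
static long long elapsed_ns(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(5201);                      /* placeholder port */
    inet_pton(AF_INET, "192.0.2.10", &srv.sin_addr); /* placeholder server address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("connect() took %lld ns\n", elapsed_ns(t0, t1));
    close(fd);
    return 0;
}
```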

igor-ivanov commented 1 year ago

@Fed3n thank you for your analysis. Could you check the connect() duration to another server without closing the first connection? I suspect connect() might be slow only the first time.

igor-ivanov commented 1 year ago

@Fed3n have you had a chance to verify my assumption?

Fed3n commented 1 year ago

@igor-ivanov sorry, this slipped my mind. I ran a simple experiment with 3 servers: one runs a client application that issues 4 alternating connects to the other two, which run an accepting application. No connection is closed. Logging is done to a file, so hopefully it does not skew the measurement of the internal functions too much. The client side is essentially the sketch below (server addresses and port are placeholders).
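```c
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define NUM_CONNECTS 4

static long long elapsed_ns(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    /* Placeholder addresses of the two accepting servers. */
    const char *servers[2] = { "192.0.2.11", "192.0.2.12" };
    int fds[NUM_CONNECTS];

    for (int i = 0; i < NUM_CONNECTS; i++) {
        struct sockaddr_in srv;
        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;
        srv.sin_port = htons(5201);
        inet_pton(AF_INET, servers[i % 2], &srv.sin_addr);

        fds[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (fds[i] < 0) { perror("socket"); return 1; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (connect(fds[i], (struct sockaddr *)&srv, sizeof(srv)) < 0) {
            perror("connect");
            return 1;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("SERVER%d CONN%d: connect() took %lld ns\n",
               (i % 2) + 1, (i / 2) + 1, elapsed_ns(t0, t1));
        /* Connections are intentionally left open. */
    }

    for (int i = 0; i < NUM_CONNECTS; i++)
        close(fds[i]);
    return 0;
}
```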

Using VMA on all servers as described above:

| Connection | prepare_to_send | attach_as_uc_receiver | Total connect duration |
|---|---|---|---|
| SERVER1 CONN1 | 9,087,475 ns | 17,735,853 ns | 26,930,693 ns |
| SERVER2 CONN1 | 11,602 ns | 1,676,412 ns | 1,744,222 ns |
| SERVER1 CONN2 | 3,937 ns | 343,114 ns | 376,918 ns |
| SERVER2 CONN2 | 3,256 ns | 209,800 ns | 240,208 ns |

Using OS Stack on all servers:

| Connection | Total connect duration |
|---|---|
| SERVER1 CONN1 | 51,556 ns |
| SERVER2 CONN1 | 48,811 ns |
| SERVER1 CONN2 | 49,933 ns |
| SERVER2 CONN2 | 42,788 ns |

You are absolutely right that only the very first connect takes a long time. Regardless, the attach_as_uc_receiver call still seems to be the bottleneck in the later calls...

igor-ivanov commented 1 year ago

Thank you, @Fed3n. On the first connection, the ring-related resources are initialized.
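If that first-connect cost matters for your workload, one possible workaround (a sketch, not an official recommendation) is to issue a throwaway connect during application startup so the ring initialization happens before any latency-sensitive connect. The warm-up target below is hypothetical; any reachable peer on the accelerated interface would do.

```c
#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Best-effort warm-up: open and immediately close one TCP connection so
 * that VMA's ring-related resources are initialized up front. Errors are
 * ignored because only the initialization side effect matters. */
static void warm_up_connect(void)
{
    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(5201);                      /* hypothetical port */
    inet_pton(AF_INET, "192.0.2.10", &srv.sin_addr); /* hypothetical peer */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return;
    connect(fd, (struct sockaddr *)&srv, sizeof(srv));
    close(fd);
}
```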