Open Fed3n opened 1 year ago
@Fed3n thank you for your analysis. Could you check the duration of a connect() to another server without closing the first connection? I suspect that connect() takes long only the first time.
@Fed3n have you had a chance to verify my assumption?
@igor-ivanov sorry, this slipped my mind. I ran a simple experiment with 3 servers: one runs a client application that issues 4 alternating connects to the other two, which run an accepting application. No connection is closed. Logging is done to a file, so it should not influence the measurement of the internal functions much.
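For reference, the shape of that experiment can be sketched as follows. This is a minimal sketch, not the actual test code: loopback listeners stand in for the two remote accepting servers (the real test used three hosts and ran under VMA), and the OS picks the ports.

```python
import socket
import time

# Two loopback listeners stand in for the two accepting servers.
servers = []
for _ in range(2):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))   # OS picks a free port
    srv.listen(4)
    servers.append(srv)

# 4 connects, alternating between the two servers; none are closed.
clients, durations = [], []
for i in range(4):
    dst = servers[i % 2].getsockname()
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    t0 = time.monotonic_ns()
    cli.connect(dst)             # blocking connect, as in the experiment
    t1 = time.monotonic_ns()
    clients.append(cli)
    durations.append(t1 - t0)
    print(f"===SERVER{i % 2 + 1} CONN{i // 2 + 1}===")
    print(f"Total connect duration: {t1 - t0}ns")
```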
Using VMA on all servers as described above:
===SERVER1 CONN1===
prepare_to_send: 9087475ns
attach_as_uc_receiver: 17735853ns
Total connect duration: 26930693ns
===SERVER2 CONN1===
prepare_to_send: 11602ns
attach_as_uc_receiver: 1676412ns
Total connect duration: 1744222ns
===SERVER1 CONN2===
prepare_to_send: 3937ns
attach_as_uc_receiver: 343114ns
Total connect duration: 376918ns
===SERVER2 CONN2===
prepare_to_send: 3256ns
attach_as_uc_receiver: 209800ns
Total connect duration: 240208ns
Using OS Stack on all servers:
===SERVER1 CONN1===
Total connect duration: 51556ns
===SERVER2 CONN1===
Total connect duration: 48811ns
===SERVER1 CONN2===
Total connect duration: 49933ns
===SERVER2 CONN2===
Total connect duration: 42788ns
You are absolutely right that only the very first connect takes a long time. Regardless, the attach_as_uc_receiver call seems to be the bottleneck even in later calls...
Thank you, @Fed3n. On the first connection, ring-related resources are initialized.
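Given that the first connection pays the one-time initialization cost, one possible mitigation (my sketch, not an official VMA recommendation) is to issue a throwaway warm-up connection at application startup, before any latency-sensitive connects:

```python
import socket

def warm_up_connect(host, port, timeout=1.0):
    """Issue one throwaway connect so that one-time per-device setup
    (e.g. ring resource initialization on the first connection) happens
    before the latency-sensitive path. Best-effort: failure is tolerated."""
    try:
        s = socket.create_connection((host, port), timeout=timeout)
        s.close()
        return True
    except OSError:
        return False
```

The warm-up target could be any reachable peer on the same interface; subsequent connects would then see the "later call" latencies shown above rather than the first-connect spike.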
VMA TCP connect() call takes much longer than OS
Configuration:
I'm testing VMA flow completion time for a TCP flow against the OS stack, and I notice that a blocking connect() call under VMA takes about 1.5-2ms, while the same measurement on the OS stack is about 40us. VMA runs on both hosts with VMA_SPEC=latency and is compiled with --enable-tso. To find the bottleneck, I took some measurements inside the VMA stack and see that in the connect() path, sockinfo_tcp::prepare_dst_to_send and sockinfo::attach_as_uc_receiver take 1.5-2ms combined, while the lwip tcp_connect call after them only takes around 20us. Once the connect is done, send/recv delays are much lower than on the OS stack. Is this setup time for a new connection a known limitation of VMA, or might there be something wrong with my setup?
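The per-phase numbers above come from bracketing each suspect call with a monotonic clock. A sketch of that instrumentation pattern (the phase labels and sleeps below are illustrative stand-ins, not VMA's actual internals):

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(label):
    """Record the wall time of a code section in nanoseconds."""
    t0 = time.monotonic_ns()
    try:
        yield
    finally:
        timings[label] = time.monotonic_ns() - t0

# Illustrative use: bracket each phase of a connect() path.
with timed("total_connect"):
    with timed("prepare_to_send"):
        time.sleep(0.001)      # stand-in for prepare_dst_to_send work
    with timed("attach_as_uc_receiver"):
        time.sleep(0.002)      # stand-in for receiver-attach work

for label, ns in timings.items():
    print(f"{label}: {ns}ns")
```

Writing the recorded values to a file after the run (rather than logging inline) keeps the measurement overhead out of the timed sections.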