Summary of bugs
TCP, zerocopy, packet size > MTU: a packet is usually returned from zcopy_recv as two chunks of 1460 + 20 bytes, but sometimes it is returned as two packets of 1460 and 20 bytes. In the second case the message content pointer is set to the beginning of the second packet's payload, so the packet header is incorrect. This is a possible root cause of #2826990.
zerocopy, TCP, size > MTU: the ping-pong data integrity check fails even for the first message because the reply sender does not handle fragmented packet data properly: only the first fragment is valid, and data from all other fragments is skipped.
zerocopy: if zero-copy is not performed, the program uses an invalid offset into the buffer to retrieve data (extra offset of 'fd' bytes).
vmarxfiltercb: the callback code always copies the input packet data, even if the message resides entirely in a single data chunk.
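To make the first two bugs concrete, here is a minimal C++ sketch of the two layouts a 1480-byte message can arrive in. `Chunk`, `Packet` and `flatten` are hypothetical names used only for illustration; they are not the VMA types and not the code in this PR. The point is that the byte stream must be assembled from every chunk of every packet, in order, regardless of where the packet boundary falls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for a zero-copy receive result; not the VMA types.
struct Chunk {
    const std::uint8_t *data; // points into receive memory we do not own
    std::size_t len;
};

struct Packet {
    std::vector<Chunk> chunks;
};

// Append every chunk of every packet, in order, to one contiguous stream.
// The message header is then read from the start of the stream, never from
// the payload pointer of whichever packet was visited last.
inline std::vector<std::uint8_t> flatten(const std::vector<Packet> &packets) {
    std::vector<std::uint8_t> stream;
    for (const Packet &pkt : packets) {
        for (const Chunk &c : pkt.chunks) {
            stream.insert(stream.end(), c.data, c.data + c.len);
        }
    }
    return stream;
}
```

With this shape, one packet holding chunks of 1460 and 20 bytes and two packets holding 1460 and 20 bytes yield the same 1480-byte stream.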
Summary of changes
Message parsing code is extracted into a separate class. If a message resides completely inside one memory chunk, it is handled in place. For fragmented messages there are two different strategies: receive the next data chunk into the same buffer (for plain old recvfrom) or accumulate the message in a separate buffer (for zerocopy and socketxtreme).
The three existing strategies of input data handling are split into separate templated InputHandlers: a recvfrom() handler with in-place message accumulation, and socketxtreme/zerocopy handlers with a buffered accumulation strategy. Runtime selection of the data fetching method is replaced with a template parameter. Each class encapsulates the related data fetch and data iteration algorithms. As a result of this change, compilation time and the inline-growth parameter increased by 50%.
The new handler for zerocopy data iterates properly over all packets and data chunks in the iovec and properly handles the case when zero-copy is not performed.
Removed duplicated message handling code from Server, Client and myapp_vma_recv_pkt_filter_callback. This improves callback performance by avoiding memcpy where possible.
Improved recvfrom_zerocopy performance in the case when zero-copy is not performed: in some cases we can avoid memcpy on received data.
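The buffered accumulation strategy and the compile-time handler selection can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `MessageParser`, `ReplaySource`, `drain` and the 2-byte length header are all hypothetical, and sockperf's real message header is different. A message that sits entirely inside one chunk is passed to the callback in place; only a fragmented message is copied into a side buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wire format: 2-byte big-endian total length (header included).
constexpr std::size_t kHeaderSize = 2;

inline std::size_t read_len(const std::uint8_t *p) {
    std::size_t n = (static_cast<std::size_t>(p[0]) << 8) | p[1];
    return std::max(n, kHeaderSize); // guard: never report less than a header
}

class MessageParser {
public:
    // Feeds one chunk and calls on_msg(ptr, len) for each complete message.
    // Returns how many messages were delivered in place (without a copy).
    template <class Callback>
    std::size_t feed(const std::uint8_t *data, std::size_t len, Callback on_msg) {
        std::size_t in_place = 0;
        while (len > 0) {
            if (pending_.empty() && len >= kHeaderSize) {
                std::size_t msg_len = read_len(data);
                if (msg_len <= len) { // whole message inside this chunk
                    on_msg(data, msg_len);
                    ++in_place;
                    data += msg_len;
                    len -= msg_len;
                    continue;
                }
            }
            // Fragmented message: copy bytes until header, then body, is complete.
            std::size_t want = pending_.size() < kHeaderSize
                                   ? kHeaderSize - pending_.size()
                                   : read_len(pending_.data()) - pending_.size();
            std::size_t take = std::min(want, len);
            pending_.insert(pending_.end(), data, data + take);
            data += take;
            len -= take;
            if (pending_.size() >= kHeaderSize &&
                pending_.size() == read_len(pending_.data())) {
                on_msg(pending_.data(), pending_.size());
                pending_.clear();
            }
        }
        return in_place;
    }

private:
    std::vector<std::uint8_t> pending_; // holds a partially received message
};

// Hypothetical source that replays recorded chunks. A real handler would wrap
// recvfrom(), the zero-copy receive call, or socketxtreme polling instead.
struct ReplaySource {
    std::vector<std::vector<std::uint8_t>> chunks;
    std::size_t pos = 0;
    bool next(const std::uint8_t *&data, std::size_t &len) {
        if (pos == chunks.size()) return false;
        data = chunks[pos].data();
        len = chunks[pos].size();
        ++pos;
        return true;
    }
};

// The source type is a template parameter, so the fetch method is chosen at
// compile time rather than by a runtime branch inside the receive loop.
template <class Source, class Callback>
std::size_t drain(Source &src, MessageParser &parser, Callback on_msg) {
    std::size_t in_place = 0;
    const std::uint8_t *data = nullptr;
    std::size_t len = 0;
    while (src.next(data, len)) in_place += parser.feed(data, len, on_msg);
    return in_place;
}
```

The sketch covers only the buffered strategy; the in-place strategy for recvfrom() would instead receive the next data directly after the partial message in the same buffer.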
Additional changes
Fixed null-pointer dereference in parse_common_opt (found by cppcheck).
As a side effect of changing the zerocopy-related code, sockperf now supports zerocopy-accelerated handling of UDP messages with size > MTU.
Performance comparison
original sockperf version: 3.7-11.gita57ffb579002
VMA 9.4.0
Message size: 1460 bytes
TCP ping-pong latency, us:

|          | socket | VMA | zcopy | filtercb | xtreme |
|----------|--------|-----|-------|----------|--------|
| original | 19.6   | 5.1 | 5.2   | 4.6      | 3.8    |
| modified | 18.6   | 5.0 | 4.9   | 4.5      | 3.8    |

UDP ping-pong latency, us:

|          | socket | VMA | zcopy | filtercb | xtreme |
|----------|--------|-----|-------|----------|--------|
| original | 16.1   | 3.8 | 3.8   | 3.7      | 3.4    |
| modified | 15.9   | 3.8 | 3.8   | 3.8      | 3.5    |

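TCP throughput, Gbps:

|          | socket | VMA  | zcopy | filtercb | xtreme |
|----------|--------|------|-------|----------|--------|
| original | 7.8    | 18.5 | 16.5  | 🔴 8.2   | 14.5   |
| modified | 7.8    | 18.7 | 16.6  | 🟢 15.3  | 14.4   |
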
UDP throughput, Gbps:

|          | socket | VMA  | zcopy | filtercb | xtreme |
|----------|--------|------|-------|----------|--------|
| original | 3.8    | 22.8 | 22.8  | 22.8     | 22.7   |
| modified | 3.8    | 22.8 | 22.8  | 22.8     | 22.8   |

I suggest enabling the 'hide whitespace' option while reviewing the diff.